MapReduce implementations of graph algorithms like PageRank and adsorption scale to millions of nodes on a cluster of around 50 machines, but if you want to process billions of nodes (or even tens of millions, depending on your algorithm), you need a different framework. Google uses Pregel, about which they've said little except that it was inspired by the Bulk Synchronous Parallel (BSP) model for parallel programming.
So the announcement of a BSP package for Hadoop in the Apache HAMA project is one to watch. There's even a BSP hello world, although getting much further may be hard work given the current level of documentation.
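To give a feel for what the BSP model means for something like PageRank, here is a minimal single-process sketch in Python. It is not HAMA's or Pregel's API (neither is shown here); the function name `bsp_pagerank`, its parameters, and the toy graph are all made up for illustration. Each pass of the loop plays the role of one superstep: every vertex sends its rank share to its neighbours, then a barrier is assumed before any vertex reads its inbox.

```python
from collections import defaultdict


def bsp_pagerank(graph, supersteps=50, damping=0.85):
    """Toy PageRank structured as BSP supersteps (hypothetical helper).

    graph maps each vertex to a list of its out-neighbours; every vertex
    must appear as a key, even if it has no outgoing edges.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}

    for _ in range(supersteps):
        # Compute + communicate: each vertex sends an equal share of its
        # current rank to every out-neighbour.
        inbox = defaultdict(list)
        dangling = 0.0
        for v, neighbours in graph.items():
            if neighbours:
                share = rank[v] / len(neighbours)
                for u in neighbours:
                    inbox[u].append(share)
            else:
                # A vertex with no out-edges spreads its rank over everyone.
                dangling += rank[v]

        # Barrier: in a real BSP system all messages are delivered here,
        # before any vertex starts the next superstep. In this sketch the
        # barrier is simply the end of the sending loop above.
        rank = {
            v: (1 - damping) / n + damping * (sum(inbox[v]) + dangling / n)
            for v in graph
        }
    return rank


if __name__ == "__main__":
    toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(bsp_pagerank(toy).items()):
        print(vertex, round(score, 4))
```

In a distributed setting the vertices would be partitioned across machines and the inbox would be real network messages; the appeal over MapReduce is that the graph stays in place between supersteps instead of being rewritten on every iteration.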