Sure, that was a bit more helpful, thanks :)

I was still wondering what other use cases that would apply to.

This is a good article (best so far, I guess):
http://code.google.com/edu/parallel/mapreduce-tutorial.html

The thing is that reduce has to aggregate data or it would be
impractical. So I am trying to see more examples to fully understand
the limitations of the method.

Let's say I want to find the top 10 IP addresses in an access log:
- split the log into small files
- I take one fragment (one file)
- a worker maps it to a list of <ip, 1>
- before reduce is called, the data is sorted by ip
- reduce makes <ip, totalCountPerLogFileSample>

So I have a bunch of files with aggregated lists of
<IP, totalCountPerFile>. But then would it not have to be merged across
all results again, with another sort/reduce call? Or, to avoid that, do
I need the initial data to be already clustered so that one ip appears
in only one chunk file?

Does it make sense?

As I said, I am still trying to figure out how it should be applied and
when ... and also how to transform problems to make it still work : )

I want to write some simple map reduce like the one above just to see
it working and play around a bit :)

cheers

Art

On 22 October 2010 16:49, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
> Imagine you have to keep track of some kind of traffic, for example,
> "ad impressions"; let's suppose that you have millions of those hits;
> you will have to have a few servers to receive the notifications of
> the impression of an ad.
>
> After the end of the day, you will have that info across a bunch of
> servers; mostly you will have a record of each impression indicating
> the Identifier (id) of the Ad.
>
> For this info to become useful, you will have to aggregate it; for
> example to know which is the Ad with most impressions.
> You will have to iterate over all servers and MAP the info into one
> place; now that you have all the info, you will have to REDUCE it;
> so you will have one record per Ad identifier indicating the TOTAL
> impressions of that day.
>
> That's the basic idea. It's like the aftermath of "Divide and Conquer".
>
> Hope this will be useful.
>
> Cheers.
>
> On 22 October 2010 13:27, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>> hehe .... sorry but this does not help :-) I can google for wikipedia
>> definitions.
>>
>> I was hoping for some really good articles/examples that would put it
>> into enough context. I would like to have a good idea of when it
>> could be useful.
>>
>> So far I have had no luck with that. It's like with design patterns
>> ... people who don't understand them should not write articles trying
>> to explain them to others :P
>>
>> Art
>>
>> On 22 October 2010 15:29, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
>>> Hi Artur,
>>>
>>> Here is an article on wikipedia: http://en.wikipedia.org/wiki/MapReduce
>>>
>>> And here are the native implementations in php:
>>> http://www.php.net/manual/en/function.array-map.php
>>> http://www.php.net/manual/en/function.array-reduce.php
>>>
>>> The basic idea is to gather a lot of data, from several nodes, and
>>> "map" them together; then, assuming a lot of this data is repeated
>>> across the dataset, we "reduce" them.
>>>
>>> Cheers.
>>>
>>> On 22 October 2010 12:14, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>>>> Hi there guys and girls
>>>>
>>>> Has anyone come across any reasonable explanation / articles on how
>>>> hadoop and map reduce work in practice?
>>>>
>>>> I have read a few articles now and then and I must say I am puzzled
>>>> .... am I stupid or can they just not find an easy way to explain
>>>> it? :P
>>>>
>>>> What I would hope for is an explanation on a simple example of an
>>>> application, preferably with some code samples.
>>>>
>>>> anyone good at it here?
>>>>
>>>> cheers
>>>
>>> --
>>> Andrés G. Montañez
>>> Zend Certified Engineer
>>> Montevideo - Uruguay

--
Visit me at:
http://artur.ejsmont.org/blog/

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
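
Following up on the top 10 IP question at the top of this thread, here is a minimal sketch of that job in plain Python, run in a single process. Everything in it is made up for illustration: the sample log lines, the assumption that the IP is the first whitespace-separated field, and the function names (map_fragment, combine, merge, top_ips). It is not Hadoop code, only the shape of the computation. It suggests that the per-file counts do need to be merged again, but that the merge is just another reduce over the partial results, so the input does not have to be clustered by IP beforehand.

```python
from collections import Counter
from functools import reduce
import heapq

# Hypothetical sample data: each inner list stands for one fragment
# (one small file) of the access log.
LOG_FRAGMENTS = [
    ["10.0.0.1 GET /", "10.0.0.2 GET /a", "10.0.0.1 GET /b"],
    ["10.0.0.3 GET /", "10.0.0.1 GET /c", "10.0.0.2 GET /d"],
]


def map_fragment(lines):
    """Map step: emit one (ip, 1) pair per log line.

    Assumes the IP address is the first whitespace-separated field.
    """
    return [(line.split()[0], 1) for line in lines]


def combine(pairs):
    """Per-fragment reduce: sum the counts inside a single fragment."""
    counts = Counter()
    for ip, n in pairs:
        counts[ip] += n
    return counts


def merge(left, right):
    """Cross-fragment reduce: add two partial count tables together."""
    return left + right


def top_ips(fragments, k=10):
    """Return the k most frequent IPs as (ip, total) pairs."""
    # One partial table per fragment; these could be built on separate workers.
    partials = [combine(map_fragment(fragment)) for fragment in fragments]
    # Merge the partial tables into global totals.
    totals = reduce(merge, partials, Counter())
    # Highest count first; ties are broken by IP string so the order is stable.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(top_ips(LOG_FRAGMENTS, k=2))
```

In Hadoop-style frameworks the grouping between map and reduce happens across the output of all mappers, so a reducer already sees every count for a given IP; the per-file summing sketched here corresponds to what is usually called a combiner, an optional optimisation. Picking the top 10 out of the totals is then a small second pass over the already reduced data.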