Yeah, I think that would make sense. If you find more good examples from
different areas, let me know ... I think I get the basic idea ... will try
to apply it some time :)

cheers :)

art

On 30 October 2010 18:58, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
> Hi Artur,
> in your IP example, let's suppose you have ten access log files (from
> ten different servers);
> there you already have the mapping part done.
>
> Then you reduce each log into another new file, listing each IP
> address and the number of times it is repeated.
> At this stage you have a reduced version of each log file; then you
> need to map them into a new single file;
> this file will be the merge of all the reduced versions of the log files.
> This single file you will need to reduce again, and there you
> will have a single file with all the
> IP addresses and the number of times they appear.
>
> There is no limit on the number of times you can call map and reduce.
>
> Cheers.
>
> On 30 October 2010 15:51, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>> sure, that was a bit more helpful, thanks :)
>>
>> I was still wondering what other use cases that would apply to. This
>> is a good article (best so far, I guess):
>> http://code.google.com/edu/parallel/mapreduce-tutorial.html
>>
>> The thing is that reduce has to aggregate data or it would be
>> impractical. So I am trying to see more examples to fully understand
>> the limitations of the method.
>>
>> Let's say I want to find the top 10 IP addresses in an access log:
>> - split the log into small files
>> - I take one fragment (one file)
>> - a worker maps it to a list of <ip, 1>
>> - before reduce is called, the data is sorted by ip
>> - reduce makes <ip, totalCountPerLogFileSample>
>>
>> So I have a bunch of files with aggregated lists of <IP,
>> totalCountPerFile>. But then would it not have to be merged across all
>> results again, with another sort/reduce call?
>> Or, to avoid that, do I need the initial data to be already clustered
>> so that one IP appears in only one chunk file?
>>
>> Does it make sense?
>>
>> As I said, I am still trying to figure out how it should be applied and
>> when ... also how to transform problems to make it still work :)
>>
>> I want to write some simple map reduce like the one above just to see
>> it working and play around a bit :)
>>
>> cheers
>>
>> Art
>>
>> On 22 October 2010 16:49, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
>>> Imagine you have to keep track of some kind of traffic, for example,
>>> "ad impressions";
>>> let's suppose that you have millions of those hits; you will need
>>> a few servers to
>>> receive the notifications of each ad impression.
>>>
>>> At the end of the day, you will have that info spread across a bunch of
>>> servers; mostly you will have
>>> a record of each impression indicating the identifier (id) of the ad.
>>>
>>> For this info to become useful, you will have to aggregate it; for
>>> example, to know which ad has the most impressions.
>>> You will have to iterate over all servers and MAP the info into one
>>> place; now that you have all the info,
>>> you will have to REDUCE it, so you will have one record per ad
>>> identifier indicating the TOTAL impressions for that day.
>>>
>>> That's the basic idea. It's like the aftermath of "Divide and Conquer".
>>>
>>> Hope this will be useful.
>>>
>>> Cheers.
>>>
>>> On 22 October 2010 13:27, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>>>> hehe .... sorry but this does not help :-) I can google for Wikipedia
>>>> definitions.
>>>>
>>>> I was hoping for some really good articles/examples that would put it
>>>> into enough context. I would like to have a good idea of when it could be
>>>> useful.
>>>>
>>>> So far I have had no luck with that. It's like with design patterns ...
>>>> people who don't understand them should not write articles trying to
>>>> explain them to others :P
>>>>
>>>> Art
>>>>
>>>> On 22 October 2010 15:29, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
>>>>> Hi Artur,
>>>>>
>>>>> Here is an article on Wikipedia: http://en.wikipedia.org/wiki/MapReduce
>>>>>
>>>>> And here are the native implementations in PHP:
>>>>> http://www.php.net/manual/en/function.array-map.php
>>>>> http://www.php.net/manual/en/function.array-reduce.php
>>>>>
>>>>> The basic idea is to gather a lot of data, from several nodes, and
>>>>> "map" them together;
>>>>> then, assuming a lot of this data is repeated across the dataset, we
>>>>> "reduce" them.
>>>>>
>>>>> Cheers.
>>>>>
>>>>> On 22 October 2010 12:14, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>>>>>> Hi there guys and girls
>>>>>>
>>>>>> Has anyone come across any reasonable explanation / articles on how
>>>>>> Hadoop and map reduce work in practice?
>>>>>>
>>>>>> I have read a few articles now and then and I must say I am puzzled
>>>>>> .... am I stupid or can they just not find an easy way to explain it? :P
>>>>>>
>>>>>> What I would hope for is an explanation using a simple example
>>>>>> application, preferably with some code samples.
>>>>>>
>>>>>> anyone good at it here?
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> --
>>>>>> PHP Database Mailing List (http://www.php.net/)
>>>>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>>>>
>>>>>
>>>>> --
>>>>> Andrés G. Montañez
>>>>> Zend Certified Engineer
>>>>> Montevideo - Uruguay
>>>>>
>>>>
>>>> --
>>>> Visit me at:
>>>> http://artur.ejsmont.org/blog/
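
For anyone who wants to play with the top-IPs example from this thread, here is a minimal single-process sketch in Python. It is only an illustration of the idea discussed above, not how Hadoop itself is implemented. The function names (map_chunk, reduce_pairs, merge_counts, top_ips) are made up for this sketch, and it assumes the IP address is the first whitespace-separated field of each log line. It also shows why the data does not need to be clustered by IP beforehand: the per-chunk totals can simply be reduced a second time, because adding counts works in any order.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: emit an (ip, 1) pair for every non-empty log line.

    Assumption: the IP is the first whitespace-separated field.
    """
    return [(line.split()[0], 1) for line in lines if line.strip()]


def reduce_pairs(pairs):
    """First reduce: sum the counts per IP within one chunk."""
    totals = Counter()
    for ip, count in pairs:
        totals[ip] += count
    return totals


def merge_counts(left, right):
    """Second reduce: merge the totals of two chunks into one."""
    return left + right


def top_ips(chunks, n=10):
    """Map and reduce each chunk, merge the results, return the top n IPs."""
    per_chunk = [reduce_pairs(map_chunk(chunk)) for chunk in chunks]
    combined = reduce(merge_counts, per_chunk, Counter())
    return combined.most_common(n)
```

In a real cluster each chunk would be handled by a different worker; here the list comprehension in top_ips stands in for that. Calling top_ips with two lists of log lines in which the same IP appears in both lists returns that IP once, with its counts from both chunks added together.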