Re: big table / hadoop / map reduce

Sure, that was a bit more helpful, thanks :)

I was still wondering what other use cases it would apply to. This
is a good article (the best so far, I guess):
http://code.google.com/edu/parallel/mapreduce-tutorial.html

The thing is that reduce has to aggregate data or it would be
impractical. So I am trying to see more examples to fully understand
the limitations of the method.

Let's say I want to find the top 10 IP addresses in an access log:
- split the log into small files
- take one fragment (one file)
- a worker maps it to a list of <ip, 1>
- before reduce is called, the data is sorted by ip
- reduce makes <ip, totalCountPerLogFileSample>
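
The steps above can be sketched in plain Python for a single fragment (no Hadoop involved, everything in one process). This is only a rough illustration: the names map_fragment and reduce_sorted are made up for the example, and it assumes the IP is the first whitespace-separated field of each log line.

```python
from itertools import groupby
from operator import itemgetter


def map_fragment(lines):
    """Map step: emit an (ip, 1) pair for every non-empty log line.

    Assumption: the IP is the first whitespace-separated field.
    """
    return [(line.split()[0], 1) for line in lines if line.strip()]


def reduce_sorted(pairs):
    """Sort the pairs by ip, then sum the counts of each run of equal ips."""
    pairs = sorted(pairs, key=itemgetter(0))
    return [
        (ip, sum(count for _, count in group))
        for ip, group in groupby(pairs, key=itemgetter(0))
    ]


# One fragment of a made-up access log.
fragment = [
    "10.0.0.1 GET /index.php",
    "10.0.0.2 GET /about.php",
    "10.0.0.1 GET /contact.php",
]

per_fragment_counts = reduce_sorted(map_fragment(fragment))
print(per_fragment_counts)
```

The result is one <ip, count> pair per distinct ip in that fragment only.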

So I have a bunch of files with aggregated lists of <IP,
totalCountPerFile>. But then would they not have to be merged across all
results again, with another sort/reduce call? Or, to avoid that, do I
need the initial data to be already clustered so that one ip appears in
only one chunk file?
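
To make the question concrete, here is how I picture the extra merge pass: the per-file <ip, count> lists are summed per ip across all files, and only then are the top 10 picked. It is a minimal single-process sketch in Python with made-up data; merge_counts is a hypothetical helper, not part of any framework.

```python
from collections import Counter


def merge_counts(per_file_results):
    """Second reduce: sum the per-file counts of each ip across all files."""
    totals = Counter()
    for pairs in per_file_results:
        for ip, count in pairs:
            totals[ip] += count
    return totals


# Made-up <ip, totalCountPerFile> lists from three fragments.
per_file_results = [
    [("10.0.0.1", 2), ("10.0.0.2", 1)],
    [("10.0.0.1", 5), ("10.0.0.3", 4)],
    [("10.0.0.2", 3), ("10.0.0.3", 1)],
]

totals = merge_counts(per_file_results)
top_10 = totals.most_common(10)  # at most 10 (ip, total) pairs, largest first
print(top_10)
```

The merge works on the full per-file lists: cutting each file down to its own top 10 first could drop an ip that is only moderately frequent in every file but large in total.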

Does it make sense?

As I said, I am still trying to figure out how it should be applied and
when ... also how to transform problems to make it still work : )

I want to write some simple map reduce like the one above just to see
it working and play around a bit :)

cheers

Art

On 22 October 2010 16:49, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
> Imagine you have to keep track of some kind of traffic, for example
> "ad impressions". Let's suppose that you have millions of those hits;
> you will need a few servers to receive the notifications of each ad
> impression.
>
> At the end of the day, you will have that info spread across a bunch of
> servers; mostly you will have a record of each impression indicating
> the identifier (id) of the ad.
>
> For this info to become useful, you will have to aggregate it, for
> example to know which ad has the most impressions.
> You will have to iterate over all servers and MAP the info into one
> place; now that you have all the info, you will have to REDUCE it, so
> you will have one record per ad identifier indicating the TOTAL
> impressions of that day.
>
> That's the basic idea. It's like the aftermath of "Divide and Conquer".
>
> Hope this will be useful.
>
> Cheers.
>
> On 22 October 2010 13:27, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>> hehe .... sorry but this does not help :-) i can google for wikipedia
>> definitions.
>>
>> I was hoping for some really good articles/examples that would put it
>> into enough context. I would like to have a good idea of when it could
>> be useful.
>>
>> So far I have had no luck with that. It's like with design patterns ...
>> people who don't understand them should not write articles trying to
>> explain them to others :P
>>
>> Art
>>
>> On 22 October 2010 15:29, Andrés G. Montañez <andresmontanez@xxxxxxxxx> wrote:
>>> Hi Artur,
>>>
>>> Here is an article on wikipedia: http://en.wikipedia.org/wiki/MapReduce
>>>
>>> And here are the native implementations in php:
>>> http://www.php.net/manual/en/function.array-map.php
>>> http://www.php.net/manual/en/function.array-reduce.php
>>>
>>> The basic idea is to gather a lot of data from several nodes and
>>> "map" it together;
>>> then, assuming a lot of this data is repeated across the dataset, we
>>> "reduce" it.
>>>
>>>
>>> Cheers.
>>>
>>> On 22 October 2010 12:14, Artur Ejsmont <ejsmont.artur@xxxxxxxxx> wrote:
>>>> Hi there guys and girls
>>>>
>>>> Has anyone come across any reasonable explanation / articles on how
>>>> hadoop and map reduce work in practice?
>>>>
>>>> I have read a few articles now and then and I must say I am puzzled
>>>> .... am I stupid or can they just not find an easy way to explain it? :P
>>>>
>>>> What I would hope for is an explanation based on a simple example
>>>> application, preferably with some code samples.
>>>>
>>>> anyone good at it here?
>>>>
>>>> cheers
>>>>
>>>> --
>>>> PHP Database Mailing List (http://www.php.net/)
>>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Andrés G. Montañez
>>> Zend Certified Engineer
>>> Montevideo - Uruguay
>>>
>>
>>
>>
>> --
>> Visit me at:
>> http://artur.ejsmont.org/blog/
>>
>
>
>
> --
> Andrés G. Montañez
> Zend Certified Engineer
> Montevideo - Uruguay
>



-- 
Visit me at:
http://artur.ejsmont.org/blog/

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



