Re: memory efficient hash table extension? like lchash ...

PHP does expose System V shared-memory APIs (the shm_* functions):

http://us2.php.net/manual/en/book.sem.php

If you already have APC installed, you could also try:

http://us2.php.net/manual/en/book.apc.php

APC also lets you store user-specific data (it is kept in shared
memory).

I haven't tried these myself, so I would run some quick tests to see
whether they meet your performance requirements. In theory, they should
be faster than Berkeley DB-like solutions (which are another option,
though it sounds like something similar, MongoDB, was not good enough?).

I am curious whether anyone here has run these tests. Note that with
memcached installed locally (on the same box running PHP), it can be
surprisingly efficient: use pconnect(), cache the handle in a static
variable for a given request cycle, etc.

Ravi






On Sun, Jan 24, 2010 at 9:39 AM, D. Dante Lorenso <dante@xxxxxxxxxxx> wrote:
> shiplu wrote:
>>
>> On Sun, Jan 24, 2010 at 3:11 AM, D. Dante Lorenso <dante@xxxxxxxxxxx>
>> wrote:
>>>
>>> All,
>>>
>>> I'm loading millions of records into a backend PHP CLI script, and I
>>> need to build a hash index from them to optimize key lookups for data
>>> that I'm importing into a MySQL database.  The problem is that storing
>>> this data in a PHP array is not very memory efficient, and my millions
>>> of records are consuming about 4-6 GB of RAM.
>>>
>>
>> What are you storing? An array of row objects??
>> In that case, storing only the row ID will reduce memory usage.
>
> I am querying a MySQL database which contains 40 million records and mapping
> string columns to numeric IDs.  You could think of it as normalizing the data.
>
> Then, I am importing a new 40 million records and comparing the new values
> to the old values.  Where the value matches, I update records, but where
> they do not match, I insert new records, and finally I go back and delete
> old records.  So, the net result is that I have a database with 40 million
> records that I need to "sync" on a daily basis.
>
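The daily update/insert/delete decision described above boils down to a
key-set comparison. A minimal PHP sketch, assuming the existing index is a
plain array of string key => numeric row id and today's import is a list of
string keys. `plan_sync` is a hypothetical helper name, and plain arrays are
used only for clarity (they carry exactly the memory overhead this thread is
about):

```php
<?php
/**
 * Hypothetical helper: compare the existing index (string key => row id)
 * with today's import (list of string keys) and split the work into
 * updates, inserts and deletes.
 * Caveat: PHP converts numeric-string keys such as "123" to int keys.
 */
function plan_sync(array $existing, array $incomingKeys): array
{
    // Key today's import by value so both sides can be compared by key.
    $incoming = array_flip($incomingKeys);

    return [
        // In both: update the existing row, keeping its id.
        'update' => array_intersect_key($existing, $incoming),
        // Only in today's import: insert a new row.
        'insert' => array_keys(array_diff_key($incoming, $existing)),
        // Only in the old data: delete once the import is done.
        'delete' => array_diff_key($existing, $incoming),
    ];
}
```

For example, with existing `['a' => 1, 'b' => 2]` and incoming `['b', 'c']`,
it plans to update `b` (id 2), insert `c` and delete `a`.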
>> If you are loading full row objects, it will take a lot of memory.
>> But if you just load the row id values, it will significantly decrease
>> the memory amount.
>
> For what I am trying to do, I just need to map a string value (32 bytes) to
> a bigint value (8 bytes) in a fast-lookup hash.
>
>> Besides, you can load row IDs chunk by chunk.  If you have 10 million
>> rows to process, load 10,000 rows as a chunk, process them, then load
>> the next chunk.  This will significantly reduce memory usage.
>
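The chunk-by-chunk suggestion can be sketched as a small PHP generator
(generators need PHP 5.5+; the type hints below assume PHP 7.1+). `in_chunks`
is a hypothetical helper name, and where `$rows` comes from is left open. If it
is itself streamed (for example, an unbuffered query result), only the current
chunk is held in memory:

```php
<?php
/**
 * Hypothetical helper: yield arrays of at most $size items from any iterable.
 * If $rows is streamed (e.g. an unbuffered query result), only the current
 * chunk is kept in memory.
 */
function in_chunks(iterable $rows, int $size): Generator
{
    $chunk = [];
    foreach ($rows as $row) {
        $chunk[] = $row;
        if (count($chunk) === $size) {
            yield $chunk;   // hand a full chunk to the caller
            $chunk = [];    // and start collecting the next one
        }
    }
    if ($chunk !== []) {
        yield $chunk;       // trailing partial chunk
    }
}
```

For example, `in_chunks(range(1, 25), 10)` yields chunks of 10, 10 and 5 items.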
> When importing the fresh 40 million records, I need to compare each record
> with 4 different indexes that map the record to other existing records,
> or to a "group_id" that the record also belongs to.  My current solution
> uses a trigger in MySQL that does the lookups inside MySQL, but this is
> extremely slow.  Pre-loading the MySQL indexes into PHP RAM and processing
> them that way is thousands of times faster.
>
> I just need an efficient way to hold my hash tables in PHP RAM.  PHP arrays
> are very fast, but as my original post says, they consume way too much
> RAM.
>
>> A good algorithm can solve your problem anytime. ;-)
>
> Building my hash indexes in PHP RAM currently takes about 5-10 minutes,
> which the 10,000x speedup on later key lookups more than makes up for.
> I just don't want to use the whole 6 GB of RAM to do this.  I need an
> efficient hashing API that supports something like:
>
>        $value = (int) fasthash_get((string) $key);
>        $exists = (bool) fasthash_exists((string) $key);
>        fasthash_set((string) $key, (int) $value);
>
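For comparison, something close to that wished-for API can be approximated in
plain PHP, without an extension, by packing fixed-size records into strings
instead of array elements. This is a minimal sketch, not an existing library:
`CompactHash` is a hypothetical class whose get/exists/set methods mirror the
hypothetical fasthash_* calls above. It assumes every key is exactly 32 bytes
and a 64-bit PHP 7.1+ build (for pack('q') and the type hints). Each entry
costs 40 bytes of payload (32-byte key + 8-byte integer), so 40 million
entries come to about 1.6 GB plus per-bucket overhead. Lookups scan a single
bucket with strpos(), so speed depends on the bucket count; it has not been
benchmarked at that scale.

```php
<?php
/**
 * Hypothetical sketch of the fasthash_get/exists/set idea in plain PHP.
 * Assumes every key is exactly 32 bytes and a 64-bit PHP build (pack 'q').
 * Each entry is a fixed 40-byte record (32-byte key + 8-byte int) appended
 * to one of N bucket strings, instead of being a PHP array element.
 */
final class CompactHash
{
    const KEY_LEN = 32;
    const RECORD_LEN = 40; // KEY_LEN + 8 bytes for the packed int

    /** @var string[] one binary string of concatenated records per bucket */
    private $buckets;
    private $bucketCount;

    public function __construct(int $bucketCount = 1 << 20)
    {
        $this->bucketCount = $bucketCount;
        $this->buckets = array_fill(0, $bucketCount, '');
    }

    public function set(string $key, int $value): void
    {
        $b = $this->bucketFor($key);
        $pos = $this->find($this->buckets[$b], $key);
        if ($pos < 0) {
            // New key: append a fresh record to the bucket.
            $this->buckets[$b] .= $key . pack('q', $value);
        } else {
            // Existing key: overwrite only its 8 value bytes.
            $this->buckets[$b] = substr_replace(
                $this->buckets[$b], pack('q', $value), $pos + self::KEY_LEN, 8
            );
        }
    }

    public function get(string $key): ?int
    {
        $bucket = $this->buckets[$this->bucketFor($key)];
        $pos = $this->find($bucket, $key);
        if ($pos < 0) {
            return null;
        }
        return unpack('q', substr($bucket, $pos + self::KEY_LEN, 8))[1];
    }

    public function exists(string $key): bool
    {
        return $this->find($this->buckets[$this->bucketFor($key)], $key) >= 0;
    }

    private function bucketFor(string $key): int
    {
        if (strlen($key) !== self::KEY_LEN) {
            throw new InvalidArgumentException('key must be exactly 32 bytes');
        }
        return (crc32($key) & 0x7FFFFFFF) % $this->bucketCount;
    }

    /** Byte offset of $key's record in $bucket, or -1 if absent. */
    private function find(string $bucket, string $key): int
    {
        $offset = 0;
        while (($pos = strpos($bucket, $key, $offset)) !== false) {
            // Only a match on a record boundary is a stored key; other
            // matches are accidental hits that straddle record boundaries.
            if ($pos % self::RECORD_LEN === 0) {
                return $pos;
            }
            $offset = $pos + 1;
        }
        return -1;
    }
}
```

Usage would look like `$h = new CompactHash(); $h->set($key, $id);` and later
`$id = $h->get($key);`, which returns null for unknown keys.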
> In other words, it feels like a "memcached" API, but with the data stored
> locally instead of accessed over a network.  That is how my search led me
> to what appears to be a dead "lchash" extension.
>
> -- Dante
>
> ----------
> D. Dante Lorenso
> dante@xxxxxxxxxxx
> 972-333-4139
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>



