PHP does expose System V shared-memory APIs (the shm_* functions):

    http://us2.php.net/manual/en/book.sem.php

If you already have APC installed, you could also try:

    http://us2.php.net/manual/en/book.apc.php

APC also allows you to store user-specific data (it is kept in shared memory).

I haven't tried these myself, so I would run some quick tests to check whether they meet your performance requirements. In theory, they should be faster than Berkeley DB-like solutions (which would be another option, though it sounds like something similar, such as MongoDB, was not good enough?). I am curious to know whether someone here has run these tests.

Note that with memcached installed locally (on the same box running PHP), it can be surprisingly efficient: use pconnect(), cache the handle in a static variable for a given request cycle, and so on.

A rough, untested sketch of how the APC route could look for your string-to-id map follows the quoted thread below.

Ravi

On Sun, Jan 24, 2010 at 9:39 AM, D. Dante Lorenso <dante@xxxxxxxxxxx> wrote:
> shiplu wrote:
>>
>> On Sun, Jan 24, 2010 at 3:11 AM, D. Dante Lorenso <dante@xxxxxxxxxxx>
>> wrote:
>>>
>>> All,
>>>
>>> I'm loading millions of records into a backend PHP CLI script, from
>>> which I need to build a hash index to optimize key lookups for data
>>> that I'm importing into a MySQL database. The problem is that storing
>>> this data in a PHP array is not very memory efficient, and my millions
>>> of records are consuming about 4-6 GB of RAM.
>>
>> What are you storing? An array of row objects?
>> In that case, storing only the row id will reduce the memory.
>
> I am querying a MySQL database which contains 40 million records and
> mapping string columns to numeric ids. You might think of it as
> normalizing the data.
>
> Then, I am importing a new set of 40 million records and comparing the
> new values to the old values. Where a value matches, I update the record;
> where it does not match, I insert a new record; and finally I go back and
> delete old records. The net result is that I have a database with 40
> million records that I need to "sync" on a daily basis.
>
>> If you are loading full row objects, it will take a lot of memory.
>> But if you just load the row id values, it will significantly reduce
>> the amount of memory used.
>
> For what I am trying to do, I just need to map a string value (32 bytes)
> to a bigint value (8 bytes) in a fast-lookup hash.
>
>> Besides, you can load row ids chunk by chunk. If you have 10 million
>> rows to process, load 10,000 rows as a chunk, process them, then load
>> the next chunk. This will significantly reduce memory usage.
>
> When importing the fresh 40 million records, I need to check each record
> against 4 different indexes that map the record to other existing
> records, or to a "group_id" that the record belongs to. My current
> solution uses a trigger in MySQL that does the lookups inside MySQL, but
> this is extremely slow. Pre-loading the MySQL indexes into PHP RAM and
> processing that way is thousands of times faster.
>
> I just need an efficient way to hold my hash tables in PHP RAM. PHP
> arrays are very fast, but like my original post says, they consume way
> too much RAM.
>
>> A good algorithm can solve your problem anytime. ;-)
>
> It takes about 5-10 minutes to build my hash indexes in PHP RAM
> currently, which is paid back by the 10,000x speedup on key lookups that
> I get later on. I just want to not use the whole 6 GB of RAM to do this.
> I need an efficient hashing API that supports something like:
>
>     $value  = (int) fasthash_get((string) $key);
>     $exists = (bool) fasthash_exists((string) $key);
>     fasthash_set((string) $key, (int) $value);
>
> Or ... it feels like a "memcached" API, but with the data stored locally
> instead of being accessed over the network. That is how my search led me
> to what appears to be a dead "lchash" extension.
>
> -- Dante
>
> ----------
> D. Dante Lorenso
> dante@xxxxxxxxxxx
> 972-333-4139
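For what it's worth, here is a rough, untested sketch of how a fasthash-style API like the one Dante describes could be approximated on top of APC's user cache from a CLI script. The function names, the "fh:" key prefix, and the INI settings mentioned in the comments are illustrative assumptions rather than anything APC ships with, and it assumes an APC build recent enough to support the by-reference $success argument to apc_fetch(). You would still want to benchmark the memory footprint and lookup speed against a plain PHP array before relying on it.

    <?php
    // Rough sketch: fasthash-style wrapper over APC's user cache.
    // Assumes the APC extension is loaded, apc.enable_cli=1 is set for
    // CLI scripts, and apc.shm_size is large enough to hold the whole map.
    // The "fh:" prefix is just an arbitrary namespace to avoid key clashes.

    function fasthash_set($key, $value)
    {
        // A TTL of 0 keeps the entry for the life of the cache.
        return apc_store('fh:' . $key, (int) $value, 0);
    }

    function fasthash_get($key)
    {
        // The by-reference $success flag distinguishes "missing" from a stored 0.
        $value = apc_fetch('fh:' . $key, $success);
        return $success ? (int) $value : false;
    }

    function fasthash_exists($key)
    {
        apc_fetch('fh:' . $key, $success);
        return (bool) $success;
    }

    // Example: map a 32-byte string key to a bigint id
    // (assumes a 64-bit PHP build so integers are 8 bytes).
    fasthash_set('3a7bd3e2360a3d29eea436fcfb7e44c7', 123456789012);
    var_dump(fasthash_exists('3a7bd3e2360a3d29eea436fcfb7e44c7')); // bool(true)
    var_dump(fasthash_get('3a7bd3e2360a3d29eea436fcfb7e44c7'));    // int(123456789012)

The same three functions could just as easily sit on top of a locally running memcached instance (Memcache::pconnect() against 127.0.0.1), with no change to the calling code, if APC's cache turns out not to fit.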