> values were stored, the APC storage began to slow down *dramatically*. I
> wasn't certain if APC was using only RAM or was possibly also writing to
> disk. Performance tanked so quickly that I set it aside as an option and
> moved on.

IIRC, I think it is built over shm (shared memory) and there is no disk
backing store.

> memcached gives no guarantee about data persistence. I need to have a hash
> table that will contain all the values I set. They don't need to survive a
> server shutdown (don't need to be written to disk), but I cannot afford for
> the server to throw away values that don't fit into memory. If there is a
> way to configure memcached to guarantee storage, that might work.

True, but the LRU policy only kicks in lazily. So if you ensure that you
never get near the maximum allowed limit (the -m option) and you store your
key-value pairs with no expiry, they will be present until the next restart.
So essentially you would have to estimate a value for the -m option big
enough to accommodate all possible key-value pairs (the "evictions" counter
in the memcached stats should remain 0).

BTW, I have seen this implementation behavior in the 1.2.x series, but I am
not sure it is necessarily guaranteed in future versions.

Ravi

On Mon, Jan 25, 2010 at 3:49 PM, D. Dante Lorenso <dante@xxxxxxxxxxx> wrote:
> J Ravi Menon wrote:
>>
>> PHP does expose Sys V shared-memory APIs (shm_* functions):
>> http://us2.php.net/manual/en/book.sem.php
>
> I will look into this. I really need a key/value map, though, and would
> rather not have to write my own on top of SHM.
>
>> If you already have APC installed, you could also try:
>> http://us2.php.net/manual/en/book.apc.php
>> APC also allows you to store user-specific data too (it will be in
>> shared memory).
>
> I've looked into the apc_store and apc_fetch routines:
> http://php.net/manual/en/function.apc-store.php
> http://www.php.net/manual/en/function.apc-fetch.php
> ... but I quickly ran out of memory for APC, and though I figured out how
> to configure it to use more (by adjusting the shared memory allotment),
> there were other problems. I ran into issues with logs complaining about
> "cache slamming" and other known bugs with APC version 3.1.3p1. Also,
> after several million values were stored, the APC storage began to slow
> down *dramatically*. I wasn't certain if APC was using only RAM or was
> possibly also writing to disk. Performance tanked so quickly that I set
> it aside as an option and moved on.
>
>> Haven't tried these myself, so I would do some quick tests to ensure
>> they meet your performance requirements. In theory, it should be
>> faster than Berkeley DB-like solutions (which are also another option,
>> but it seems something similar like MongoDB was not good enough?).
>
> I will run more tests against MongoDB. Initially I tried to use it to
> store everything. If I only store my indexes, it might fare better.
> Certainly, though, running queries and updates against a remote server
> will always be slower than doing the lookups locally in RAM.
>
>> I am curious to know if someone here has run these tests. Note that
>> with memcached installed locally (on the same box running PHP), it can
>> be surprisingly efficient - using pconnect(), caching the handler in
>> a static var for a given request cycle, etc.
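
To make that last suggestion concrete, here is a rough, untested sketch of
the pattern described above: a locally installed memcached reached via
pconnect(), the handle cached in a static variable, values stored with no
expiry, and the "evictions" counter checked so you know nothing was silently
dropped. The pecl/memcache extension, the host/port, and the function and
key names are assumptions for illustration only.

<?php
// Rough sketch only: local memcached, persistent connection, handle cached
// in a static variable, values stored with no expiry (only an undersized -m
// limit could evict them). Host/port and names are illustrative.
function local_cache()
{
    static $mc = null;                      // reuse the handle for this request
    if ($mc === null) {
        $mc = new Memcache();
        $mc->pconnect('127.0.0.1', 11211);  // persistent connection to local daemon
    }
    return $mc;
}

$mc = local_cache();

// flags = 0, expire = 0: never expires, so only LRU eviction could remove it
$mc->set('example_key', 12345, 0, 0);
$value = $mc->get('example_key');

// If this counter ever rises above 0, the -m limit was set too low.
$stats = $mc->getStats();
echo 'evictions: ' . $stats['evictions'] . "\n";
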
>
> memcached gives no guarantee about data persistence. I need to have a hash
> table that will contain all the values I set. They don't need to survive a
> server shutdown (don't need to be written to disk), but I cannot afford for
> the server to throw away values that don't fit into memory. If there is a
> way to configure memcached to guarantee storage, that might work.
>
> -- Dante
>
>
>> On Sun, Jan 24, 2010 at 9:39 AM, D. Dante Lorenso <dante@xxxxxxxxxxx>
>> wrote:
>>>
>>> shiplu wrote:
>>>>
>>>> On Sun, Jan 24, 2010 at 3:11 AM, D. Dante Lorenso <dante@xxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> All,
>>>>>
>>>>> I'm loading millions of records into a backend PHP CLI script that I
>>>>> need to build a hash index from to optimize key lookups for data that
>>>>> I'm importing into a MySQL database. The problem is that storing this
>>>>> data in a PHP array is not very memory efficient, and my millions of
>>>>> records are consuming about 4-6 GB of RAM.
>>>>>
>>>> What are you storing? An array of row objects?
>>>> In that case, storing only the row ids will reduce the memory.
>>>
>>> I am querying a MySQL database which contains 40 million records and
>>> mapping string columns to numeric ids. You might consider it a form of
>>> normalizing the data.
>>>
>>> Then, I am importing a new 40 million records and comparing the new
>>> values to the old values. Where the value matches, I update records,
>>> but where they do not match, I insert new records, and finally I go
>>> back and delete old records. So, the net result is that I have a
>>> database with 40 million records that I need to "sync" on a daily
>>> basis.
>>>
>>>> If you are loading full row objects, it will take a lot of memory.
>>>> But if you just load the row id values, it will significantly decrease
>>>> the memory amount.
>>>
>>> For what I am trying to do, I just need to map a string value (32
>>> bytes) to a bigint value (8 bytes) in a fast-lookup hash.
>>>
>>>> Besides, you can load row ids on a chunk-by-chunk basis. If you have
>>>> 10 million rows to process, load 10,000 rows as a chunk, process
>>>> them, then load the next chunk. This will significantly reduce memory
>>>> usage.
>>>
>>> When importing the fresh 40 million records, I need to compare each
>>> record against 4 different indexes that map the record to other
>>> existing records, or to a "group_id" that the record also belongs to.
>>> My current solution uses a trigger in MySQL that does the lookups
>>> inside MySQL, but this is extremely slow. Pre-loading the MySQL
>>> indexes into PHP RAM and processing them that way is thousands of
>>> times faster.
>>>
>>> I just need an efficient way to hold my hash tables in PHP RAM. PHP
>>> arrays are very fast, but like my original post says, they consume way
>>> too much RAM.
>>>
>>>> A good algorithm can solve your problem anytime. ;-)
>>>
>>> It currently takes about 5-10 minutes to build my hash indexes in PHP
>>> RAM, which is more than made up for by the 10,000x speedup on key
>>> lookups that I get later on. I just want to avoid using the whole 6 GB
>>> of RAM to do this. I need an efficient hashing API that supports
>>> something like:
>>>
>>> $value = (int) fasthash_get((string) $key);
>>> $exists = (bool) fasthash_exists((string) $key);
>>> fasthash_set((string) $key, (int) $value);
>>>
>>> Or ... it feels like a "memcached" API, but where the data is stored
>>> locally instead of accessed via a network. So this is how my search
>>> led me to what appears to be a dead "lchash" extension.
>>>
>>> -- Dante
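
As an aside, the fasthash_* interface sketched above could be roughly
emulated in userland on top of APC's user cache. This is shown purely to
illustrate the API shape being asked for, not as a fix for the slowdown
reported above with millions of keys. It assumes APC is loaded with
apc.enable_cli=1 and a large enough apc.shm_size; the function names simply
mirror the hypothetical ones in the post.

<?php
// Illustrative only: the proposed fasthash_* API emulated with APC
// user-cache calls. TTL 0 means entries live until the cache is cleared
// or the process restarts.
function fasthash_set($key, $value)
{
    return apc_store($key, (int) $value, 0);
}

function fasthash_get($key)
{
    $value = apc_fetch($key, $success);
    return $success ? (int) $value : 0;     // 0 when the key is missing
}

function fasthash_exists($key)
{
    apc_fetch($key, $success);
    return (bool) $success;
}

// Usage, mirroring the calls in the post:
fasthash_set('some_32_byte_string_key', 987654321);
$value  = (int) fasthash_get('some_32_byte_string_key');
$exists = (bool) fasthash_exists('some_32_byte_string_key');
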
>>>
>>> ----------
>>> D. Dante Lorenso
>>> dante@xxxxxxxxxxx
>>> 972-333-4139
>>>
>>> --
>>> PHP General Mailing List (http://www.php.net/)
>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>
>>>
>>
>
>
> --
> ----------
> D. Dante Lorenso
> dante@xxxxxxxxxxx
> 972-333-4139
>

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php