hi rob...

what you have written is similar to my initial approach... my question, and
the reason for posting this to a few different groups, is to see if someone
has pointers/thoughts for something much quicker... this is going to handle
processing requests from client apps to a webservice. the backend of the
service has to process the files in the dir as fast as possible to return
the data to the web client query...

thanks

-----Original Message-----
From: Robert Cummings [mailto:robert@xxxxxxxxxxxxx]
Sent: Sunday, March 01, 2009 9:54 AM
To: bruce
Cc: php-general@xxxxxxxxxxxxx
Subject: RE: file locking...

On Sun, 2009-03-01 at 09:09 -0800, bruce wrote:
> hi rob...
>
> here's the issue in more detail..
>
> i have multiple processes that are generated/created and run
> simultaneously. each process wants to get XX number of files from the
> same batch of files... assume i have a batch of 50,000 files. my issue
> is how do i allow each of the processes to get their batch of unique
> files as fast as possible. (the 50K number is an arbitrary number.. my
> project will shrink/expand over time...)
>
> if i dump all the 50K files in the same dir, i can have a lock file
> that would allow each process to sequentially read/write the lock file,
> and then access the dir to get the XX files the process needs. (each
> process is just looking to get the next batch of files for processing.
> there's no searching based on text in the names of the files. it's a
> kind of FIFO queuing system.) this approach could work, but it's
> basically sequential, and could in theory run into race conditions
> around the lockfile.
>
> i could also have the process that creates the files throw them into
> multiple directories, splitting the 50K files across separate dirs, and
> somehow implement logic to allow the client processes to fetch files
> from the separate dirs.. but this could get ugly.
>
> so my issue is essentially how i can allow as close to simultaneous
> access as possible by client/child processes to a kind of FIFO of
> files...
>
> whatever logic i create for this will also be used for the next
> iteration of the project, where i get rid of the files and use some
> sort of database as the informational storage.
>
> hopefully this provides a little more clarity.

Would I be right in assuming that a process grabs X of the oldest
available files and then begins to work on them, and that the next
process would essentially grab the next X oldest files, and so on over
and over again? Also, is a file discarded once processed? Would I be
correct in presuming that processing the files takes longer than
grabbing them?

If so, then I would have a single lock upon which all processes wait.
Each process grabs the lock when it can and then moves the X oldest
files to a working directory where it can process them.

So... directory structure:

/ROOT
/ROOT/queue
/ROOT/work

Locks...

/ROOT/lock

So let's say you have 500 files:

/ROOT/queue/file_001.dat
/ROOT/queue/file_002.dat
/ROOT/queue/file_003.dat
...
/ROOT/queue/file_499.dat
/ROOT/queue/file_500.dat

And you have 5 processes...

/proc/1
/proc/2
/proc/3
/proc/4
/proc/5

Now to start, all processes try to grab the lock at the same time; by
virtue of lock mechanics only one process gets the lock... let's say,
for instance, 4. While 4 has the lock, all the other processes fail to
get it and go to sleep for, say, 10000 usecs.
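
In PHP, that lock step might look roughly like the sketch below. The
/ROOT/lock path and the 10000 usec sleep follow the description above;
the claimLock()/releaseLock() helper names and the use of flock() are
just one way this could be done.

<?php
// Sketch of the single shared lock: each worker blocks here until it
// owns /ROOT/lock, sleeping 10000 usecs between attempts.
function claimLock($lockPath = '/ROOT/lock')
{
    $fp = fopen($lockPath, 'c');   // create the lock file if it's missing
    if ($fp === false) {
        return false;
    }

    // Non-blocking exclusive lock: only one process holds it at a time.
    while (!flock($fp, LOCK_EX | LOCK_NB)) {
        usleep(10000);
    }

    return $fp;
}

function releaseLock($fp)
{
    flock($fp, LOCK_UN);
    fclose($fp);
}
?>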
So process 4 transfers file_001.dat through file_050.dat into /ROOT/work:

/ROOT/work/file_001.dat
/ROOT/work/file_002.dat
/ROOT/work/file_003.dat
...
/ROOT/work/file_049.dat
/ROOT/work/file_050.dat

Then it releases the lock and begins processing... meanwhile the other
processes wake up and try to grab the lock again... this time PID 2 gets
it. It does the same, taking the next 50 files:

/ROOT/work/file_051.dat
/ROOT/work/file_052.dat
/ROOT/work/file_053.dat
...
/ROOT/work/file_099.dat
/ROOT/work/file_100.dat

which leaves the queue with:

/ROOT/queue/file_101.dat
/ROOT/queue/file_102.dat
/ROOT/queue/file_103.dat
...
/ROOT/queue/file_499.dat
/ROOT/queue/file_500.dat

Now while it was doing that, PID 4 finished and all its files are now
deleted. The first thing it does is try to get the lock so it can get
more... but the lock is still owned by PID 2, so PID 4 goes to sleep.
Once PID 2 gets its files it releases the lock, off it goes, and the
cycle continues.

Now there's still an issue with respect to incoming, partially written
files. During the incoming process those should be written elsewhere...
let's say /ROOT/incoming. Once writing of a file is complete it can be
moved to /ROOT/queue. Also, if you don't want processes to delete the
files, you can have yet another directory, /ROOT/processed.

So with everything considered, here's your directory structure:

/ROOT
/ROOT/incoming
/ROOT/processed
/ROOT/queue
/ROOT/work

One last thing to consider: if there are no available files on which to
work, you might have your processes sleep a little longer.

Cheers,
Rob.

--
http://www.interjinn.com
Application and Templating Framework for PHP
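
Putting the pieces together, a single worker following this scheme might
look roughly like the sketch below. The directory layout and the single
lock come from the description above; the batch size of 50, the sleep
intervals, and the processFile() placeholder are illustrative assumptions.

<?php
// Sketch of one worker process: wait for the lock, move the oldest batch
// from /ROOT/queue into /ROOT/work, release the lock, then process the
// batch and park finished files in /ROOT/processed.

define('QUEUE_DIR',     '/ROOT/queue');
define('WORK_DIR',      '/ROOT/work');
define('PROCESSED_DIR', '/ROOT/processed');
define('LOCK_FILE',     '/ROOT/lock');
define('BATCH_SIZE',    50);

function byMtime($a, $b)
{
    return filemtime($a) - filemtime($b);
}

function processFile($path)
{
    // real work goes here; placeholder only
}

while (true) {
    $lock = fopen(LOCK_FILE, 'c');

    // Wait for the single shared lock, sleeping 10000 usecs between tries.
    while (!flock($lock, LOCK_EX | LOCK_NB)) {
        usleep(10000);
    }

    // Grab the BATCH_SIZE oldest files from the queue...
    $files = glob(QUEUE_DIR . '/*.dat');
    usort($files, 'byMtime');
    $batch = array_slice($files, 0, BATCH_SIZE);

    // ... and move them into the working directory while holding the lock.
    $mine = array();
    foreach ($batch as $file) {
        $dest = WORK_DIR . '/' . basename($file);
        if (rename($file, $dest)) {
            $mine[] = $dest;
        }
    }

    flock($lock, LOCK_UN);
    fclose($lock);

    if (empty($mine)) {
        // Queue is empty right now, so sleep a little longer before retrying.
        usleep(500000);
        continue;
    }

    // Process outside the lock, then move finished files to /ROOT/processed.
    foreach ($mine as $file) {
        processFile($file);
        rename($file, PROCESSED_DIR . '/' . basename($file));
    }
}
?>

The lock is only held for the renames, which are cheap, so the workers
spend nearly all of their time processing in parallel rather than waiting
on each other.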