RE: file locking...

Robert Cummings <robert@xxxxxxxxxxxxx> · Sun, 01 Mar 2009 12:54:01 -0500

On Sun, 2009-03-01 at 09:09 -0800, bruce wrote:
> hi rob...
> 
> here's the issue in more detail..
> 
> i have multiple processes that are generated/created and run in a
> simultaneous manner. each process wants to get XX number of files from the
> same batch of files... assume i have a batch of 50,000 files. my issue is
> how do i allow each of the processes to get their batch of unique files as
> fast as possible. (the 50K number is an arbitrary number.. my project will
> shrink/expand over time...)
> 
> if i dump all the 50K files in the same dir, i can have a lock file that
> would allow each process to sequentially read/write the lock file, and then
> access the dir to get the XX files the process is needing. (each process is
> just looking to get the next batch of files for processing. there's no
> searching based on text in the name of the files. it's a kind of FIFO queuing
> system) this approach could work, but it's basically sequential, and could
> in theory get into race conditions regarding the lockfile.
> 
> i could also have the process that creates the files, throw the files in
> some kind of multiple-directory scheme, where i split the 50K files into
> separate dirs and somehow implement logic to allow the client process to
> fetch the files from the unique/separate dirs.. but this could get ugly.
> 
> so my issue is essentially how can i allow as close to simultaneous access
> by client/child processes to a kind of FIFO of files...
> 
> whatever logic i create for this process, will also be used for the next
> iteration of the project, where i get rid of the files.. and i use some sort
> of database as the informational storage.
> 
> hopefully this provides a little more clarity.

Would I be right in assuming that a process grabs X of the oldest
available files and then begins to work on them? Then the next process
would grab the next X oldest files, and so on, over and over again?
Also, is each file discarded once processed? And would I be correct in
presuming that processing the files takes longer than grabbing them? If
so, then I would have a single lock upon which all processes wait. Each
process grabs the lock when it can, moves the X oldest files to a
working directory, releases the lock, and then processes them.
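The single-lock scheme above can be sketched as follows. This is a minimal Python sketch, not code from the thread, and it assumes a POSIX system (fcntl.flock), a queue directory, a work directory and a lock file under one root (named queue/, work/ and lock here), all on the same local filesystem. "Oldest" is assumed to mean oldest modification time, and claim_batch is a hypothetical name. PHP offers the same building blocks as flock() and rename().

```python
import fcntl
import os


def claim_batch(root, count):
    """Move up to `count` of the oldest files from root/queue to root/work.

    The exclusive flock() on root/lock is held only while the files are
    listed and moved; the slower processing happens after it is released.
    """
    queue = os.path.join(root, "queue")
    work = os.path.join(root, "work")
    # Every process opens its own descriptor on the same lock file.
    with open(os.path.join(root, "lock"), "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until no one else holds it
        try:
            # Oldest first by modification time; the name breaks ties.
            names = sorted(
                os.listdir(queue),
                key=lambda n: (os.path.getmtime(os.path.join(queue, n)), n),
            )[:count]
            claimed = []
            for name in names:
                dst = os.path.join(work, name)
                # rename() is atomic when queue/ and work/ share a filesystem.
                os.rename(os.path.join(queue, name), dst)
                claimed.append(dst)
            return claimed
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Holding the lock only for the listing and the renames keeps the serialized part of each cycle short, which is the point of moving files rather than processing them under the lock.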

So... directory structure:

    /ROOT
    /ROOT/queue
    /ROOT/work

Locks...

    /ROOT/lock

So let's say you have 500 files:

    /ROOT/queue/file_001.dat
    /ROOT/queue/file_002.dat
    /ROOT/queue/file_003.dat
    ...
    /ROOT/queue/file_499.dat
    /ROOT/queue/file_500.dat

And you have 5 processes... 

    /proc/1
    /proc/2
    /proc/3
    /proc/4
    /proc/5

Now, to start, all processes try to grab the lock at the same time. By
virtue of the lock mechanics only one process gets it... let's say
process 4. While 4 holds the lock, all the other processes, upon failing
to get it, go to sleep for, say, 10000 usecs (10 ms).
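Polling for the lock and sleeping between attempts, as described above, might look like this hedged Python sketch. It assumes POSIX flock(); acquire_lock and retry_sleep are hypothetical names, and the 0.01 s default mirrors the 10000 usecs above. In PHP the corresponding calls would be flock($fp, LOCK_EX | LOCK_NB) and usleep(10000).

```python
import fcntl
import time


def acquire_lock(lock_path, retry_sleep=0.01):
    """Poll for an exclusive lock on lock_path, sleeping between attempts.

    Returns the open file object that holds the lock. Closing it, or calling
    fcntl.flock(f, fcntl.LOCK_UN), releases the lock.
    """
    f = open(lock_path, "a")
    try:
        while True:
            try:
                # LOCK_NB makes flock() fail immediately instead of blocking.
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return f
            except BlockingIOError:
                # Someone else holds the lock: sleep 10000 usecs (0.01 s)
                # and try again.
                time.sleep(retry_sleep)
    except BaseException:
        f.close()
        raise
```

Because the kernel grants a flock() atomically, this avoids the check-then-create race that a plain "does the lock file exist?" test would have, at least on a local filesystem.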

So process 4 transfers file_001.dat through to file_050.dat
into /ROOT/work.

    /ROOT/work/file_001.dat
    /ROOT/work/file_002.dat
    /ROOT/work/file_003.dat
    ...
    /ROOT/work/file_049.dat
    /ROOT/work/file_050.dat

Then it releases the lock and begins processing.... meanwhile the other
processes wake up and try to grab the lock again... this time PID 2 gets
it. It does the same...

    /ROOT/work/file_051.dat
    /ROOT/work/file_052.dat
    /ROOT/work/file_053.dat
    ...
    /ROOT/work/file_099.dat
    /ROOT/work/file_100.dat

    /ROOT/queue/file_101.dat
    /ROOT/queue/file_102.dat
    /ROOT/queue/file_103.dat
    ...
    /ROOT/queue/file_499.dat
    /ROOT/queue/file_500.dat

Meanwhile PID 4 has finished, and all its files have been deleted. The
first thing it does is try to get the lock so it can fetch more... but
the lock is still held by PID 2, so PID 4 goes to sleep. Once PID 2 has
its files it releases the lock, off it goes, and the cycle continues.

There's still the issue of partially written incoming files. While
they are being written they should live elsewhere... let's say
/ROOT/incoming. Once a file is completely written it can be moved to
/ROOT/queue. Also, if you don't want processes to delete the files, you
can have yet another directory, /ROOT/processed. So with everything
considered, here's your directory structure:

    /ROOT
    /ROOT/incoming
    /ROOT/processed
    /ROOT/queue
    /ROOT/work
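The incoming-to-queue hand-off can be sketched like this. It is a minimal Python version that assumes incoming/ and queue/ share a filesystem so the final rename is atomic. make_layout and publish are hypothetical helper names. In PHP, writing the file and then calling rename() should play the same role as long as both directories are on one filesystem.

```python
import os

SUBDIRS = ("incoming", "processed", "queue", "work")


def make_layout(root):
    """Create the four working directories under root if they are missing."""
    for sub in SUBDIRS:
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def publish(root, name, data):
    """Write `data` as root/incoming/name, then move it into root/queue.

    The file appears in queue/ only after it has been fully written and
    flushed, so a worker listing queue/ never sees a half-written file.
    """
    staged = os.path.join(root, "incoming", name)
    with open(staged, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing
    final = os.path.join(root, "queue", name)
    # os.replace() is an atomic rename when both paths share a filesystem.
    os.replace(staged, final)
    return final
```

The producer never needs the lock: the rename is the moment a file becomes visible to workers.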

One last thing to consider: if there are no files available to work
on, you might have your processes sleep a little longer before trying
again.
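Putting the cycle together, a worker loop might look like the following hypothetical Python sketch. claim is assumed to be a function that moves up to n files into the work directory under the lock and returns their paths, and handle stands for whatever processing the project does; both names, and run_worker itself, are illustrative. The idle sleep is deliberately much longer than the lock-retry delay, and finished files are either deleted or moved to a processed directory.

```python
import os
import time


def run_worker(claim, handle, processed_dir=None, batch_size=50,
               idle_sleep=1.0, should_stop=lambda: False):
    """Repeatedly claim a batch, process it, then delete or archive it.

    `claim(n)` is assumed to move up to n files into the work directory
    (under the lock) and return their paths; `handle(path)` does the real
    processing. Both are hypothetical callables supplied by the caller.
    """
    while not should_stop():
        batch = claim(batch_size)
        if not batch:
            # Nothing queued: back off for much longer than the short
            # lock-retry delay before looking again.
            time.sleep(idle_sleep)
            continue
        for path in batch:
            handle(path)
            if processed_dir is None:
                os.remove(path)  # discard the file once it is processed
            else:
                # Keep finished work in processed/ instead of deleting it.
                os.replace(path, os.path.join(processed_dir,
                                              os.path.basename(path)))
```

The should_stop hook exists only so the loop can be ended cleanly, for example from a test or on a shutdown signal.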

Cheers,
Rob.
-- 
http://www.interjinn.com
Application and Templating Framework for PHP

-- 
PHP General Mailing List (http://www.php.net/)