rick ochoa wrote:
I work for a company that is migrating to a SAN, implementing GFS as the
filesystem. We currently rsync our data from a master server to 5
front-end webservers running Apache and PHP. The rsyncs take an
extraordinarily long time as our content (currently >2.5 million small
files) grows, and do not scale very well as we add more front-end
machines. Our thinking was to put content generated on two inward-facing
editorial machines on the SAN as read/write, and our web front-ends as
read-only. All temporary files and logging would write to local disk.
The goal of our initial work was to create this content filesystem,
mount the disks, eliminate the rsyncs, and free up our rsync server for
use as a slave database server.
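(For context, that push presumably looks something like the following; the
paths and hostnames here are placeholders, not taken from the post:

    # one rsync pass per front-end, repeated for web2..web5
    rsync -az --delete /var/www/content/ web1:/var/www/content/
)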
You may have options that don't require a SAN. If you're happy to continue
with DAS (i.e. the cost of a SAN doesn't exceed the cost of having
separate disks in each machine, given the number of machines you foresee
using in the near future), you may do well with DRBD instead of a SAN.
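For illustration, a two-node DRBD resource definition looks roughly like
this (hostnames, devices and addresses below are placeholders):

    resource r0 {
        on editorial1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;        # local backing device (placeholder)
            address   10.0.0.1:7789;
            meta-disk internal;
        }
        on editorial2 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7789;
            meta-disk internal;
        }
    }

DRBD itself only replicates the block device between the nodes; what you
run on top of it is a separate decision.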
We used Luci to configure a node and fencing on a new front-end, and
formatted and configured our disk with it. Our deploy plan was to set
this machine up, put it behind the load-balancer, and have it operate
under normal load for a few days to "burn it in." Once complete, we
would begin to migrate the other four front-ends over to the SAN,
mounted RO after a reinstall of the OS.
This procedure worked without too much issue until we hit the fourth
machine in the cluster, where the CPU load went terrifyingly high and we
got many "D" state httpd processes. Googling "uninterruptible sleep GFS
php", I found references from 2006 about file locking with PHP and its
use of flock() at the start of a session. The disks were remounted as
"spectator" in an attempt to limit disk I/O on journals. This seemed to
help, but as it was the end of the day seems a false positive. The next
day, CPU load was again incredibly high, and after much flailing about
we went back to local ext3 disks to buy us some time.
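(For reference, a spectator mount is just a GFS mount option, along these
lines; the device and mount point are placeholders:

    # spectator: read-only mount that does not acquire its own journal
    mount -t gfs -o spectator /dev/cluster_vg/content_lv /var/www/content
)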
If you have lots of I/O on lots of files in few directories, you may be
out of luck. A lot of the overhead of GFS (or any similar FS) is
unavoidable - the locking between the nodes has to be synchronised
for every file open.
Mounting with noatime,nodiratime,noquota may help a bit, but with
frequent access to lots of small files you will never get anywhere near
local disk performance.
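For example, as an fstab entry (device and mount point are placeholders):

    # GFS content filesystem without atime updates or quota accounting
    /dev/cluster_vg/content_lv  /var/www/content  gfs  noatime,nodiratime,noquota  0 0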
There are, however, other options. If DAS is an option for you (and it
sounds like it is), look into GlusterFS. Its performance isn't great
per se (it may well be worse than GFS) if you use it in the intended
way, but you can use it as a file replication system. You can point your
web directory directly at the file store (if you do this, you must be
100% sure that NOTHING you do to those files will involve any kind of
writing, or things can get unpredictable and files can get corrupted).
This means you'll get local disk performance with the advantage of not
having to rsync the data. As long as all nodes are connected, the file
changes on the master server will get sent out to the replicas.

If you need to reboot a node, you'll need to ensure that it's consistent
again, which is done by forcing a resync: fire off a find that reads the
first byte of every file on the mount point. This will force the node to
check that its files are up to date against the other nodes. Note that
this will cause increased load on all the other nodes while it
completes, so use with care.
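The resync kick mentioned above is usually just something like this (the
mount point is a placeholder):

    # read the first byte of every file to trigger a heal on this node
    find /mnt/webcontent -type f -exec head -c 1 {} \; > /dev/null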
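For completeness, a rough sketch of the replicated setup described above,
assuming a GlusterFS version with the gluster CLI (volume name, hostnames
and brick paths are all placeholders):

    # one brick per machine that should hold a local copy of the content;
    # extend the brick list (and replica count) to cover all such nodes
    gluster volume create webcontent replica 3 \
        editorial1:/data/brick web1:/data/brick web2:/data/brick
    gluster volume start webcontent

    # on each node: mount the volume (needed for the find-based resync)
    mount -t glusterfs localhost:/webcontent /mnt/webcontent

    # Apache's DocumentRoot can then point at the local brick path
    # (/data/brick), strictly read-only, as discussed above.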
Gordan