I'm setting up a GFS implementation and was wondering what kind of
tuning parameters I can set for both read-only and read-write.
I work for a company that is migrating to a SAN, implementing GFS as
the filesystem. We currently rsync our data from a master server to 5
front-end webservers running Apache and PHP. The rsyncs take an
extraordinarily long time as our content (currently >2.5 million small
files) grows, and do not scale well as we add more front-end
machines. Our thinking was to put content generated on two
inward-facing editorial machines on the SAN, mounted read/write there
and read-only on our web front-ends. All temporary files and logging
would write to
local disk. The goal of our initial work was to create this content
filesystem, mount the disks, eliminate the rsyncs, and free up our
rsync server for use as a slave database server.
We used Luci to configure a node and fencing on a new front-end,
and formatted and configured our disk with it. Our deploy plan was to
set this machine up, put it behind the load-balancer, and have it
operate under normal load for a few days to "burn it in." Once
complete, we would begin to migrate the other four front-ends over to
the SAN, mounted RO after a reinstall of the OS.
This procedure worked without too much issue until we hit the fourth
machine in the cluster, where the CPU load went terrifyingly high and
we got many "D" state httpd processes. Googling "uninterruptible sleep
GFS php" I found references from 2006 about file locking with PHP and
its use of flock() at the start of a session. The disks were remounted
as "spectator" in an attempt to limit disk I/O on journals. This
seemed to help, but as it was the end of the day, that now looks like
a false positive. The next day, CPU load was again incredibly high, and after
much flailing about we went back to local ext3 disks to buy us some
time.
I'm reading through this list, which is very informative. I'm
attempting to tune our GFS mounts a bit, watching the output of
gfs_tool counters on the filesystems, and looking for any anomalies.
Here's a more detailed description of our setup:
Our hardware configuration consists of a NexSAN SATABoy populated with
8 750GB disks (RAID 5, 4.7TB), and a Brocade Silkworm 3800 for data and
fencing. We purchased QLogic single-port, 4Gb HBAs for our servers.
(more info available on request)
The RAID has 4 partitions, 2 of which are not mounted:

local - (not mounted) 500GB, extents 4.0MB, block size 4KB,
        attributes -wi-ao, dlm lock protocol
        mount /usr/local_san (rw)
        this is a copy of /usr/local, which can be synced to all hosts

code  - (not mounted) 500GB, extents 4.0MB, block size 4KB,
        attributes -wi-ao, dlm lock protocol
        mount /web/code (rw)
        this is a copy of /huffpo/web/prod, without the www content
        and tmp

tmp   - 500GB, extents 4.0MB, block size 4KB,
        attributes -wi-a-, dlm lock protocol
        mount /web/prod/tmp (rw)
        this is the temporary directory for front-end web code

www   - 2TB, extents 4.0MB, block size 4KB,
        attributes -wi-ao, dlm lock protocol
        mount /web/prod/www (ro)
        read-only content directory on 4 hosts (/etc/fstab options at
        the time were ro), read/write on 1 host

We have ~2 more TB available, currently not in use.
After reading the list a bit, I've come up with the following tunings
for read-only:
gfs_tool settune /web/prod/www/content glock_purge 80
gfs_tool settune /web/prod/www/content quota_account 0
gfs_tool settune /web/prod/www/content demote_secs 60
gfs_tool settune /web/prod/www/content scand_secs 30
/etc/fstab has spectator,noatime,num_glockd=32 as mount options
And the read/write host has:
gfs_tool settune /web/prod/www/content statfs_fast 1
/etc/fstab has num_glockd=32,noatime as mount options
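
As far as I understand, values set with gfs_tool settune do not survive
a remount, so they have to be reapplied after every mount. A minimal
sketch of a helper that does this is below. The tunable names and values
are the ones listed above; the role labels "ro" and "rw", the function
names, and the dry-run default are assumptions made for this example.

```python
import shlex
import subprocess

# Values copied from the post. The role names "ro" and "rw" are labels
# invented for this sketch, not anything GFS itself knows about.
TUNINGS = {
    "ro": {
        "glock_purge": 80,
        "quota_account": 0,
        "demote_secs": 60,
        "scand_secs": 30,
    },
    "rw": {
        "statfs_fast": 1,
    },
}


def build_commands(role, mountpoint):
    """Return one `gfs_tool settune` argv list per tunable for the role."""
    return [
        ["gfs_tool", "settune", mountpoint, name, str(value)]
        for name, value in TUNINGS[role].items()
    ]


def apply_tunings(role, mountpoint, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands(role, mountpoint):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only prints the four read-only commands.
    apply_tunings("ro", "/web/prod/www/content")
```

Run with dry_run=False from an init script or a post-mount hook once the
printed commands look right.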
I've noticed that gfs_tool counters /web/prod/www/content usually
shows fewer than 80k locks on the read/write host running rsync, and
fewer than 10k locks on the one (and only) read-only host, where
previously the number of locks on all hosts numbered ~80k.
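
To watch the lock count over time rather than eyeballing it, something
like the following could parse the counters output. This is only a
sketch: the sample text is hypothetical and merely shaped like
"label value" lines, and the function names and threshold are invented
for the example, so check it against the real output on your nodes.

```python
import re

# Hypothetical sample shaped like `gfs_tool counters` output. The real
# labels and values on a given system may differ.
SAMPLE = """\
                                  locks 9512
                             locks held 4410
                          incore inodes 4102
"""


def parse_counters(text):
    """Map each 'label value' line to an int; skip lines that do not fit."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s+(-?\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def over_threshold(counters, limit, key="locks"):
    """True when the named counter exceeds the limit (missing key counts as 0)."""
    return counters.get(key, 0) > limit


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    # 10000 mirrors the "sub 10k" figure seen on the read-only host.
    print(counters["locks"], over_threshold(counters, 10000))
```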
Can I be a bit more aggressive with locks on read-only filesystems
with the current tunings enabled? I'm not sure what purpose the
locks on read-only filesystems serve in this instance.
Is there a better configuration for heavy reads on a GFS filesystem
that is read only? vmstat -d gives me for this filesystem:
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
sdc   411192  82490 3998862 7402555    607    645   10016    3837      0    695
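
For reference, the sdc figures above can be turned into per-read
averages. This sketch assumes the usual procps vmstat -d column order
(total, merged, sectors, ms for reads and for writes, then current and
seconds of I/O) and 512-byte sectors; the field names are made up for
the example.

```python
# Column names follow the usual procps `vmstat -d` layout. This is an
# assumption, since the header in the post is cut off.
FIELDS = (
    "reads_total", "reads_merged", "sectors_read", "read_ms",
    "writes_total", "writes_merged", "sectors_written", "write_ms",
    "io_cur", "io_sec",
)

# The sdc line from the post, joined onto one line.
LINE = "sdc 411192 82490 3998862 7402555 607 645 10016 3837 0 695"


def parse_vmstat_disk(line):
    """Split one `vmstat -d` data line into (device, {field: int})."""
    name, *values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected %d numeric columns" % len(FIELDS))
    return name, dict(zip(FIELDS, map(int, values)))


def read_profile(stats, sector_bytes=512):
    """Average latency and size per completed read."""
    reads = stats["reads_total"]
    return {
        "avg_read_ms": stats["read_ms"] / reads,
        "avg_read_kib": stats["sectors_read"] * sector_bytes / 1024 / reads,
    }


if __name__ == "__main__":
    device, stats = parse_vmstat_disk(LINE)
    print(device, read_profile(stats))
```

Under those assumptions the numbers work out to roughly 18 ms and just
under 5 KiB per read, which fits a workload dominated by small files.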
My big fear is that although the systems currently seem to be running
without too much incident, as I add nodes back into the cluster the
number of locks and system load will again run high. As we transition
from using rsync to writing directly onto the SAN, the number of locks
on rw hosts should go down because the spendy directory scans should
go away.
Are there certain other optimizations I could use to lower the lock
Linux-cluster mailing list