rick ochoa wrote:
I work for a company that is migrating to a SAN, implementing GFS as the
filesystem. We currently rsync our data from a master server to 5
front-end webservers running Apache and PHP. The rsyncs take an
extraordinarily long time as our content (currently >2.5 million small
files) grows, and do not scale very well as we add more front-end
machines. Our thinking was to put content generated on two inward-facing
editorial machines on the SAN as read/write, and our web front-ends as
read-only. All temporary files and logging would write to local disk.
The goal of our initial work was to create this content filesystem,
mount the disks, eliminate the rsyncs, and free up our rsync server for
use as a slave database server.
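(For context, that push presumably looks something like the following; the
paths and hostnames here are placeholders, not taken from the post:

    # one rsync pass per front-end, repeated for web2..web5
    rsync -az --delete /var/www/content/ web1:/var/www/content/
)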
You may have options that don't require a SAN. If you're happy to continue
with DAS (i.e. the cost of a SAN doesn't exceed the cost of having
separate disks in each machine, given the number of machines you foresee
using in the near future), you may do well with DRBD instead of a SAN.
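For illustration, a two-node DRBD resource definition looks roughly like
this (hostnames, devices and addresses below are placeholders):

    resource r0 {
        on editorial1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;        # local backing device (placeholder)
            address   10.0.0.1:7789;
            meta-disk internal;
        }
        on editorial2 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7789;
            meta-disk internal;
        }
    }

DRBD itself only replicates the block device between the nodes; what you
run on top of it is a separate decision.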
We used Luci to configure a node and fencing on a new front-end, and
formatted and configured our disk with it. Our deploy plan was to set
this machine up, put it behind the load-balancer, and have it operate
under normal load for a few days to "burn it in." Once complete, we
would begin to migrate the other four front-ends over to the SAN,
mounted RO after a reinstall of the OS.
This procedure worked without too much issue until we hit the fourth
machine in the cluster, where the CPU load went terrifyingly high and we
got many "D" state httpd processes. Googling "uninterruptible sleep GFS
php", I found references from 2006 about file locking with PHP and its
use of flock() at the start of a session. The disks were remounted as
"spectator" in an attempt to limit disk I/O on journals. This seemed to
help, but as it was the end of the day seems a false positive. The next
day, CPU load was again incredibly high, and after much flailing about
we went back to local ext3 disks to buy us some time.
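(For reference, a spectator mount is just a GFS mount option, along these
lines; the device and mount point are placeholders:

    # spectator: read-only mount that does not acquire its own journal
    mount -t gfs -o spectator /dev/cluster_vg/content_lv /var/www/content
)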
If you have lots of I/O on lots of files in few directories, you may be
out of luck. A lot of the overhead of GFS (or any similar FS) is
unavoidable - the locking between the nodes has to be synchronised
for every file open.
Mounting with noatime,nodiratime,noquota may help a bit, but with
frequent access to lots of small files you will never get anywhere near
local disk performance.
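For example, as an fstab entry (device and mount point are placeholders):

    # GFS content filesystem without atime updates or quota accounting
    /dev/cluster_vg/content_lv  /var/www/content  gfs  noatime,nodiratime,noquota  0 0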
There are, however, other options. If DAS is an option for you (and it
sounds like it is), look into GlusterFS. Its performance isn't great
per se (it may well be worse than GFS) if you use it in the intended
way, but you can use it as a file replication system. You can point your
web directory directly at the file store (if you do this, you must be
100% sure that NOTHING you do to those files will involve any kind of
writing, or things can get unpredictable and files can get corrupted).
This means you'll get local disk performance with the advantage of not
having to rsync the data. As long as all nodes are connected, the file
changes on the master server will get sent out to the replicas.

If you need to reboot a node, you'll need to ensure that it's consistent
again, which is done by forcing a resync: fire off a find that reads the
first byte of every file on the mount point. This will force the node to
check that its files are up to date against the other nodes. Note that
this will cause increased load on all the other nodes while it
completes, so use with care.
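The resync kick mentioned above is usually just something like this (the
mount point is a placeholder):

    # read the first byte of every file to trigger a heal on this node
    find /mnt/webcontent -type f -exec head -c 1 {} \; > /dev/null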
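For completeness, a rough sketch of the replicated setup described above,
assuming a GlusterFS version with the gluster CLI (volume name, hostnames
and brick paths are all placeholders):

    # one brick per machine that should hold a local copy of the content;
    # extend the brick list (and replica count) to cover all such nodes
    gluster volume create webcontent replica 3 \
        editorial1:/data/brick web1:/data/brick web2:/data/brick
    gluster volume start webcontent

    # on each node: mount the volume (needed for the find-based resync)
    mount -t glusterfs localhost:/webcontent /mnt/webcontent

    # Apache's DocumentRoot can then point at the local brick path
    # (/data/brick), strictly read-only, as discussed above.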
Gordan