Re: GFS, Locking, Read-Only, and high processor loads

rick ochoa wrote:

I work for a company that is migrating to a SAN, implementing GFS as the filesystem. We currently rsync our data from a master server to 5 front-end webservers running Apache and PHP. The rsyncs take an extraordinarily long time as our content (currently >2.5 million small files) grows, and do not scale very well as we add more front-end machines. Our thinking was to put content generated on two inward-facing editorial machines on the SAN as read/write, and mount it on our web front-ends as read-only. All temporary files and logging would write to local disk. The goal of our initial work was to create this content filesystem, mount the disks, eliminate the rsyncs, and free up our rsync server for use as a slave database server.
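
I'm guessing the push is something along these lines, run once per front-end (paths and hostnames invented for illustration):

    rsync -az --delete /var/www/content/ web1:/var/www/content/

With >2.5 million small files, most of the wall-clock time presumably goes into building and comparing the file lists rather than moving data, which is why it gets worse with every front-end you add.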

You may have options that don't require a SAN. If you're happy to continue with DAS (i.e. a SAN doesn't work out cheaper than putting separate disks in each machine, given the number of machines you foresee using in the near future), you may do well with DRBD instead of a SAN.
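
To give a rough idea of what that involves, a DRBD resource is defined by a block like the following in drbd.conf - hostnames, devices and addresses here are invented, and a resource only mirrors between the two hosts named in it, so treat this as a sketch of the syntax rather than a drop-in config for a 5-node setup:

    resource webdata {
      protocol C;                        # fully synchronous replication
      on master.example.com {
        device    /dev/drbd0;            # replicated block device you format and mount
        disk      /dev/sdb1;             # local backing partition
        address   192.168.0.10:7788;
        meta-disk internal;
      }
      on web1.example.com {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.11:7788;
        meta-disk internal;
      }
    }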

We used Luci to configure a node and fencing on a new front-end, and formatted and configured our disk with it. Our deploy plan was to set this machine up, put it behind the load balancer, and have it operate under normal load for a few days to "burn it in." Once complete, we would begin to migrate the other four front-ends over to the SAN, mounted RO after a reinstall of the OS.
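
(For reference, the formatting step behind that boils down to something like the following - cluster name, filesystem label, journal count, device and mount point are invented here; -j is the number of journals, roughly one per node that will mount the filesystem read-write:

    gfs_mkfs -p lock_dlm -t webcluster:webdata -j 8 /dev/sdb1
    mount -t gfs /dev/sdb1 /var/www/content

or the mkfs.gfs2 equivalent if this is GFS2.)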

This procedure worked without too much issue until we hit the fourth machine in the cluster, where the CPU load went terrifyingly high and we got many "D" state httpd processes. Googling "uninterruptible sleep GFS php" turned up references from 2006 about file locking with PHP and its use of flock() at the start of a session. The disks were remounted as "spectator" in an attempt to limit disk I/O on the journals. This seemed to help, but as it was the end of the day that was probably a false positive. The next day, CPU load was again incredibly high, and after much flailing about we went back to local ext3 disks to buy ourselves some time.
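
The spectator experiment mentioned above amounts to an extra mount option (device and mount point invented here); a spectator mount is read-only and takes no journal:

    umount /var/www/content
    mount -t gfs -o spectator /dev/sdb1 /var/www/content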

If you have lots of I/O on lots of files in few directories, you may be out of luck. A lot of the overhead of GFS (or any similar FS) is unavoidable - the locking between the nodes has to be synchronised for every file open.

Mounting with noatime,nodiratime,noquota may help a bit, but with frequent access to lots of small files you will never see performance anywhere near that of local disk.
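
If you want those options to stick across reboots, the corresponding /etc/fstab line would look something like this (device and mount point invented):

    /dev/sdb1  /var/www/content  gfs  noatime,nodiratime,noquota  0 0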

There are, however, other options. If DAS is an option for you (and it sounds like it is), look into GlusterFS. Its performance isn't great per se (it may well be worse than GFS) if you use it the intended way, but you can instead use it as a file replication system and point your web directory directly at the backing file store. If you do this, you must be 100% sure that NOTHING you do to those files involves any kind of writing, or things can get unpredictable and files can get corrupted. Done that way, you get local disk performance with the advantage of not having to rsync the data: as long as all nodes are connected, file changes on the master server get sent out to the replicas. If you need to reboot a node, you'll need to make sure it's consistent again, which is done by forcing a resync - fire off a find that reads the first byte of every file on the mount point, which forces the node to check that its files are up to date against the other nodes. Note that this will cause increased load on all the other nodes while it completes, so use it with care.
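
The usual incantation for that forced resync is along these lines (mount point invented):

    find /var/www/content -type f -exec head -c1 {} \; > /dev/null

Reading the first byte of each file through the glusterfs mount is enough to make the replication translator compare the local copy against the other nodes and self-heal anything that is out of date.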

Gordan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
