christopher barry wrote:
On Tue, 2008-04-08 at 09:37 -0500, Wendy Cheng wrote:
gordan@xxxxxxxxxx wrote:
my setup:
6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's
not new stuff, but corporate standards dictated the rev of rhat.
[...]
I'm noticing huge differences in compile times - or any home file access
really - when doing stuff in the same home directory on the gfs on
different nodes. For instance, the same compile on one node is ~12
minutes - on another it's 18 minutes or more (not running concurrently).
I'm also seeing weird random pauses in writes: saving a file in vi, which
would normally take less than a second, may take up to 10 seconds.
Anyway, thought I would re-connect to you all and let you know how this
worked out. We ended up scrapping gfs. Not because it's not a great fs,
but because I was using it in a way that was playing to its weak
points. I had a lot of time and energy invested in it, and it was hard
to let it go. Turns out that connecting to the NetApp filer via nfs is
faster for this workload. I couldn't believe it either, as my bonnie and
dd type tests showed gfs to be faster. But for the use case of large
sets of very small files, and lots of stats going on, gfs simply cannot
compete with NetApp's nfs implementation. GFS is an excellent fs, and it
has its place in the landscape - but for a development build system,
the NetApp is simply phenomenal.
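For anyone curious, a rough way to test the access pattern that actually
matters here - lots of small files plus heavy stat() traffic, rather than
the streaming IO that bonnie and dd measure - is something like the sketch
below (the mount point and file counts are made up for illustration):

import os
import time

def small_file_test(base="/mnt/test/smallfiles", count=5000, size=512):
    """Create and stat many small files, mimicking a build tree."""
    os.makedirs(base, exist_ok=True)
    payload = b"x" * size
    t0 = time.time()
    for i in range(count):
        # Small writes: each file is one create plus one tiny write.
        with open(os.path.join(base, "f%d" % i), "wb") as f:
            f.write(payload)
    t1 = time.time()
    for i in range(count):
        # Stat pass: this is what make/configure hammer on.
        os.stat(os.path.join(base, "f%d" % i))
    t2 = time.time()
    print("create: %.2fs  stat: %.2fs" % (t1 - t0, t2 - t1))

if __name__ == "__main__":
    small_file_test()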
Assuming you ran both configurations (nfs-wafl vs. gfs-san) on the very
same NetApp box (?) ...
Both configurations have their pros and cons. The wafl-nfs runs in native
mode, which certainly has its advantages - you've made a good choice - but
the latter (gfs-on-netapp SAN) can work well in other situations. The
biggest problem with your original configuration is the load balancer.
Round-robin (and its variants) scheduling will not work well if you have a
write-intensive workload that has to fight for locks between multiple GFS
nodes. IIRC, there are GFS customers running build-compile development
environments. They normally assign groups of users to different GFS nodes,
say user IDs starting with a-e on node 1, f-j on node 2, etc., along the
lines of the sketch below.
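A minimal sketch of that static assignment (node names and the exact
letter split are just examples, not anyone's production setup):

NODES = ["gfs-node1", "gfs-node2", "gfs-node3"]  # hypothetical hostnames

def node_for_user(username):
    """Pin each user to one GFS node so their locks stay on one machine."""
    first = username[0].lower()
    if "a" <= first <= "e":
        return NODES[0]
    if "f" <= first <= "j":
        return NODES[1]
    return NODES[2]

for user in ("alice", "gordan", "wendy"):
    print(user, "->", node_for_user(user))

In practice this could be an automounter map or an LVS persistence rule
rather than code; the point is that a given home directory is always
served by the same node, so its locks never bounce between nodes.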
One piece of encouraging news from this email is that gfs-netapp-san runs
well under bonnie. GFS1 has been struggling with bonnie-like workloads
(large numbers of smaller files within one single node) for a very long
time. One of the reasons is that its block allocation tends to get spread
across the disk whenever there is resource group contention. It is very
difficult for the Linux IO scheduler to merge these blocks within one
single server. When the workload becomes IO-bound, the locks are
subsequently stalled and everything starts to snowball after that. NetApp
SAN has one more layer of block allocation indirection within its
firmware, and its write speed is "phenomenal" (I'm borrowing your words
;) ), mostly due to the NVRAM where it can aggressively cache write data -
this helps GFS relieve its small file issue quite well.
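A toy model of the merge problem (purely illustrative, not GFS code):
adjacent block numbers can be coalesced into one IO request, while blocks
scattered across resource groups cannot:

import random

def request_count(blocks):
    """Count IO requests after merging runs of adjacent block numbers."""
    blocks = sorted(blocks)
    requests = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:  # a gap means the run cannot be merged
            requests += 1
    return requests

contiguous = list(range(128))                    # one resource group
scattered = random.sample(range(128 * 64), 128)  # spread across the disk

print("contiguous ->", request_count(contiguous), "request(s)")
print("scattered  ->", request_count(scattered), "requests")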
-- Wendy
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster