I read and am still digesting the kernel tuning parameters mentioned in
John's link. There's another useful link that expands on some of the same
points here:

The Linux Page Cache and pdflush: Theory of Operation and Tuning for
Write-Heavy Loads
<http://www.westnet.com/~gsmith/content/linux-pdflush.htm>

However, while I digest them, I have a few more observations.

It's not that the server is slow, it's the gluster native client that is,
so I'm not sure that increasing the performance of the server will help
much at this point.

To verify this, I wrote a tiny script (burp.pl) that just emits lots of
short strings to stdout, like the problem app that originated this
discussion (and a colleague did the same with a C++ app).

If I send stdout to my gluster fs via the native gluster client, I observe
a steady stream of data at about 14MB/s (this is on a DDR/IPoIB cluster):

$ time `./burp.pl 100 > /gl/hmangala/burp.out && sync`
real    0m29.646s
user    0m17.830s
sys     0m2.000s

In this case, burp.pl is only getting about 70% of a CPU and the gluster
process is getting ~40%.

Here's the ifstat output for the IB channel (~1 entry/s). Note the
continuous data-out rate of about 14MB/s (and the odd input rate of about
1MB/s):

        ib1
 KB/s in   KB/s out
    0.00       0.00
    0.00       0.00   < burp starts
  383.34    5200.51
 1039.43   14243.11
 1031.59   14132.11
 1037.36   14223.32
 1044.20   14304.81
 1040.40   14288.45
 1037.78   14217.64
 1042.19   14306.66
 1036.54   14200.05
 1062.26   14699.87
 1072.64   14711.29
 1072.87   14694.52
 1065.18   14608.67
 1074.23   14711.32
 1073.26   14711.43
 1069.79   14672.60
 1066.66   14608.58
 1067.68   14647.14
 1074.16   14711.48
 1069.16   14651.39
 1077.19   14767.32
 1075.74   14736.75
 1068.77   14634.86
 1066.81   14625.90
 1063.89   14586.81
 1064.79   14608.46
 1065.37   14583.04
 1065.10   14604.44
 1063.86   14591.14
  388.41    5323.84   < burp ends
    0.00       0.00
-------------------------------
30460.65  417607.51   totals (30MB input vs 417MB output)

Now the same run over the NFS-mounted channel (mount command:
mount -o mountproto=tcp,vers=3,noatime,auto -t nfs pbs1ib:/gli /mnt/glnfs):

$ time `./burp.pl 100 > /mnt/glnfs/hmangala/burp.out && sync`
real    0m24.704s   < a little faster
user    0m20.710s
sys     0m0.810s

In this case burp.pl gets 100% of a CPU; gluster isn't involved and so
doesn't register.

Here's the ifstat output for the IB channel. Note the complete lack of
input and no data output until the very end, when it bursts at ~140MB/s:

        ib1
 KB/s in   KB/s out
    0.00       0.00
    0.73       0.00   < burp starts
    0.18       0.00
    0.00       0.00
    1.33       1.88
    0.00       0.00
    0.00       0.00
    0.04       0.00
    0.04       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.04       1.29
    0.00       0.00
    0.00       0.00
    0.00       0.00
    0.00       0.00
  314.08   83002.70
  517.10   142239.6
  513.94   141469.4
  123.96   33219.89   < burp ends
    0.04       0.00
--------------------------------
 1471.44  399934.76   totals (1.47MB input vs 400MB output)

It's hard to argue with that. For a single process, NFS is clearly
superior / more efficient, and it may be more efficient overall for the
use cases on our clusters.

So why doesn't the gluster native client do client-side caching like NFS?
It looks like it's explicitly refusing to be cached by the usual (and
usually excellent) Linux mechanisms. What's the reason for declining this
OS advantage on the client side while providing such a technically sweet
solution on the server side? I'm at a loss to explain this behavior to
our technical group.
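PS: for anyone who wants to reproduce this, burp.pl amounts to little
more than the following. This is a minimal sketch rather than the exact
script; the meaning of the argument and the line length here are my
assumptions (roughly: an argument of 100 yields a few hundred MB of
short lines):

    #!/usr/bin/env perl
    # burp.pl - spew lots of short strings to stdout, like the problem app.
    # Minimal sketch only; argument meaning and line length are assumptions.
    use strict;
    use warnings;

    my $chunks = shift @ARGV || 100;          # assumed: 1 unit ~ 100k lines
    for my $i (1 .. $chunks * 100_000) {
        print "short burst of output, record $i, with a little padding\n";
    }

Run it as above (./burp.pl 100 > somefile && sync) while watching ifstat
on the IB interface to reproduce the two traces.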
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025, -117.844414) (paste into Google Maps)