Upgrading gluster installation -- best practices guide?

On 15.01.2010 22:09, Raghavendra G wrote:

Hi Raghavendra,

thanks for taking the time to answer my questions. See below...

> Hi paul,
>
> On Fri, Jan 15, 2010 at 11:32 PM, Paul<pkoelle at gmail.com>  wrote:
>
>> Hi all,
>>
>> We run GlusterFS in our testing lab (since 2.0rc1). We are currently using
>> client-side AFR (mirror) with two servers and two clients over GigE.
>>
>> Testing is going well except one important point: How do you upgrade with
>> minimal/zero downtime? Here I have several questions:
>>
>> 1. Is the wire protocol stable within a major release? Can I mix and match
>> all 2.0.x clients and servers? If not, how do I find out which ones are compatible?
>>
> we would suggest using client and server from the same version of
> glusterfs. Is there any reason for not trying out 3.0?
No specific reason other than not having a clue about the stability of 
3.0. I tested 3.0-git over the weekend and got good results. Consecutive 
writes of four 100MB files with dd yield about 14MB/s; reading them back 
gives around 40MB/s uncached and 180MB/s cached.

#include <benchmark_disclaimer.h>

three:/tmp/glusterfs-git# ./iotest.sh
write some data....(4 time 100MB)
102400000 bytes (102 MB) copied, 6.16526 seconds, 16.6 MB/s
102400000 bytes (102 MB) copied, 6.39431 seconds, 16.0 MB/s
102400000 bytes (102 MB) copied, 8.00558 seconds, 12.8 MB/s
102400000 bytes (102 MB) copied, 7.28237 seconds, 14.1 MB/s

read data back....
102400000 bytes (102 MB) copied, 2.65042 seconds, 38.6 MB/s
102400000 bytes (102 MB) copied, 2.31306 seconds, 44.3 MB/s
102400000 bytes (102 MB) copied, 2.46647 seconds, 41.5 MB/s
102400000 bytes (102 MB) copied, 2.49869 seconds, 41.0 MB/s
deleting data...
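
For reference, iotest.sh is nothing fancy; it is roughly the following (a 
sketch from memory -- the block size, the fsync flag and the mount point at 
/tmp/gluster3_export are assumptions):

#!/bin/sh
# rough sketch of iotest.sh; paths and dd options are assumptions
MNT=/tmp/gluster3_export

echo "write some data....(4 time 100MB)"
for i in 1 2 3 4; do
    # 1000 x 100KB blocks = 102400000 bytes, fsync'ed so we measure the wire
    dd if=/dev/zero of=$MNT/testfile.$i bs=102400 count=1000 conv=fsync 2>&1 | tail -1
done

echo "read data back...."
for i in 1 2 3 4; do
    dd if=$MNT/testfile.$i of=/dev/null bs=102400 2>&1 | tail -1
done

echo "deleting data..."
rm -f $MNT/testfile.*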

The io-cache translator has a cache-size of 256MB, so the working set 
size should exceed this. The network settings are slightly tuned:

net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.rmem_max = 108544
net.core.wmem_max = 108544
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 87380 4194304
net.core.netdev_max_backlog = 500
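
They are applied at runtime roughly like this (whether via sysctl -w or the 
same lines in /etc/sysctl.conf is an implementation detail; the values are 
the ones above):

# apply at runtime; put the same key = value lines into /etc/sysctl.conf
# to keep them across reboots
sysctl -w net.ipv4.tcp_mtu_probing=1
sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
sysctl -w net.core.rmem_max=108544
sysctl -w net.core.wmem_max=108544
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 87380 4194304"
sysctl -w net.core.netdev_max_backlog=500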

Below are some results from fio (freshmeat.net/projects/fio). As you can 
see it only does about 2MB/s, but I haven't investigated why it is so much 
slower than dd (sync). A sequential comparison job is sketched after the 
fio config below.

./fio randread.fio
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
Starting 1 process
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [r] [64.9% done] [2243K/0K /s] [547/0 iops] [eta 03m:08s]
random-read: (groupid=0, jobs=1): err= 0: pid=30079
   read : io=1024MB, bw=2022KB/s, iops=505, runt=518502msec
     clat (usec): min=34, max=327696, avg=1969.62, stdev=4750.57
     bw (KB/s) : min=  221, max= 6191, per=100.53%, avg=2032.69, stdev=424.40
cpu      : usr=0.20%, sys=0.42%, ctx=262236, majf=0, minf=23
IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued r/w: total=262144/0, short=0/0
      lat (usec): 50=17.93%, 100=5.69%, 250=2.25%, 500=0.09%, 750=0.01%
      lat (usec): 1000=0.01%
      lat (msec): 2=0.02%, 4=73.61%, 10=0.27%, 20=0.05%, 50=0.02%
      lat (msec): 100=0.01%, 250=0.03%, 500=0.01%

Run status group 0 (all jobs):
    READ: io=1024MB, aggrb=2022KB/s, minb=2070KB/s, maxb=2070KB/s, mint=518502msec, maxt=518502msec

The config for fio:
[random-read]
  rw=randread
  size=1024m
  directory=/tmp/gluster3_export
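
To tell random-access latency apart from raw throughput, a sequential job 
against the same directory should help; this is just a sketch I haven't run 
yet (the 128k block size is a guess):

# untested companion job: sequential reads, larger blocks, same directory
cat > seqread.fio <<'EOF'
[seq-read]
  rw=read
  bs=128k
  size=1024m
  directory=/tmp/gluster3_export
EOF
./fio seqread.fio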


BTW: I had terrible results with a bigger MTU (6500) and bigger values for 
net.core.(r/w)mem_max; IOPS were in the range of 5/sec.
>
>>
>> 2. Can I export one directory on the servers through multiple instances of
>> glusterfsd (running on different ports)? This would allow running the old and
>> new versions in parallel for a short time and doing a test from the client.
>>
>
> No.
Too bad. Would they step on each other's toes WRT state/metadata? After 
all it's just userspace, no? In light of your answers below I'm still 
searching for a solution that avoids shutting down the whole thing.

[snipp]
>> How do YOU handle upgrades, especially wrt downtime and rolling back to a
>> known good configuration?
>>
>
> we follow this order:
> 1. stop all the services accessing the mount point.
> 2. unmount glusterfs clients.
> 3. kill all the servers.
> 4. install new version of glusterfs.
> 5. start glusterfs servers.
> 6. start glusterfs clients.
This sounds like I have to shut down the whole cluster. Not good.
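
For the record, those six steps boil down to roughly the following outline 
(client and server steps mixed together; the volfile names, mount point and 
the example service are assumptions for our setup):

#!/bin/sh
# full-downtime upgrade, following the six steps quoted above; paths assumed

# 1. stop whatever accesses the mount point (example service name assumed)
/etc/init.d/apache2 stop

# 2. unmount the glusterfs clients
umount /mnt/glusterfs

# 3. kill all the servers (and any leftover client processes)
killall glusterfsd glusterfs

# 4. install the new version of glusterfs (source build shown as an example)
cd glusterfs-3.0.0 && ./configure && make && make install

# 5. start the glusterfs servers
glusterfsd -f /etc/glusterfs/glusterfsd.vol

# 6. start the glusterfs clients
glusterfs -f /etc/glusterfs/glusterfs.vol /mnt/glusterfs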

thanks
  Paul


