Thanks Shehjar. I'll give those a try.

liam

On Thu, Jul 23, 2009 at 4:03 AM, Shehjar Tikoo <shehjart at gluster.com> wrote:

> Liam Slusser wrote:
>
>> I've been playing with booster unfs and found that I cannot get it to work
>> with a gluster config that uses cluster/distribute. I am using Gluster
>> 2.0.3...
>>
>
> Thanks. I've seen the stale handle errors while using both
> replicate and distribute. The fixes are in the repo but
> not part of a release yet. Release 2.0.5 will contain those
> changes. In the meantime, if you're really interested, you'd
> check out the repo as:
>
> $ git clone git://git.sv.gnu.org/gluster.git ./glusterfs
> $ cd glusterfs
> $ git checkout -b release2.0 origin/release-2.0
>
> Also, we've not yet announced it on the list, but a customised version
> of unfs3 is available at:
>
> http://ftp.gluster.com/pub/gluster/glusterfs/misc/unfs3/0.5/unfs3-0.9.23booster0.5.tar.gz
>
> It has some bug fixes, performance enhancements and work-arounds
> to improve behaviour with booster.
>
> Some documentation is available at:
> http://www.gluster.org/docs/index.php/Unfs3boosterConfiguration
>
> Thanks
> Shehjar
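For anyone following those steps, a minimal build sketch, assuming the usual
autotools flow for both the git checkout and the unfs3booster tarball; the
install prefix and the unpacked directory name below are only examples, and
the Unfs3boosterConfiguration page linked above remains the authoritative
reference:

$ cd glusterfs                        # the release-2.0 checkout from above
$ ./autogen.sh
$ ./configure --prefix=/opt/glusterfs-release2.0
$ make && make install

$ wget http://ftp.gluster.com/pub/gluster/glusterfs/misc/unfs3/0.5/unfs3-0.9.23booster0.5.tar.gz
$ tar xzf unfs3-0.9.23booster0.5.tar.gz
$ cd unfs3-0.9.23booster0.5           # or whatever directory the tarball unpacks to
$ ./configure --prefix=/opt/unfs3booster
$ make && make install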
>> [root@box01 /]# mount -t nfs store01:/intstore.booster -o wsize=65536,rsize=65536 /mnt/store
>> mount: Stale NFS file handle
>>
>> (just trying it again and sometimes it will mount...)
>>
>> [root@box01 /]# mount -t nfs store01:/store.booster -o wsize=65536,rsize=65536 /mnt/store
>> [root@box01 /]# ls /mnt/store
>> data
>> [root@box01 store]# cd /mnt/store/data
>> -bash: cd: /mnt/store/data/: Stale NFS file handle
>> [root@box01 /]# cd /mnt/store
>> [root@box01 store]# cd data
>> -bash: cd: data/: Stale NFS file handle
>> [root@box01 store]#
>>
>> Sometimes I can get df to show the actual cluster, but most times it gives
>> me nothing.
>>
>> [root@box01 /]# df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> <....>
>> store01:/store.booster
>>                        90T   49T   42T  54% /mnt/store
>> [root@box01 /]#
>>
>> [root@box01 /]# df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> <...>
>> store01:/store.booster
>>                          -     -     -   -  /mnt/store
>>
>> However, as soon as I remove cluster/distribute from my gluster client
>> configuration file it works fine. (Missing 2/3 of the files, because my
>> gluster cluster does a "distribute" across 3 volumes on each of the two
>> servers.)
>>
>> A strace of unfs during one of the cd commands above outputs:
>>
>> poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=21, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=22, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=23, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 2000) = 1 ([{fd=22, revents=POLLIN|POLLRDNORM}])
>> poll([{fd=22, events=POLLIN}], 1, 35000) = 1 ([{fd=22, revents=POLLIN}])
>> read(22, "\200\0\0\230B\307D\234\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\3\0\0\0\4\0\0\0\1"..., 4000) = 156
>> tgkill(4574, 4576, SIGRT_1) = 0
>> tgkill(4574, 4575, SIGRT_1) = 0
>> futex(0x7fff31c7cb20, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
>> setresgid(-1, 0, -1) = 0
>> tgkill(4574, 4576, SIGRT_1) = 0
>> tgkill(4574, 4575, SIGRT_1) = 0
>> setresuid(-1, 0, -1) = 0
>> write(22, "\200\0\0 B\307D\234\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0F"..., 36) = 36
>> poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=21, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=22, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=23, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 2000) = 1 ([{fd=22, revents=POLLIN|POLLRDNORM}])
>> poll([{fd=22, events=POLLIN}], 1, 35000) = 1 ([{fd=22, revents=POLLIN}])
>> read(22, "\200\0\0\230C\307D\234\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\3\0\0\0\4\0\0\0\1"..., 4000) = 156
>> tgkill(4574, 4576, SIGRT_1) = 0
>> tgkill(4574, 4575, SIGRT_1) = 0
>> setresgid(-1, 0, -1) = 0
>> tgkill(4574, 4576, SIGRT_1) = 0
>> tgkill(4574, 4575, SIGRT_1) = 0
>> setresuid(-1, 0, -1) = 0
>> write(22, "\200\0\0 C\307D\234\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0F"..., 36) = 36
>> poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=21, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=22, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=23, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 2000 <unfinished ...>
>>
>> With the booster.fstab debug level set to DEBUG, this is all that shows up
>> in the log file:
>>
>> [2009-07-23 02:52:16] D [libglusterfsclient-dentry.c:381:libgf_client_path_lookup] libglusterfsclient: resolved path(/) to 1/1
>> [2009-07-23 02:52:17] D [libglusterfsclient.c:1340:libgf_vmp_search_entry] libglusterfsclient: VMP Entry found: /store.booster/: /store.booster/
>>
>> my /etc/booster.conf:
>>
>> /home/gluster/apps/glusterfs-2.0.3/etc/glusterfs/liam.conf /store.booster/ glusterfs subvolume=d,logfile=/home/gluster/apps/glusterfs-2.0.3/var/log/glusterfs/d.log,loglevel=DEBUG,attr_timeout=0
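For readability, that single booster.conf line has the following general
shape (the field names here are only descriptive; the Unfs3boosterConfiguration
page linked earlier documents the actual format):

<glusterfs volume file>  <virtual mount point>  glusterfs  <option=value,option=value,...>

So in this case the liam.conf volume file is exported under the virtual mount
point /store.booster/, with subvolume "d" as the root of the export, a
dedicated log file, DEBUG logging, and attr_timeout=0 (which presumably
disables attribute caching).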
>> my /etc/exports:
>>
>> /store.booster myclient(rw,no_root_squash)
>>
>> my client gluster config (liam.conf):
>>
>> volume brick1a
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server1
>>   option remote-subvolume brick1a
>> end-volume
>>
>> volume brick1b
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server1
>>   option remote-subvolume brick1b
>> end-volume
>>
>> volume brick1c
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server1
>>   option remote-subvolume brick1c
>> end-volume
>>
>> volume brick2a
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server2
>>   option remote-subvolume brick2a
>> end-volume
>>
>> volume brick2b
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server2
>>   option remote-subvolume brick2b
>> end-volume
>>
>> volume brick2c
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server2
>>   option remote-subvolume brick2c
>> end-volume
>>
>> volume bricks1
>>   type cluster/replicate
>>   subvolumes brick1a brick2a
>> end-volume
>>
>> volume bricks2
>>   type cluster/replicate
>>   subvolumes brick1b brick2b
>> end-volume
>>
>> volume bricks3
>>   type cluster/replicate
>>   subvolumes brick1c brick2c
>> end-volume
>>
>> volume distribute
>>   type cluster/distribute
>>   subvolumes bricks1 bricks2 bricks3
>> end-volume
>>
>> volume readahead
>>   type performance/read-ahead
>>   option page-size 2MB    # unit in bytes
>>   option page-count 16    # cache per file = (page-count x page-size)
>>   subvolumes distribute
>> end-volume
>>
>> volume cache
>>   type performance/io-cache
>>   option cache-size 256MB
>>   subvolumes readahead
>> end-volume
>>
>> volume d
>>   type performance/write-behind
>>   option cache-size 16MB
>>   option flush-behind on
>>   subvolumes cache
>> end-volume
>>
>> I've tried removing the performance translators with no change. Once I
>> remove distribute and only connect to one of the three bricks on a server,
>> it works perfectly.
>>
>> I do have a similar cluster that uses replicate but no distribute, and it
>> works fine.
>>
>> Ideas? Is this a bug?
>>
>> thanks,
>> liam
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
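One way to confirm the layout behind the "missing 2/3 of the files"
observation above: cluster/distribute hashes each file name across its
subvolumes and records the per-directory hash ranges in an extended
attribute on the backend directories, so any single replicate pair only
holds its own share of the namespace. A minimal sketch, run as root on one
of the storage servers; /data/export/brick1a is a placeholder for whatever
directory the brick1a volume actually exports:

# dump all extended attributes on a directory inside the brick's export;
# the trusted.glusterfs.dht value, if present, is the layout written by
# cluster/distribute for that subvolume
getfattr -d -m . -e hex /data/export/brick1a/data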