Hi,

Can I use the stripe xlator to improve read/write performance? For example, with 3 nodes working under a stripe xlator, can I get a 3-fold benchmark result? Has anyone thought about parallel I/O improvements for HPC applications, i.e. making performance scale up as nodes are added?

BTW, a bug report: while trying to add a sample xlator to glusterfs-1.3.9, my teammate found a bug in the SuperFastHash function (hashfn.c, at line 49). The code below is a simplified paraphrase of what is there:

uint32_t SuperFastHash (const char * data, int32_t len) {
    //...
    for (; len > 0; len--) {
        hash ^= data[len];
        return hash;    // returns on the very first iteration
    }
    //... nothing after this point ever gets invoked ...
}

In our tests this produced terrible hash collisions. Could this be fixed?

Best Wishes~

Yonghao Zhou
2008-05-21
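For reference, hashfn.c appears to be based on Paul Hsieh's published SuperFastHash. Assuming that is what it is meant to be, the intended shape of the function is roughly the sketch below (not the exact glusterfs source); the essential point is that the hash is returned only after the trailing bytes and the avalanche step have been mixed in, never from inside the main loop. An early return inside the loop means only the first few bytes of a key influence the result, which would explain heavy collisions.

#include <stdint.h>
#include <stddef.h>

/* Portable 16-bit little-endian read that does not assume alignment. */
#define get16bits(d) ((((uint32_t)(((const uint8_t *)(d))[1])) << 8) \
                      + (uint32_t)(((const uint8_t *)(d))[0]))

uint32_t
SuperFastHash (const char *data, int32_t len)
{
        uint32_t hash = len, tmp;
        int32_t  rem;

        if (len <= 0 || data == NULL)
                return 0;

        rem  = len & 3;
        len >>= 2;

        /* Main loop: mixes 4 bytes per iteration. Note there is no
           return in here; the whole input must be consumed first. */
        for (; len > 0; len--) {
                hash += get16bits (data);
                tmp   = (get16bits (data + 2) << 11) ^ hash;
                hash  = (hash << 16) ^ tmp;
                data += 2 * sizeof (uint16_t);
                hash += hash >> 11;
        }

        /* Mix in the 1-3 trailing bytes, if any. */
        switch (rem) {
        case 3:
                hash += get16bits (data);
                hash ^= hash << 16;
                hash ^= ((signed char) data[sizeof (uint16_t)]) << 18;
                hash += hash >> 11;
                break;
        case 2:
                hash += get16bits (data);
                hash ^= hash << 11;
                hash += hash >> 17;
                break;
        case 1:
                hash += (signed char) *data;
                hash ^= hash << 10;
                hash += hash >> 1;
        }

        /* Force "avalanching" of the final bits. */
        hash ^= hash << 3;
        hash += hash >> 5;
        hash ^= hash << 4;
        hash += hash >> 17;
        hash ^= hash << 25;
        hash += hash >> 6;

        return hash;    /* the hash is returned only here */
}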
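On the striping question at the top: a client-side spec for striping across three bricks might look something like the sketch below (the host names, volume names and block size are made up for illustration, and this is untested). Each server would export a local storage/posix volume called "brick" through protocol/server, much like the AFR specs later in this digest. Striping helps most for large files and large sequential I/O, where requests are spread over all three servers; files smaller than the stripe block land on a single brick, so a flat 3x speedup should not be assumed without benchmarking.

volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1        # made-up server addresses
  option remote-subvolume brick
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.2
  option remote-subvolume brick
end-volume

volume brick3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.3
  option remote-subvolume brick
end-volume

volume stripe0
  type cluster/stripe
  option block-size *:1MB               # example value: files striped in 1MB chunks
  subvolumes brick1 brick2 brick3
end-volume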
From: gluster-devel-request@xxxxxxxxxx
Send Time: 2008-05-21 00:19:37
To: gluster-devel@xxxxxxxxxx
Sub: Gluster-devel Digest, Vol 33, Issue 73

Today's Topics:

   1. Re: Single-process (server and client) AFR problems (gordan@xxxxxxxxxx)
   2. Re: Single-process (server and client) AFR problems (Krishna Srinivas)
   3. Re: Single-process (server and client) AFR problems (gordan@xxxxxxxxxx)

----------------------------------------------------------------------

Message: 1
Date: Tue, 20 May 2008 11:21:34 +0100 (BST)
From: gordan@xxxxxxxxxx
Subject: Re: [Gluster-devel] Single-process (server and client) AFR problems
To: gluster-devel@xxxxxxxxxx

Krishna,

Is that to say that it's a bug, or am I just using it wrong? Or do I just
have a knack for finding dodgy edge cases?

Is there a workaround?

I have just reconfigured my servers to do 2-process client-side AFR (i.e.
the traditional approach), and that works fine. But having single-process
server-side AFR would be more efficient, and would simplify my config somewhat.

Thanks.

Gordan

On Tue, 20 May 2008, Krishna Srinivas wrote:

> In this setup, home1 is sending the CHILD_UP event to the "server" xlator
> instead of the "home" afr xlator (and home2 is not up). This makes afr
> think none of its subvolumes are up. We will fix it to handle this situation.
>
> Thanks
> Krishna
>
> On Tue, May 20, 2008 at 2:00 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> This is with release 1.3.9.
>
> Not much more that seems relevant turns up in the logs with -L DEBUG (DNS
> chatter, plus a mention that the 2nd server isn't talking; glusterfs is
> switched off on it because that causes the lock-up).
>
> This gets logged when I try to cat ~/.bashrc:
>
> 2008-05-20 09:14:08 D [fuse-bridge.c:375:fuse_entry_cbk] glusterfs-fuse: 39: (34) /gordan/.bashrc => 60166157
> 2008-05-20 09:14:08 D [inode.c:577:__create_inode] fuse/inode: create inode(60166157)
> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
> 2008-05-20 09:14:08 D [fuse-bridge.c:1517:fuse_open] glusterfs-fuse: 40: OPEN /gordan/.bashrc
> 2008-05-20 09:14:08 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
> 2008-05-20 09:14:08 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 40: (12) /gordan/.bashrc => -1 (5)
>
> On the command line, I get back "Input/output error". I can ls the files,
> but cannot actually read them.
>
> This is with only the first server up. The same happens whether I mount
> home.vol via fstab or via something like:
> glusterfs -f /etc/glusterfs/home.vol /home
>
> I have also reduced the config (single process, intended for the servers)
> to a bare minimum (removed the posix-locks layer) to get to the bottom of
> it, but I cannot get any reads to work:
>
> volume home1
>   type storage/posix
>   option directory /gluster/home
> end-volume
>
> volume home2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.3.1
>   option remote-subvolume home2
> end-volume
>
> volume home
>   type cluster/afr
>   option read-subvolume home1
>   subvolumes home1 home2
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes home home1
>   option auth.ip.home.allow 127.0.0.1,192.168.*
>   option auth.ip.home1.allow 127.0.0.1,192.168.*
> end-volume
>
> On a related note, if single-process mode is used, how does GlusterFS know
> which volume to mount? For example, if it is trying to mount the
> protocol/client volume (home2), then obviously that won't work because the
> 2nd server is not up. If it is mounting the protocol/server volume, then is
> it trying to mount home or home1? Or does it mount the outermost volume
> that _isn't_ a protocol/[client|server] (which is "home" in this case)?
>
> Thanks.
>
> Gordan
>
> On Tue, 20 May 2008 13:18:07 +0530, Krishna Srinivas <krishna@xxxxxxxxxxxxx> wrote:
> > Gordan,
> >
> > Which patch set is this? Can you run the glusterfs server side with
> > "-L DEBUG" and send the logs?
> >
> > Thanks
> > Krishna
> >
> > On Tue, May 20, 2008 at 1:56 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> > > Hi,
> > >
> > > I'm having rather major problems getting single-process AFR to work
> > > between two servers. When both servers come up, the GlusterFS on both
> > > locks up pretty solid. The processes that try to access the FS
> > > (including ls) seem to get nowhere for a few minutes, and then
> > > complete. But something gets stuck, and glusterfs cannot be killed
> > > even with -9!
> > >
> > > Another worrying thing is that the fuse kernel module ends up holding a
> > > reference count even after the glusterfs process gets killed (sometimes
> > > killing the remote process that isn't locked up on its host can break
> > > the locked-up operations and allow the local glusterfs process to be
> > > killed). So fuse then cannot be unloaded.
> > >
> > > This error seems to come up in the logs all the time:
> > > 2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
> > > 2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 63: (12) /test => -1 (5)
> > >
> > > This implies some kind of locking issue, but the same error and
> > > conditions also arise when the posix-locks module is removed.
> > >
> > > The configs for the two servers are attached. They are almost identical
> > > to the examples on the glusterfs wiki:
> > >
> > > http://www.gluster.org/docs/index.php/AFR_single_process
> > >
> > > What am I doing wrong? Have I run into another bug?
> > >
> > > Gordan
> > >
> > > volume home1-store
> > >   type storage/posix
> > >   option directory /gluster/home
> > > end-volume
> > >
> > > volume home1
> > >   type features/posix-locks
> > >   subvolumes home1-store
> > > end-volume
> > >
> > > volume home2
> > >   type protocol/client
> > >   option transport-type tcp/client
> > >   option remote-host 192.168.3.1
> > >   option remote-subvolume home2
> > > end-volume
> > >
> > > volume home
> > >   type cluster/afr
> > >   option read-subvolume home1
> > >   subvolumes home1 home2
> > > end-volume
> > >
> > > volume server
> > >   type protocol/server
> > >   option transport-type tcp/server
> > >   subvolumes home home1
> > >   option auth.ip.home.allow 127.0.0.1
> > >   option auth.ip.home1.allow 192.168.*
> > > end-volume
> > >
> > > volume home2-store
> > >   type storage/posix
> > >   option directory /gluster/home
> > > end-volume
> > >
> > > volume home2
> > >   type features/posix-locks
> > >   subvolumes home2-store
> > > end-volume
> > >
> > > volume home1
> > >   type protocol/client
> > >   option transport-type tcp/client
> > >   option remote-host 192.168.0.1
> > >   option remote-subvolume home1
> > > end-volume
> > >
> > > volume home
> > >   type cluster/afr
> > >   option read-subvolume home2
> > >   subvolumes home1 home2
> > > end-volume
> > >
> > > volume server
> > >   type protocol/server
> > >   option transport-type tcp/server
> > >   subvolumes home home2
> > >   option auth.ip.home.allow 127.0.0.1
> > >   option auth.ip.home2.allow 192.168.*
> > > end-volume
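Krishna's explanation quoted above (home1 sends its CHILD_UP event to the "server" xlator instead of the "home" afr xlator, so afr believes none of its children are up and returns EIO) can be made concrete with a small toy model. This is not glusterfs source code, just an illustration of the event flow under the assumption that each child notifies a single parent:

#include <stdio.h>

#define MAX_CHILDREN 2

struct xlator {
        const char    *name;
        struct xlator *parent;                 /* each child notifies only ONE parent */
        struct xlator *children[MAX_CHILDREN];
        int            child_up[MAX_CHILDREN];
};

/* Child 'c' has come up: deliver CHILD_UP to its (single) parent. */
static void notify_child_up (struct xlator *c)
{
        struct xlator *p = c->parent;
        for (int i = 0; i < MAX_CHILDREN; i++)
                if (p->children[i] == c)
                        p->child_up[i] = 1;
}

/* afr's check before it will lock or self-heal anything. */
static int afr_any_child_up (struct xlator *afr)
{
        for (int i = 0; i < MAX_CHILDREN; i++)
                if (afr->child_up[i])
                        return 1;
        return 0;    /* "none of the children are up for locking", i.e. EIO */
}

int main (void)
{
        struct xlator home1  = { .name = "home1" };
        struct xlator home   = { .name = "home",   .children = { &home1 } };
        struct xlator server = { .name = "server", .children = { &home1 } };

        /* The bug: home1's CHILD_UP goes to "server", never to the afr volume "home". */
        home1.parent = &server;
        notify_child_up (&home1);

        printf ("does afr '%s' see any child up? %s\n", home.name,
                afr_any_child_up (&home) ? "yes" : "no, so it returns EIO");
        return 0;
}

In the two-process layout the afr volume is the only parent of its subvolumes, so the notification cannot go astray, which is presumably why the traditional client-side AFR setup works.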
------------------------------

Message: 2
Date: Tue, 20 May 2008 16:29:58 +0530
From: "Krishna Srinivas" <krishna@xxxxxxxxxxxxx>
Subject: Re: [Gluster-devel] Single-process (server and client) AFR problems
To: gordan@xxxxxxxxxx
Cc: gluster-devel@xxxxxxxxxx

Gordan,

It is a bug in the way xlators notify UP/DOWN status to other xlators.
Thanks for notifying it to us :)
Workaround is having a separate process for "protocol/server" xlator.

Krishna

On Tue, May 20, 2008 at 3:51 PM, <gordan@xxxxxxxxxx> wrote:
> Krishna,
>
> Is that to say that it's a bug, or am I just using it wrong? Or do I just
> have a knack for finding dodgy edge cases?
>
> Is there a workaround?
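To spell out that workaround, here is an untested sketch reusing the names and addresses already posted in this thread: run protocol/server in its own process on each server, and do the AFR in a separate client process, which is the one that mounts. On server 1 (server 2 is the mirror image, exporting home2-store/home2):

volume home1-store
  type storage/posix
  option directory /gluster/home
end-volume

volume home1
  type features/posix-locks
  subvolumes home1-store
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes home1
  option auth.ip.home1.allow 127.0.0.1,192.168.*
end-volume

And the mounting client process would use something like:

volume home1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1
  option remote-subvolume home1
end-volume

volume home2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.3.1
  option remote-subvolume home2
end-volume

volume home
  type cluster/afr
  subvolumes home1 home2
end-volume

This is essentially the traditional 2-process client-side AFR layout that Gordan reports works fine.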
------------------------------

Message: 3
Date: Tue, 20 May 2008 12:43:11 +0100 (BST)
From: gordan@xxxxxxxxxx
Subject: Re: [Gluster-devel] Single-process (server and client) AFR problems
To: gluster-devel@xxxxxxxxxx

On Tue, 20 May 2008, Krishna Srinivas wrote:

> It is a bug in the way xlators notify UP/DOWN status to other xlators.

Ah, OK. I thought I was doing something wrong.

> Thanks for notifying it to us :)

No problem, glad I could help. :)

Is it likely to make it into release 1.3.10?

> Workaround is having a separate process for "protocol/server" xlator.

Thanks. I have split it all back up into client-side AFR. It works, but I
think the way I was trying to use it would have been more efficient for my setup.

Gordan

------------------------------

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel

End of Gluster-devel Digest, Vol 33, Issue 73
*********************************************