self heal makes client loop indefinately

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

I have a HA configuration, 3 nodes with replication *:3 on server side. 3 servers 1 client on a host also doing gluster server. AFR and UNIFY are on server side exactly as described in the wiki's tutorials (Kritical's). I launch my 3 servers, they comunicate and establish all connexions well one with the others. But when I execute the client, mounting the volume report inconsistencies on the server the client connects to, self heal reports a mismatch, seams that self healing is started or at leas requested for these files on all server, but client enters an infinite loop and load average increases over 60, this cient/server host becomes unavailable.

GFS rel 1.3.5 (gcc 3.3, glibc 2.3.6; linux kern 2.6.16.latest)
Fuse 2.7.0-glfs4

Server log (where the client connects to):

2007-10-17 09:18:21 W [unify-self-heal.c:261:unify_sh_readdir_cbk] tbs-clust-unify-or2: found mismatch in mode/uid/gid for .qmail-srv376cc 2007-10-17 09:18:21 W [unify-self-heal.c:261:unify_sh_readdir_cbk] tbs-clust-unify-or2: found mismatch in mode/uid/gid for .qmail-srv376cc-accept-default

Client log:

2007-10-17 09:17:54 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:17:54 D [fuse-bridge.c:346:fuse_entry_cbk] glusterfs-fuse: ENTRY => 3424340 2007-10-17 09:17:54 D [inode.c:559:__create_inode] fuse/inode: create inode(3424340) 2007-10-17 09:17:54 D [inode.c:351:__active_inode] fuse/inode: activating inode(3424340), lru=0/1024 2007-10-17 09:17:54 D [inode.c:381:__passive_inode] fuse/inode: passivating inode(3424340), lru=1/1024 2007-10-17 09:17:54 D [inode.c:308:__destroy_inode] fuse/inode: destroy inode(0) [@0x8092c78] 2007-10-17 09:17:54 D [inode.c:351:__active_inode] fuse/inode: activating inode(3424340), lru=0/1024 2007-10-17 09:17:54 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 3424340/http2 (/home/http2) 2007-10-17 09:18:19 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:18:21 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:18:44 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home)

[...] loops for ages

2007-10-17 09:41:16 W [client-protocol.c:210:call_bail] tbs-clust-server-or2: activating bail-out. pending frames = 53. last sent = 2007-10-17 09:38:20. last received = 2007-10-17 09:17:54 transport-timeout = 108 2007-10-17 09:41:16 C [client-protocol.c:218:call_bail] tbs-clust-server-or2: bailing transport 2007-10-17 09:41:16 W [client-protocol.c:4324:client_protocol_cleanup] tbs-clust-server-or2: cleaning up state in transport
object 0x8090e78
2007-10-17 09:41:16 D [tcp.c:131:cont_hand] tcp: forcing poll/read/write to break on blocked socket (if any) 2007-10-17 09:41:16 W [client-protocol.c:4376:client_protocol_cleanup] tbs-clust-server-or2: forced unwinding frame type(0)
op(34) reply=@0x80986b8
2007-10-17 09:41:16 D [fuse-bridge.c:378:fuse_entry_cbk] glusterfs-fuse: ERR => -1 (107) 2007-10-17 09:41:16 D [inode.c:308:__destroy_inode] fuse/inode: destroy inode(0) [@0x8093088]
[...]
2007-10-17 09:45:01 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:45:01 D [fuse-bridge.c:346:fuse_entry_cbk] glusterfs-fuse: ENTRY => 3424340 2007-10-17 09:45:01 D [inode.c:559:__create_inode] fuse/inode: create inode(3424340) 2007-10-17 09:45:01 D [inode.c:351:__active_inode] fuse/inode: activating inode(3424340), lru=0/1024 2007-10-17 09:45:01 D [inode.c:381:__passive_inode] fuse/inode: passivating inode(3424340), lru=1/1024 2007-10-17 09:45:01 D [inode.c:308:__destroy_inode] fuse/inode: destroy inode(0) [@0x80930f0] 2007-10-17 09:45:01 D [inode.c:351:__active_inode] fuse/inode: activating inode(3424340), lru=0/1024 2007-10-17 09:45:01 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 3424340/http2 (/home/http2) 2007-10-17 09:45:08 D [client-protocol.c:4301:client_protocol_reconnect] tbs-clust-server-or2: breaking reconnect chain 2007-10-17 09:45:26 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:45:27 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:45:51 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home) 2007-10-17 09:46:16 D [fuse-bridge.c:423:fuse_lookup] glusterfs-fuse: LOOKUP 1/home (/home)

-> and loops again as before


Other 2 server logs:

2007-10-17 09:17:54 D [inode.c:351:__active_inode] tbs-clust-dor3/inode: activating inode(9568336), lru=2/1024 2007-10-17 09:17:54 D [inode.c:381:__passive_inode] tbs-clust-dor3/inode: passivating inode(9568336), lru=3/1024 2007-10-17 09:17:54 D [inode.c:308:__destroy_inode] tbs-clust-dor3/inode: destroy inode(0) [@0xb7c00df0] 2007-10-17 09:17:54 D [inode.c:351:__active_inode] tbs-clust-nsor3/inode: activating inode(9666831), lru=2/1024 2007-10-17 09:17:54 D [inode.c:381:__passive_inode] tbs-clust-nsor3/inode: passivating inode(9666831), lru=3/1024 2007-10-17 09:17:54 D [inode.c:308:__destroy_inode] tbs-clust-nsor3/inode: destroy inode(0) [@0xb7a01150]


NB: the directory where mismatch are found has a large number of files (a few tens of thouthands). When I ls in that directory (on glusterfs mount point); it takes ages to return the answer. There are also many symlinks in that directory (about 20%). I saw a stat-prefetch translator that I dont have presently, can this help in that situation ? Is it a post 1.3.5 functionality ? (1.3.6?)

--
Vincent Régnard
vregnard-at-tbs-internet-dot-com
TBS-internet.com
027 630 5902




[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux