Hi again,

Are rename(2) operations supposed to survive the death of a brick? I tried
simulating an outage by unmounting the filesystem for a brick, and on the
client, a tar(1), which is a heavy rename(2) user, complains a lot:

tar: Cannot rename usr/src/crypto/dist/ipsec-tools/src/racoon/rfc/draft-ietf-ipsec-nat-t-ike-04.txt.29982d to usr/src/crypto/dist/ipsec-tools/src/racoon/rfc/draft-ietf-ipsec-nat-t-ike-04.txt (No such file or directory)

I reformatted the unmounted filesystem and remounted it with the intent of
having it rebuilt, but self-heal does not start and tar(1) carries on
complaining. Restarting glusterd/glusterfsd on the server does not help,
and the gluster volume replace-brick command refuses to work when the old
and new bricks are the same.

Then, I do not know if it is related, but the client crashes:

Program terminated with signal 11, Segmentation fault.
#0  0xba3925f7 in afr_readdirp_cbk (frame=0xbaf022d0, cookie=0x1, this=0xbb9c5000, op_ret=1, op_errno=2, entries=0xbfbfe088) at afr-dir-read.c:592
592             if ((local->fd->inode == local->fd->inode->table->root)
(gdb) bt
#0  0xba3925f7 in afr_readdirp_cbk (frame=0xbaf022d0, cookie=0x1, this=0xbb9c5000, op_ret=1, op_errno=2, entries=0xbfbfe088) at afr-dir-read.c:592
#1  0xba3e6688 in client3_1_readdirp_cbk (req=0xb980174c, iov=0xb980176c, count=1, myframe=0xbaf02dc0) at client3_1-fops.c:1939
#2  0xbbb8a586 in rpc_clnt_handle_reply (clnt=0xbb962480, pollin=0xb4733ac0) at rpc-clnt.c:736
#3  0xbbb8a773 in rpc_clnt_notify (trans=0xbb9a2000, mydata=0xbb9624a0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x0) at rpc-clnt.c:849
#4  0xbbb8589d in rpc_transport_notify (this=0xaaaaaaaa, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xb4733ac0) at rpc-transport.c:918
#5  0xbba21c8f in socket_event_poll_in (this=0xbb9a2000) at socket.c:1647
#6  0xbba21e5b in socket_event_handler (fd=17, idx=5, data=0xbb9a2000, poll_in=1, poll_out=0, poll_err=0) at socket.c:1762
#7  0xbbbc0e4a in event_dispatch_poll (event_pool=0xbb90e0e0) at event.c:366
#8  0xbbbc099b in event_dispatch (event_pool=0x0) at event.c:956
#9  0x0804cd91 in main (argc=5, argv=0xbfbfe8c4) at glusterfsd.c:1503
(gdb) print local->fd->inode
$1 = (struct _inode *) 0xaaaaaaaa
(gdb) x/16w local->fd
0xb8c0107c:     0x00001808      0x00000000      0x00000003      0xb8c01088
0xb8c0108c:     0xb8c01088      0xaaaaaaaa      0xdead0007      0x00000000
0xb8c0109c:     0x00000000      0xbb909640      0x00000014      0xbabebabe
0xb8c010ac:     0xcafecafe      0x00000001      0x0000751e      0x00000000

Once the client is remounted, self-heal is correctly triggered on the server
and everything is fixed.
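In case it helps when reading the backtrace: local->fd->inode is 0xaaaaaaaa,
which looks like a fill/poison pattern rather than a live pointer, so the fd
seems to have been overwritten or freed by the time the readdirp callback ran.
Below is a minimal sketch of the pointer chain that afr-dir-read.c:592 walks,
using simplified stand-in structures (not the real GlusterFS types), just to
show where the dereference faults; a NULL check would not help here since the
pointer is non-NULL garbage:

/*
 * Simplified stand-ins for the structures involved at afr-dir-read.c:592.
 * These are NOT the real GlusterFS definitions; they only model the
 * pointer chain walked by the crashing line.
 */
#include <stdbool.h>
#include <stdio.h>

struct inode_table;

struct inode {
        struct inode_table *table;
};

struct inode_table {
        struct inode *root;          /* root inode of the table */
};

struct fd {
        struct inode *inode;         /* 0xaaaaaaaa in the core dump above */
};

struct afr_local {
        struct fd *fd;
};

/*
 * The test at line 592 essentially asks "is this readdirp on the root
 * directory?".  With local->fd->inode pointing into freed or poisoned
 * memory, reading inode->table through that pointer faults.
 */
static bool
readdirp_is_on_root(struct afr_local *local)
{
        return local->fd->inode == local->fd->inode->table->root;
}

int
main(void)
{
        struct inode_table tbl;
        struct inode root = { .table = &tbl };
        struct fd fd = { .inode = &root };
        struct afr_local local = { .fd = &fd };

        tbl.root = &root;
        printf("on root: %d\n", readdirp_is_on_root(&local));
        return 0;
}

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx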