Re: Stale state->fd->inode and race condition with fd_destroy()

We haven't come across this issue so far. Can you post the complete backtrace from your debugger?

Avati

On Sun, Jul 3, 2011 at 1:21 PM, Emmanuel Dreyfus <manu@xxxxxxxxxx> wrote:
Hi

I get a reproducible crash of glusterfsd, running 3.2.1 code. I get it
by running multiple tar -xzf on a client, and after a while, a
glusterfsd on a brick crashes:

Program terminated with signal 11, Segmentation fault.
#0  0xba0d652e in server_rchecksum_cbk (frame=0xbad007d0,
   cookie=0xbaf00300, this=0xba810000, op_ret=-1, op_errno=9,
   weak_checksum=0, strong_checksum=0xb91ffc74 "") at
   server3_1-fops.c:1305

Here is the offending code

       if (op_ret == -1)
               gf_log (this->name, GF_LOG_INFO,
                       "%"PRId64": RCHECKSUM %"PRId64" (%"PRId64") "
                       "==> %"PRId32" (%s)",
                       frame->root->unique, state->resolve.fd_no,
                       state->fd ? state->fd->inode->ino : 0, op_ret,
                       strerror (op_errno));

The problem is state->fd->inode value:

(gdb) print *((server_state_t *)frame->root->state)->fd
$7 = {pid = 2610, flags = 2, refcount = 2, inode_list =
       {next = 0xb9801088, prev = 0xb9801088}, inode = 0xaaaaaaaa,
       lock = {pts_magic = 3735879687, pts_spin = 0 '\0', pts_flags =
       0}, _ctx = 0xbb96b080, xl_count = 8}

inode = 0xaaaaaaaa is set in fd_destroy() to denote a stale object (It
is less fun than using 0xdeadbeef :-)
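
To illustrate the marker idea, here is a minimal sketch of how a destroy path can stamp a recognizable value into a pointer field so that a stale object stands out in gdb. The names demo_inode, demo_fd, DEMO_STALE_INODE, demo_fd_mark_stale() and demo_fd_is_stale() are hypothetical stand-ins, not the GlusterFS definitions, and the real fd_destroy() does more than this.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for inode_t and fd_t. */
struct demo_inode {
        uint64_t ino;
};

struct demo_fd {
        int                refcount;
        struct demo_inode *inode;
};

/* Marker value stored in the inode pointer once the fd is torn down.
 * It is never dereferenced, only compared against. */
#define DEMO_STALE_INODE ((struct demo_inode *)(uintptr_t)0xaaaaaaaaU)

/* What a destroy path might do just before releasing the fd memory. */
static void
demo_fd_mark_stale (struct demo_fd *fd)
{
        fd->inode = DEMO_STALE_INODE;
}

/* Nonzero if the fd carries the stale marker. */
static int
demo_fd_is_stale (const struct demo_fd *fd)
{
        return fd->inode == DEMO_STALE_INODE;
}
```

The marker only helps diagnosis: once the memory has been released, any read of the fd, including this comparison, is no longer guaranteed to see the marker.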

That suggests a race condition where a thread uses a fd that another
thread destroyed. Of course, the value could be checked at the beginning
of server_rchecksum_cbk(), but I suspect the problem is more widespread
than this. There are many other places in server3_1-fops.c where
state->fd->inode->ino is used.

Should the value be checked at the beginning of
server_rchecksum_cbk() and its friends, or in each gf_log() call, like
this:
       if (op_ret == -1)
               gf_log (this->name, GF_LOG_INFO,
                       "%"PRId64": RCHECKSUM %"PRId64" (%"PRId64") "
                       "==> %"PRId32" (%s)",
                       frame->root->unique, state->resolve.fd_no,
                       state->fd && (state->fd->inode != (void *)0xaaaaaaaa) ?
                       state->fd->inode->ino : 0, op_ret,
                       strerror (op_errno));
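
If the check ends up in every log call, one option would be to put it in a single helper rather than repeating the ternary in each callback. The following is only a sketch under that assumption: demo_inode, demo_fd, DEMO_STALE_INODE and demo_fd_ino_for_log() are hypothetical names, not part of GlusterFS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for inode_t and fd_t. */
struct demo_inode {
        uint64_t ino;
};

struct demo_fd {
        struct demo_inode *inode;
};

/* Same marker value as the one seen in the gdb output above. */
#define DEMO_STALE_INODE ((struct demo_inode *)(uintptr_t)0xaaaaaaaaU)

/* Return the inode number to print in a log message, or 0 when the fd
 * is NULL, has no inode, or carries the stale marker. */
static uint64_t
demo_fd_ino_for_log (const struct demo_fd *fd)
{
        if (fd == NULL || fd->inode == NULL ||
            fd->inode == DEMO_STALE_INODE)
                return 0;

        return fd->inode->ino;
}
```

A callback would then pass demo_fd_ino_for_log (state->fd) as the log argument. This only keeps the log path from dereferencing the marker; it does not close the race itself, since another thread can still destroy the fd between the check and the dereference.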

FWIW this is a 2x2 replicated and distributed setup.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
https://lists.nongnu.org/mailman/listinfo/gluster-devel

