Re: Possible problem introduced by http://review.gluster.org/15573

Xavier Hernandez <xhernandez@xxxxxxxxxx> · Mon, 24 Oct 2016 08:56:34 +0200

On 21/10/16 15:05, Soumya Koduri wrote:
Hi Xavi,

On 10/21/2016 12:57 PM, Xavier Hernandez wrote:
Looking at the code, I think that the added fd_unref() should only be
called if the fop preparation fails. Otherwise the callback already
unreferences the fd.

Code flow:

* glfs_fsync_async_common() takes an fd ref and calls STACK_WIND passing
that fd.
* Just after that a ref is released.
* When glfs_io_async_cbk() is called another ref is released.

Note that if fop preparation fails, a single fd_unref() is called, but
on success two fd_unref() are called.

Sorry for the inconvenience caused. I think its patch#15573 hasn't
caused the problem but has highlighted another ref leak in the code.

From the code I see that glfs_io_async_cbk() does fd_unref (glfd->fd)
but not the fd passed in STACK_WIND_COOKIE() of the fop.

I think it's the same because the fd passed in STACK_WIND_COOKIE() also 
comes from glfd->fd.

If I take any fop, for eg.,
glfs_fsync_common() {

       fd = glfs_resolve_fd (glfd->fs, subvol, glfd);

}

Here in glfs_resolve_fd ()

fd_t *
__glfs_resolve_fd (struct glfs *fs, xlator_t *subvol, struct glfs_fd *glfd)
{
        fd_t *fd = NULL;

        if (glfd->fd->inode->table->xl == subvol)
                return fd_ref (glfd->fd);

Here we can see that  we are taking extra ref additional to the
ref already taken for glfd->fd. That means the caller of this function
needs to fd_unref(fd) irrespective of subsequent fd_unref (glfd->fd).

I agree here. This additional ref must be released somewhere.

        fd = __glfs_migrate_fd (fs, subvol, glfd);
        if (!fd)
                return NULL;

        if (subvol == fs->active_subvol) {
                fd_unref (glfd->fd);
                glfd->fd = fd_ref (fd);
        }

I think the issue is here during graph_switch(). You have
mentioned as well that the crash happens post graph_switch. Maybe here
we are missing an extra ref to be taken for fd additional to glfd->fd. I
need to look through __glfs_migrate_fd() to confirm that. But these are
my initial thoughts.

I think this is ok. The fd returned by __glfs_migrate_fd() already has a 
reference. We release the fd currently assigned to glfd->fd (that has 
only one reference) and assign the new fd to it, taking an additional 
reference (two in total) like in the previous case.

Xavi

Please let me know your comments.

Thanks,
Soumya

Xavi

On 21/10/16 09:03, Xavier Hernandez wrote:
Hi,

I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
consistently see a crash each time an attempt to connect to the volume
is made.

The backtrace of the crash shows this:

#0  pthread_spin_lock () at
../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
#2  0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>,
op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040,
iovec=iovec@entry=0x0, count=count@entry=0)
    at glfs-fops.c:839
#3  0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>,
cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
op_errno=<optimized out>,
    prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at
glfs-fops.c:1382
#4  0x00007fe520be2eb7 in ?? () from
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
#5  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#6  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#7  0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
    at dht-inode-read.c:873
#8  0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
<dht_fsync_cbk>, iov=0x7fe526c61040, count=8, myframe=0x7fe52ceef130) at
client-rpc-fops.c:975
#9  0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
#10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7fe526fafaf0, event=<optimized out>, data=0x7fe526c3a1c0) at
rpc-clnt.c:962
#11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>,
event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
#12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
socket.c:2267
#13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>,
idx=5, data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at
socket.c:2397
#14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler
(event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
#15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at
event-epoll.c:674
#16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
pthread_create.c:309
#17 0x00007fe53214962d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The fd being unreferenced contains this:

(gdb) print *fd
$6 = {
  pid = 97649,
  flags = 2,
  refcount = 0,
  inode_list = {
    next = 0x7fe523f7206c,
    prev = 0x7fe523f7206c
  },
  inode = 0x0,
  lock = {
    spinlock = 1,
    mutex = {
      __data = {
        __lock = 1,
        __count = 0,
        __owner = 0,
        __nusers = 0,
        __kind = 0,
        __spins = 0,
        __elision = 0,
        __list = {
          __prev = 0x0,
          __next = 0x0
        }
      },
      __size = "\001", '\000' <repeats 38 times>,
      __align = 1
    }
  },
  _ctx = 0x7fe52ec31c40,
  xl_count = 11,
  lk_ctx = 0x7fe526c126a0,
  anonymous = _gf_false
}

fd->inode is NULL, explaining the cause of the crash. We also see that
fd->refcount is already 0. So I'm wondering if this couldn't be an extra
fd_unref() introduced by that patch.

The crash seems to happen immediately after a graph switch.

Xavi
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel