Re: Possible problem introduced by http://review.gluster.org/15573

On Mon, Oct 24, 2016 at 12:39 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
Hi Soumya,


On 21/10/16 16:15, Soumya Koduri wrote:


On 10/21/2016 06:35 PM, Soumya Koduri wrote:
Hi Xavi,

On 10/21/2016 12:57 PM, Xavier Hernandez wrote:
Looking at the code, I think that the added fd_unref() should only be
called if the fop preparation fails. Otherwise the callback already
unreferences the fd.

Code flow:

* glfs_fsync_async_common() takes an fd ref and calls STACK_WIND passing
that fd.
* Just after that a ref is released.
* When glfs_io_async_cbk() is called another ref is released.

Note that if fop preparation fails, a single fd_unref() is called, but
on success two fd_unref() calls are made.
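
To make the accounting explicit, here is a minimal, self-contained
sketch of that flow (toy types, not the real gfapi code); note how the
success path releases one ref more than it takes:

#include <stdio.h>

typedef struct { int refcount; } toy_fd_t;

static toy_fd_t *toy_fd_ref (toy_fd_t *fd)   { fd->refcount++; return fd; }
static void      toy_fd_unref (toy_fd_t *fd) { fd->refcount--; }

/* stands in for glfs_io_async_cbk(): releases a ref on completion */
static void async_cbk (toy_fd_t *fd)
{
        toy_fd_unref (fd);
}

static void fsync_async_common (toy_fd_t *fd, int prep_fails)
{
        toy_fd_ref (fd);              /* ref taken before STACK_WIND      */
        if (prep_fails) {
                toy_fd_unref (fd);    /* failure path: single unref, OK   */
                return;
        }
        toy_fd_unref (fd);            /* the unref added just after wind  */
        async_cbk (fd);               /* callback unrefs again: one extra */
}

int main (void)
{
        toy_fd_t fd = { .refcount = 1 };   /* the ref held via glfd->fd   */

        fsync_async_common (&fd, 0);
        printf ("refcount after success path: %d\n", fd.refcount); /* 0 */
        return 0;
}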

Sorry for the inconvenience caused. I don't think patch #15573 itself
caused the problem; rather, it has highlighted another ref leak in the code.

From the code I see that glfs_io_async_cbk() does fd_unref (glfd->fd),
but there is no unref of the fd passed in STACK_WIND_COOKIE() of the fop.

If I take any fop, e.g.,

glfs_fsync_common() {
        ...
        fd = glfs_resolve_fd (glfd->fs, subvol, glfd);
        ...
}

Here, in glfs_resolve_fd () (whose locked core is __glfs_resolve_fd ()):

fd_t *
__glfs_resolve_fd (struct glfs *fs, xlator_t *subvol, struct glfs_fd
*glfd)
{
        fd_t *fd = NULL;

        if (glfd->fd->inode->table->xl == subvol)
                return fd_ref (glfd->fd);

Here we can see that we take an extra ref in addition to the ref
already held for glfd->fd. That means the caller of this function needs
to fd_unref (fd) irrespective of any subsequent fd_unref (glfd->fd);
see the caller sketch a few lines below.

        fd = __glfs_migrate_fd (fs, subvol, glfd);
        if (!fd)
                return NULL;


        if (subvol == fs->active_subvol) {
                fd_unref (glfd->fd);
                glfd->fd = fd_ref (fd);
        }

I think the issue is here, during graph_switch(). You have also
mentioned that the crash happens post graph switch. Maybe we are
missing an extra ref that should be taken on fd in addition to the one
on glfd->fd. I need to look through __glfs_migrate_fd() to confirm
that, but these are my initial thoughts.
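
Roughly, the contract for callers then looks like this. A hedged sketch
(it assumes gluster's internal headers and trims error handling; it is
not the actual glfs_fsync_common()):

#include "glfs-internal.h"   /* struct glfs_fd, glfs_resolve_fd() */
#include "fd.h"              /* fd_t, fd_ref(), fd_unref()        */

static int
fsync_common_sketch (struct glfs_fd *glfd, xlator_t *subvol)
{
        fd_t *fd  = NULL;
        int   ret = -1;

        /* glfs_resolve_fd() returns an fd carrying its own ref */
        fd = glfs_resolve_fd (glfd->fs, subvol, glfd);
        if (!fd)
                goto out;

        /* ... wind the fsync fop with this fd ... */
        ret = 0;
out:
        if (fd)
                fd_unref (fd);  /* balances resolve's ref, regardless
                                   of the ref glfd->fd still holds    */
        return ret;
}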

Looking into this, I think we should fix glfs_io_async_cbk() not to
fd_unref (glfd->fd). glfd->fd should stay active throughout the
lifetime of the glfd (i.e., until it is closed). Thoughts?
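
As a compact sketch of that invariant (toy types, and an assumption
about the shape of the fix, not the actual patch): the glfd owns one
ref on its fd from open to close, each request refs/unrefs only its own
resolved fd, and the completion callback never touches glfd->fd:

#include <assert.h>
#include <stddef.h>

typedef struct { int refcount; } xfd_t;
typedef struct { xfd_t *fd; }    xglfd_t;

static xfd_t *xref (xfd_t *fd)   { fd->refcount++; return fd; }
static void   xunref (xfd_t *fd) { fd->refcount--; }

static void xopen (xglfd_t *glfd, xfd_t *fd)
{
        glfd->fd = xref (fd);          /* glfd's one long-lived ref    */
}

static void xrequest (xglfd_t *glfd)
{
        xfd_t *fd = xref (glfd->fd);   /* per-request ref (resolve)    */
        /* ... fop winds and completes ... */
        xunref (fd);                   /* the callback's only unref    */
}

static void xclose (xglfd_t *glfd)
{
        xunref (glfd->fd);             /* matching unref at close      */
        glfd->fd = NULL;
}

int main (void)
{
        xfd_t   fd = { .refcount = 1 };  /* creator's ref              */
        xglfd_t glfd;

        xopen (&glfd, &fd);
        xrequest (&glfd);
        xrequest (&glfd);     /* a second fsync stays balanced too     */
        xclose (&glfd);
        assert (fd.refcount == 1);
        return 0;
}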

I don't know gfapi internals in depth, but at first sight I think this would be the right thing to do. Assuming that glfd keeps a reference to the fd until it's destroyed, and that a glfd reference is taken during the lifetime of each request that needs it, the fd_unref() in glfs_io_async_cbk() seems unnecessary. I think it was there just to release the fd acquired in glfs_resolve_fd(), but that release is better placed where it is now.

Another question: do we really need to take an additional reference in glfs_resolve_fd()?

This answers the first question too: we don't need the additional ref in glfs_resolve_fd() now that we have ref accounting in glfd. The confusion arose because earlier there was no ref accounting for glfd; refs were kept only on the fd_t.
No, we don't need this additional reference.
 

Can an fd returned by this function outlive the associated glfd in some circumstances?

Also, could you please check whether it is the second/subsequent
fsync_async() call that results in the crash?

I'll try to test it as soon as possible, but this is on a server that we need to put into production very soon, and we have decided to go with FUSE for now. We'll have a lot of work to do this week. Once I have some free time I'll build a test environment to check it, probably next week.


I have not been able to test this out completely. Theoretically, I don't see any possibility where the fd can outlive a glfd that points to it.
I have sent a patch [1] that
a. fixes the crash,
b. handles the unref in failure cases,
c. still keeps the duplicate refs, with some explanation in the commit message, and
d. adds a simple test for the same.

I request that we take this patch in and make a release soon, as the bug is affecting many community users, and do the cleanup in a separate patch.

[1] http://review.gluster.org/#/c/15768/ 

Thanks,
Raghavendra Talur 

Xavi



Thanks,
Soumya


Please let me know your comments.

Thanks,
Soumya



Xavi

On 21/10/16 09:03, Xavier Hernandez wrote:
Hi,

I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
consistently see a crash each time an attempt to connect to the volume
is made.

The backtrace of the crash shows this:

#0  pthread_spin_lock () at
../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
#2  0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>,
op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040,
iovec=iovec@entry=0x0, count=count@entry=0)
    at glfs-fops.c:839
#3  0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>,
cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
op_errno=<optimized out>,
    prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at
glfs-fops.c:1382
#4  0x00007fe520be2eb7 in ?? () from
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
#5  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#6  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#7  0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
    at dht-inode-read.c:873
#8  0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
<dht_fsync_cbk>, iov=0x7fe526c61040, count=8,
myframe=0x7fe52ceef130) at
client-rpc-fops.c:975
#9  0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
#10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7fe526fafaf0, event=<optimized out>, data=<optimized out>) at
rpc-clnt.c:962
#11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>,
event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
#12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
socket.c:2267
#13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>,
idx=5, data=<optimized out>, poll_in=1, poll_out=0, poll_err=0) at
socket.c:2397
#14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler
(event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
#15 event_dispatch_epoll_worker (data=<optimized out>) at
event-epoll.c:674
#16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
pthread_create.c:309
#17 0x00007fe53214962d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The fd being unreferenced contains this:

(gdb) print *fd
$6 = {
  pid = 97649,
  flags = 2,
  refcount = 0,
  inode_list = {
    next = 0x7fe523f7206c,
    prev = 0x7fe523f7206c
  },
  inode = 0x0,
  lock = {
    spinlock = 1,
    mutex = {
      __data = {
        __lock = 1,
        __count = 0,
        __owner = 0,
        __nusers = 0,
        __kind = 0,
        __spins = 0,
        __elision = 0,
        __list = {
          __prev = 0x0,
          __next = 0x0
        }
      },
      __size = "\001", '\000' <repeats 38 times>,
      __align = 1
    }
  },
  _ctx = 0x7fe52ec31c40,
  xl_count = 11,
  lk_ctx = 0x7fe526c126a0,
  anonymous = _gf_false
}

fd->inode is NULL, which explains the crash. We also see that
fd->refcount is already 0. So I'm wondering whether this could be an
extra fd_unref() introduced by that patch.
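
That state (inode = 0x0, refcount = 0) is exactly what a second unref
after teardown would see. A toy model of the mechanics (my own
simplification, not the real fd.c): fd_unref() reaches its spinlock
through fd->inode, so once a previous unref has torn the fd down, the
next one dereferences NULL before it can even decrement, which matches
frames #0 and #1 above:

#include <pthread.h>
#include <stdlib.h>

typedef struct { pthread_spinlock_t lock; } toy_table_t;
typedef struct { toy_table_t *table; }      toy_inode_t;
typedef struct { int refcount; toy_inode_t *inode; } toy_fd_t;

static void toy_fd_destroy (toy_fd_t *fd)
{
        free (fd->inode->table);
        free (fd->inode);
        fd->inode = NULL;            /* matches "inode = 0x0" in gdb */
}

static void toy_fd_unref (toy_fd_t *fd)
{
        /* like fd.c, the lock is reached through fd->inode, so a
           second unref after teardown segfaults right here          */
        pthread_spin_lock (&fd->inode->table->lock);
        int refs = --fd->refcount;
        pthread_spin_unlock (&fd->inode->table->lock);
        if (refs == 0)
                toy_fd_destroy (fd);
}

int main (void)
{
        toy_inode_t *inode = calloc (1, sizeof (*inode));
        inode->table = calloc (1, sizeof (*inode->table));
        pthread_spin_init (&inode->table->lock, 0);

        toy_fd_t fd = { .refcount = 1, .inode = inode };

        toy_fd_unref (&fd);   /* legitimate final unref: tears down   */
        toy_fd_unref (&fd);   /* suspected extra unref: SIGSEGV in    */
                              /* pthread_spin_lock, as in frame #0    */
        return 0;
}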

The crash seems to happen immediately after a graph switch.

Xavi

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
