Re: crypt xlator bug


 




On 04/01/2015 03:27 PM, Emmanuel Dreyfus wrote:
Hi

crypt.t recently broke in the NetBSD regression run. glusterfs returns
a node with an invalid file type to FUSE, and that breaks the test.

After running a git bisect, I found the offending commit after which
this behavior appeared:
     8a2e2b88fc21dc7879f838d18cd0413dd88023b7
     mem-pool: invalidate memory on GF_FREE to aid debugging

This means the bug has always been there, but this debugging aid
made it reproduce reliably.
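
To illustrate why invalidating memory on free turns a latent
use-after-free into a reliable failure, here is a minimal sketch. It is
not the GlusterFS mem-pool code: toy_local, toy_get0, toy_put and the
0xDB pattern are all hypothetical names and values picked for this
example. The pool owns a static array, so reading a stale slot is
well defined here and simply shows the poison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS  4
#define POISON_BYTE 0xDB  /* arbitrary pattern, an assumption of this sketch */

/* Hypothetical stand-in for a per-call "local" structure. */
struct toy_local {
    void *inode;
    int   op_ret;
};

static struct toy_local pool[POOL_SLOTS];
static unsigned char    in_use[POOL_SLOTS];

/* Hand out a zeroed slot, or NULL when the pool is exhausted. */
struct toy_local *toy_get0(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            memset(&pool[i], 0, sizeof(pool[i]));
            return &pool[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool and overwrite it with the poison pattern,
 * so any stale pointer into it sees garbage instead of old data. */
void toy_put(struct toy_local *l)
{
    assert(l >= pool && l < pool + POOL_SLOTS);
    memset(l, POISON_BYTE, sizeof(*l));
    in_use[l - pool] = 0;
}
```

Without the memset in toy_put, a stale pointer would usually still see
the old inode value and the bug would stay hidden most of the time.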

With the help of an assertion, I can detect when inode->ia_type gets
a corrupted value. It gives me the backtrace below: in frame 4,
inode = 0xb9611880 and inode->ia_type = 12475 (which is wrong).
The inode value comes from FUSE state->loc->inode, and we get it from
frame 20, which is in crypt.c:

#4  0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030,
     inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
#5  0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030,
     loc=0xb9608840, xdata=0x0) at md-cache.c:847
#6  0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030,
     loc=0xb9608840, xdata=0x0) at io-stats.c:1934
#7  0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030,
     loc=0xb9608840, xdata=0x0) at defaults.c:2138
#8  0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030,
     loc=0xb9608840, xdata=0x0) at meta.c:49
#9  0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
#10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
#11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
#12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
#13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
#14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
#15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
#16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688,
     cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0,
     inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0,
     postparent=0xb961e628) at fuse-resolve.c:81
#17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0,
     cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0,
     inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0,
     postparent=0xb961e628) at io-stats.c:1512
#18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
     this=0xbb1df030, op_ret=0, op_errno=0,
     inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0,
      postparent=0xb961e628) at md-cache.c:816
#19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
     this=0xbb1de030, op_ret=0, op_errno=0,
     inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0,
     postparent=0xb961e628) at io-cache.c:260
#20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
     this=0xbb1db030, op_ret=0, op_errno=0,
     dict=0xbb18a470, xdata=0x0) at crypt.c:3830
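
The kind of assertion that catches a value such as 12475 could look
like the sketch below. This is an assumption about the approach, not
the check that was actually used: the toy_* names and the enum values
are made up for illustration and are not copied from the GlusterFS
headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy file-type enum; names and values are assumptions of this sketch. */
enum toy_type {
    TOY_INVAL = 0,
    TOY_IFREG,
    TOY_IFDIR,
    TOY_IFLNK,
    TOY_IFBLK,
    TOY_IFCHR,
    TOY_IFIFO,
    TOY_IFSOCK
};

/* Toy inode; the type is stored as int so that any garbage value
 * can be represented and inspected. */
struct toy_inode {
    int ia_type;
};

/* True when the stored type is one of the known enum values. */
bool toy_type_is_sane(const struct toy_inode *inode)
{
    return inode->ia_type >= TOY_INVAL && inode->ia_type <= TOY_IFSOCK;
}

/* Abort as close as possible to the point where corruption is seen. */
#define TOY_ASSERT_SANE(inode) assert(toy_type_is_sane(inode))
```

Placing TOY_ASSERT_SANE() at the entry of each function that reads the
type narrows down which frame first observes the bad value.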

In frame 20:
     case GF_FOP_LOOKUP:
	    STACK_UNWIND_STRICT(lookup,
				frame,
				op_ret,
				op_errno,
				op_ret >= 0 ? local->inode : NULL,
				op_ret >= 0 ? &local->buf : NULL,
				local->xdata,
				op_ret >= 0 ? &local->postbuf : NULL);
Here is the problem: local->inode no longer holds the 0xb9611880 value,
which means local got corrupted:

(gdb) print local->inode
$2 = (inode_t *) 0x1db030de

I now suspect local has been freed, but I cannot find where in crypt.c
this is done. There is a local = mem_get0(this->local_pool)
in crypt_alloc_local, but where is that structure freed? There is
no mem_put() call in the crypt xlator.

I joined this thread after seeing Raghavendra Talur's patch that fixed the issue, which seemed extremely odd to me. I just checked this mail from you. local->inode in crypt need not be the same as state->loc->inode, because inode_link in fuse_resolve_entry_cbk returns the address of an already linked inode with the same gfid if one exists.

I see hardlink-related commands in crypt.t, so maybe this happens while looking up the extra link, which resolves to the older inode that is already linked? It is still some memory problem, but it may have nothing to do with crypt.

Could you let me know the details of the setup where you saw this issue? I can take a look.
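
To make the inode_link point concrete, here is a simplified model of
"link returns the already linked inode with the same gfid". It is only
a sketch: toy_inode, toy_inode_link and the fixed-size table are
hypothetical and do not reproduce the real inode_link signature or the
inode table implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GFID_LEN  16
#define TABLE_MAX 8

/* Toy inode identified by its gfid. */
struct toy_inode {
    unsigned char gfid[GFID_LEN];
    int           linked;
};

static struct toy_inode *table[TABLE_MAX];
static size_t            table_len;

/* Link candidate into the table. If an inode with the same gfid is
 * already linked, return that older inode and leave candidate
 * unlinked. Returns NULL when the table is full. */
struct toy_inode *toy_inode_link(struct toy_inode *candidate)
{
    assert(candidate != NULL);
    for (size_t i = 0; i < table_len; i++) {
        if (memcmp(table[i]->gfid, candidate->gfid, GFID_LEN) == 0)
            return table[i];
    }
    if (table_len == TABLE_MAX)
        return NULL;
    candidate->linked = 1;
    table[table_len++] = candidate;
    return candidate;
}
```

In this model a second lookup of the same file (for example through a
hard link) gets back the first inode's address, so the pointer the
caller ends up with can differ from the one it passed in.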

Pranith






