Hi
crypt.t recently started failing in the NetBSD regression run. glusterfs
returns a node with an invalid file type to FUSE, and that breaks the test.
After running a git bisect, I found the offending commit after which
this behavior appeared:
8a2e2b88fc21dc7879f838d18cd0413dd88023b7
mem-pool: invalidate memory on GF_FREE to aid debugging
This means the bug has always been there, but this debugging aid
made it reproduce reliably.
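To illustrate why invalidating memory on free turns a latent stale-pointer
read into a reliable failure, here is a minimal sketch. It is not the
glusterfs mem-pool code: demo_local, demo_get0, demo_put and POISON_BYTE are
hypothetical names, and the pool is a single static slot so that the stale
read stays well-defined in this demo.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a per-call "local" structure. */
struct demo_local {
    void *inode;
    int   op_ret;
};

#define POISON_BYTE 0xDE /* arbitrary pattern chosen for this sketch */

/* One-slot pool backed by static storage. A real use-after-free is
 * undefined behavior; here the storage never goes away, so the stale
 * read below is safe to demonstrate. */
static struct demo_local slot;
static int slot_in_use;

/* Hand out the slot zero-filled, or NULL if it is already taken. */
static struct demo_local *demo_get0(void)
{
    if (slot_in_use)
        return NULL;
    memset(&slot, 0, sizeof(slot));
    slot_in_use = 1;
    return &slot;
}

/* Release the slot; optionally overwrite it with the poison pattern. */
static void demo_put(struct demo_local *l, int poison)
{
    if (poison)
        memset(l, POISON_BYTE, sizeof(*l));
    slot_in_use = 0;
}

/* Returns 1 if a read through the stale pointer still sees the old
 * inode value after the slot was released, 0 otherwise. */
static int stale_read_looks_valid(int poison)
{
    static int fake_inode;
    void *expected = &fake_inode;
    struct demo_local *l = demo_get0();

    l->inode = expected;
    demo_put(l, poison);

    /* Compare raw bytes so no poisoned pointer value is ever loaded. */
    return memcmp(&l->inode, &expected, sizeof(expected)) == 0;
}
```

Without poisoning, the stale read still sees the old pointer and the bug
stays hidden until the slot is reused; with poisoning, it sees garbage
immediately.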
With the help of an assertion, I can detect when inode->ia_type gets
a corrupted value. It produces the backtrace below: in frame 4,
inode = 0xb9611880 and inode->ia_type = 12475 (which is wrong).
The inode value comes from FUSE state->loc->inode, and we get it from
frame 20, which is in crypt.c:
#4 0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030,
inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
#5 0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030,
loc=0xb9608840, xdata=0x0) at md-cache.c:847
#6 0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030,
loc=0xb9608840, xdata=0x0) at io-stats.c:1934
#7 0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030,
loc=0xb9608840, xdata=0x0) at defaults.c:2138
#8 0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030,
loc=0xb9608840, xdata=0x0) at meta.c:49
#9 0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
#10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
#11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
#12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
#13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
#14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
#15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
#16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688,
cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0,
inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0,
postparent=0xb961e628) at fuse-resolve.c:81
#17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0,
cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0,
inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0,
postparent=0xb961e628) at io-stats.c:1512
#18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
this=0xbb1df030, op_ret=0, op_errno=0,
inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0,
postparent=0xb961e628) at md-cache.c:816
#19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
this=0xbb1de030, op_ret=0, op_errno=0,
inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0,
postparent=0xb961e628) at io-cache.c:260
#20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
this=0xbb1db030, op_ret=0, op_errno=0,
dict=0xbb18a470, xdata=0x0) at crypt.c:3830
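The kind of assertion mentioned above can be sketched as a simple range
check on the file type. This is an illustration only, not the check I
actually added: the demo_ia_type_t enum and demo_ia_type_is_sane() are
hypothetical names, with values assumed to follow the usual layout of a
file-type enum (0 meaning invalid, then one value per file kind).

```c
#include <assert.h>

/* Hypothetical file-type enum for this sketch; the values are an
 * assumption and are not copied from the glusterfs headers. */
typedef enum {
    DEMO_IA_INVAL = 0,
    DEMO_IA_IFREG,
    DEMO_IA_IFDIR,
    DEMO_IA_IFLNK,
    DEMO_IA_IFBLK,
    DEMO_IA_IFCHR,
    DEMO_IA_IFIFO,
    DEMO_IA_IFSOCK
} demo_ia_type_t;

/* Returns 1 when t names a real file kind, 0 for the invalid marker or
 * for any value outside the enum (such as 12475 from the backtrace). */
static int demo_ia_type_is_sane(int t)
{
    return t > DEMO_IA_INVAL && t <= DEMO_IA_IFSOCK;
}
```

Placing assert(demo_ia_type_is_sane(inode->ia_type)) where the type is
read makes the process stop at the first consumer of the bad value, which
is how a backtrace like the one above is obtained.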
In frame 20:
case GF_FOP_LOOKUP:
STACK_UNWIND_STRICT(lookup,
frame,
op_ret,
op_errno,
op_ret >= 0 ? local->inode : NULL,
op_ret >= 0 ? &local->buf : NULL,
local->xdata,
op_ret >= 0 ? &local->postbuf : NULL);
Here is the problem: local->inode is no longer 0xb9611880,
which means local got corrupted:
(gdb) print local->inode
$2 = (inode_t *) 0x1db030de
I now suspect local has been freed, but I cannot find where crypt.c
does this. There is a local = mem_get0(this->local_pool)
in crypt_alloc_local, but where is that structure freed? There is
no mem_put() call in the crypt xlator.