----- Original Message -----
> From: "Sanoj Unnikrishnan" <sunnikri@xxxxxxxxxx>
> To: "Kinglong Mee" <kinglongmee@xxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Wednesday, June 28, 2017 5:18:56 PM
> Subject: Re: glusterfsd crash when using quota without io-threads
>
> Hi Kinglong,
>
> I opened https://bugzilla.redhat.com/show_bug.cgi?id=1465861 to track this.
> It seems to be an important issue that went undetected.
> Thanks for catching this!!
>
> In my attempts to reproduce this, I found that on each run some random
> structures were getting corrupted, running into a segfault.
> In order to assert that the stack was indeed growing into all the allocated
> space and beyond, I set a guard page at the low end of the allocated stack
> space (so that we hit a segfault before overrunning the space).
> Below are the code changes.
>
> @@ -443,6 +443,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
>          struct synctask *newtask = NULL;
>          xlator_t        *this = THIS;
>          int              destroymode = 0;
> +        int              r = 0;
> +        char            *v;
>
>          VALIDATE_OR_GOTO (env, err);
>          VALIDATE_OR_GOTO (fn, err);
> @@ -498,9 +500,15 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
>                                              gf_common_mt_syncstack);
>                  newtask->ctx.uc_stack.ss_size = env->stacksize;
>          } else {
> -                newtask->stack = GF_CALLOC (1, stacksize,
> +                newtask->stack = GF_CALLOC (1, stacksize,
>                                              gf_common_mt_syncstack);
>                  newtask->ctx.uc_stack.ss_size = stacksize;
> +                if (stacksize == 16*1024) {
> +                        v = (char *)(((unsigned long)newtask->stack + 4095) & ~4095UL);
> +                        r = mprotect (v, 4096, PROT_NONE);
> +                        gf_msg ("syncop", GF_LOG_ERROR, errno, LG_MSG_GETCONTEXT_FAILED,
> +                                "SKU: using 16k stack starting at %p, mprotect returned %d, guard page: %p", newtask->stack, r, v);
> +                }
>          }
>
> (gdb) where
> #0  0x00007f8a92c51204 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
> #1  0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
> #2  0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from /lib64/ld-linux-x86-64.so.2
> #3  0x0000000000000000 in ?? ()
>
> (gdb) info reg
> rdi            0x7f8a92946188      140233141412232
> rbp            0x7f8a800b4000      0x7f8a800b4000
> rsp            0x7f8a800b4000      0x7f8a800b4000
> r8             0x7f8a92e4ba60      140233146677856
>
> (gdb) layout asm
> │0x7f8a92c51204 <_dl_lookup_symbol_x+4>  push %r15   <== the push onto the
> guarded page caused the segfault
>
> From the brick log we have,
>
> [syncop.c:515:synctask_create] 0-syncop: SKU: using 16k stack starting at
> 0x7f8a800b28f0, mprotect returned 0, guard page: 0x7f8a800b3000 [No data
> available]
>
> The stack grows downward from 0x7f8a800b68f0 to 0x7f8a800b28f0, and the page
> 0x7f8a800b3000 - 0x7f8a800b4000 is guarded, which is where the segfault hit,
> as seen in gdb.
>
> This confirms that the stack space is not sufficient and is overflowing.
> I am not sure why we don't hit this in the presence of io-threads though.

Is it a case of too many frames in a single C stack? If that is the case,
with io-threads in the graph, STACK_WIND/UNWINDs won't result in the growth
of a single C stack. In the past we've seen this kind of crash (for e.g.,
while updating the sizes of all ancestors after a write, by marker etc.).

> It may just be that with io-threads in the graph, we have some allocated
> and unused memory which our stack freely grows into. It may just be a
> silent, undetected reuse of some memory.
>
> Regards,
> Sanoj
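For readers following along, here is a self-contained sketch of the
guard-page trick described above. It is a hypothetical example, not
GlusterFS code; the 4096-byte page size is an assumption, and a portable
program would query it with sysconf(_SC_PAGESIZE):

    /* guard_stack.c -- sketch: protect the low end of a heap-allocated
     * stack so an overflow faults immediately instead of silently
     * corrupting neighbouring heap memory. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define PAGE 4096UL

    int main (void)
    {
            size_t stacksize = 16 * 1024;
            /* over-allocate by one page so a fully aligned page fits inside */
            char *stack = calloc (1, stacksize + PAGE);
            if (!stack)
                    return 1;

            /* first page boundary at or above the allocation start */
            char *guard = (char *)(((unsigned long)stack + PAGE - 1) & ~(PAGE - 1));

            /* the stack grows downward from the top of the buffer, so a
             * PROT_NONE page near the bottom acts as a tripwire */
            if (mprotect (guard, PAGE, PROT_NONE) != 0) {
                    perror ("mprotect");
                    return 1;
            }
            printf ("stack %p, guard page %p\n", (void *)stack, (void *)guard);

            /* any push or store that reaches the guard page now raises SIGSEGV */

            mprotect (guard, PAGE, PROT_READ | PROT_WRITE); /* unprotect before free */
            free (stack);
            return 0;
    }

This is the same approach as the synctask_create() instrumentation above:
round the buffer start up to a page boundary, then mprotect() one page so
the downward-growing stack trips over it.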
> On Tue, Jun 13, 2017 at 10:49 AM, Sanoj Unnikrishnan < sunnikri@xxxxxxxxxx > wrote:
>
> Hi Kinglong,
> I was reading about makecontext/swapcontext as well.
> I did find an article that suggests using mprotect to force a segfault to
> check whether we have a stack space issue here.
> Here is the link: http://www.evanjones.ca/software/threading.html
>
> I don't think I can try this until tomorrow.
> thanks and regards,
> Sanoj
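For context, this is roughly the mechanism the thread keeps referring to: a
synctask runs its function on a small heap-allocated stack via
makecontext/swapcontext. A minimal standalone sketch (hypothetical names,
not the actual syncop code): once swapcontext switches over, every frame the
function pushes, including calls down into other xlators, has to fit in the
16K handed to makecontext.

    /* ucontext_stack.c -- sketch: run a function on a 16K heap stack
     * using makecontext/swapcontext. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;

    static void task_fn (void)
    {
            /* all locals and every nested call here consume the 16K stack */
            printf ("running on the 16K stack\n");
    }       /* on return, the uc_link context (main_ctx) is resumed */

    int main (void)
    {
            size_t stacksize = 16 * 1024;
            char *stack = calloc (1, stacksize);
            if (!stack)
                    return 1;

            getcontext (&task_ctx);                /* must precede makecontext */
            task_ctx.uc_stack.ss_sp   = stack;
            task_ctx.uc_stack.ss_size = stacksize;
            task_ctx.uc_link          = &main_ctx; /* context to resume on return */
            makecontext (&task_ctx, task_fn, 0);

            swapcontext (&main_ctx, &task_ctx);    /* jump onto the small stack */
            printf ("back on the main thread stack\n");
            free (stack);
            return 0;
    }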
> On Tue, Jun 13, 2017 at 5:58 AM, Kinglong Mee < kinglongmee@xxxxxxxxx > wrote:
>
> Hi Sanoj,
>
> What's your opinion about this problem?
>
> thanks,
> Kinglong Mee
>
> On 6/9/2017 17:20, Kinglong Mee wrote:
> > Hi Sanoj,
> >
> > On 6/9/2017 15:48, Sanoj Unnikrishnan wrote:
> >> I have not used valgrind before, so I may be wrong here.
> >>
> >> I think the valgrind_stack_deregister should have been after
> >> GF_FREE_STACK.
> >> That may explain the instance of invalid_write during stack_destroy calls
> >> in after.log.
> >
> > No. I moved it, but the instance of invalid_write still exists.
> >
> >> There seems to be numerous issues reported in before.log (I am assuming
> >> you did not have the valgrind_stack_register call in it).
> >
> > Yes, before.log is from a test without any code change (but without
> > io-threads).
> >
> >> From http://valgrind.org/docs/manual/manual-core.html, it looks like
> >> valgrind detects the client switching stacks only if the stack pointer
> >> register changes by more than 2MB.
> >
> > I tested with a larger max-stackframe as,
> > valgrind --leak-check=full --max-stackframe=242293216
> >
> >> Is it possible that since marker is only using 16k, the stack pointer
> >> could have been less than 2MB away from the current stack pointer?
> >
> > Maybe.
> > But with io-threads (and with valgrind_stack_deregister added), valgrind
> > only shows some "Invalid read/write" about __gf_mem_invalidate.
> > The only reason I can think of here is the stack size (16K) that marker
> > is using.
> >
> > I have not used makecontext/swapcontext before; am I right that:
> > 1. without swapcontext, the stack may be (just an example)
> >    --> io_stats -> quota -> marker -> io-threads -> .....
> >
> > 2. with swapcontext,
> >    --> io_stats -> quota -> marker
> >        switch to new stack -> io-threads
> >
> > After switching to the new stack, the new stack size is 16K.
> > Is that enough without io-threads?
> >
> > I don't know how io-threads behaves; does it call all sub-xlators on the
> > 16K stack, or something else?
> >
> >> It seems unlikely to me since we are allocating the stack from heap.
> >>
> >> Did you try a run with the valgrind instrumentation, without changing
> >> the stack size?
> >
> > OK.
> > The following valgrind-without-stack-change.log is from a test with
> > valgrind_stack_deregister added (and without io-threads).
> >
> > thanks,
> > Kinglong Mee
> >
> >> None of this explains the crash though.. We had seen a memory overrun
> >> crash in the same code path on NetBSD earlier but did not follow up then.
> >> Will look further into it.
> >>
> >> On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee < kinglongmee@xxxxxxxxx > wrote:
> >>
> >> Maybe it's my fault; I found valgrind can't follow a context switch
> >> (makecontext/swapcontext) by default.
> >> So, I tested with the following patch (it tells valgrind about the new
> >> stack via VALGRIND_STACK_REGISTER).
> >> With it, there are only some "Invalid read/write" by __gf_mem_invalidate.
> >> Is that right?
> >> So, there is only one problem: without io-threads, the stack size is too
> >> small for marker.
> >> Am I right?
> >>
> >> Ps:
> >> valgrind-before.log is the log without the following patch, and
> >> valgrind-after.log is with the patch.
> >>
> >> ==35656== Invalid write of size 8
> >> ==35656==    at 0x4E8FFD4: __gf_mem_invalidate (mem-pool.c:278)
> >> ==35656==    by 0x4E90313: __gf_free (mem-pool.c:334)
> >> ==35656==    by 0x4EA4E5B: synctask_destroy (syncop.c:394)
> >> ==35656==    by 0x4EA4EDF: synctask_done (syncop.c:412)
> >> ==35656==    by 0x4EA58B3: synctask_switchto (syncop.c:673)
> >> ==35656==    by 0x4EA596B: syncenv_processor (syncop.c:704)
> >> ==35656==    by 0x60B2DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
> >> ==35656==    by 0x67A873C: clone (in /usr/lib64/libc-2.17.so)
> >> ==35656==  Address 0x1b104931 is 2,068,017 bytes inside a block of size
> >> 2,097,224 alloc'd
> >> ==35656==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
> >> ==35656==    by 0x4E8FA5E: __gf_calloc (mem-pool.c:117)
> >> ==35656==    by 0x4EA52F5: synctask_create (syncop.c:500)
> >> ==35656==    by 0x4EA55AE: synctask_new1 (syncop.c:576)
> >> ==35656==    by 0x143AE0D7: mq_synctask1 (marker-quota.c:1078)
> >> ==35656==    by 0x143AE199: mq_synctask (marker-quota.c:1097)
> >> ==35656==    by 0x143AE6F6: _mq_create_xattrs_txn (marker-quota.c:1236)
> >> ==35656==    by 0x143AE82D: mq_create_xattrs_txn (marker-quota.c:1253)
> >> ==35656==    by 0x143B0DCB: mq_inspect_directory_xattr (marker-quota.c:2027)
> >> ==35656==    by 0x143B13A8: mq_xattr_state (marker-quota.c:2117)
> >> ==35656==    by 0x143A6E80: marker_lookup_cbk (marker.c:2961)
> >> ==35656==    by 0x141811E0: up_lookup_cbk (upcall.c:753)
> >>
> >> ----------------------- valgrind ------------------------------------------
> >>
> >> Don't forget to install valgrind-devel.
> >>
> >> diff --git a/libglusterfs/src/syncop.c b/libglusterfs/src/syncop.c
> >> index 00a9b57..97b1de1 100644
> >> --- a/libglusterfs/src/syncop.c
> >> +++ b/libglusterfs/src/syncop.c
> >> @@ -10,6 +10,7 @@
> >>
> >>  #include "syncop.h"
> >>  #include "libglusterfs-messages.h"
> >> +#include <valgrind/valgrind.h>
> >>
> >>  int
> >>  syncopctx_setfsuid (void *uid)
> >> @@ -388,6 +389,8 @@ synctask_destroy (struct synctask *task)
> >>          if (!task)
> >>                  return;
> >>
> >> +        VALGRIND_STACK_DEREGISTER (task->valgrind_ret);
> >> +
> >>          GF_FREE (task->stack);
> >>
> >>          if (task->opframe)
> >> @@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
> >>
> >>          newtask->ctx.uc_stack.ss_sp = newtask->stack;
> >>
> >> +        newtask->valgrind_ret = VALGRIND_STACK_REGISTER (newtask->stack,
> >> +                                newtask->stack + newtask->ctx.uc_stack.ss_size);
> >>
> >>          makecontext (&newtask->ctx, (void (*)(void)) synctask_wrap, 2, newtask);
> >>
> >>          newtask->state = SYNCTASK_INIT;
> >> diff --git a/libglusterfs/src/syncop.h b/libglusterfs/src/syncop.h
> >> index c2387e6..247325b 100644
> >> --- a/libglusterfs/src/syncop.h
> >> +++ b/libglusterfs/src/syncop.h
> >> @@ -63,6 +63,7 @@ struct synctask {
> >>          int                 woken;
> >>          int                 slept;
> >>          int                 ret;
> >> +        int                 valgrind_ret;
> >>
> >>          uid_t               uid;
> >>          gid_t               gid;
> >> diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
> >> index 902b8e5..f3d2507 100644
> >> --- a/xlators/features/marker/src/marker-quota.c
> >> +++ b/xlators/features/marker/src/marker-quota.c
> >> @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boolean_t spawn,
> >>          }
> >>
> >>          if (spawn) {
> >> -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
> >> +                ret = synctask_new1 (this->ctx->env, 0, task,
> >>                                       mq_synctask_cleanup, NULL, args);
> >>                  if (ret) {
> >>                          gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
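As an aside on the two client requests used in the patch above: valgrind
only knows about the stack each thread started on, so a manually managed
stack has to be announced to it before the program switches over. A minimal
sketch of the valgrind.h macros, under the assumption of a calloc'd 16K
stack (hypothetical example; requires valgrind-devel, and the macros
compile to no-ops when the program runs outside valgrind):

    /* vg_stack.c -- sketch: announce a manually managed stack to valgrind,
     * as the VALGRIND_STACK_REGISTER/DEREGISTER calls in the patch do. */
    #include <stdlib.h>
    #include <valgrind/valgrind.h>

    int main (void)
    {
            size_t stacksize = 16 * 1024;
            char *stack = calloc (1, stacksize);
            if (!stack)
                    return 1;

            /* register the [start, end) range; keep the id for later */
            unsigned id = VALGRIND_STACK_REGISTER (stack, stack + stacksize);

            /* ... hand the range to makecontext and run the task ... */

            VALGRIND_STACK_DEREGISTER (id);   /* before freeing the stack */
            free (stack);
            return 0;
    }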
> >> On 6/8/2017 19:02, Sanoj Unnikrishnan wrote:
> >> > I would still be worried about the Invalid read/write. IMO whether an
> >> > illegal access causes a crash depends on whether the page is currently
> >> > mapped.
> >> > So, it could be that there is a use-after-free / out-of-bounds access
> >> > happening in the code, and it turns out that this location lands in a
> >> > different (unmapped) page when io-threads is not loaded.
> >> >
> >> > Could you please share the valgrind logs as well.
> >> >
> >> > On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee < kinglongmee@xxxxxxxxx > wrote:
> >> >
> >> > After deleting io-threads from the vols, quota operations
> >> > (list/set/modify) make glusterfsd crash.
> >> > I am using CentOS 7 (CentOS Linux release 7.3.1611) with glusterfs
> >> > 3.8.12.
> >> > It seems the stack is corrupted; when testing with the following diff,
> >> > glusterfsd runs correctly.
> >> >
> >> > There are two questions:
> >> > 1. When using valgrind, it shows there are many "Invalid read/write"
> >> >    with io-threads. Why does glusterfsd run correctly with io-threads
> >> >    but crash without io-threads?
> >> >
> >> > 2. With the following diff, valgrind also shows many "Invalid
> >> >    read/write" without io-threads, but there is no crash.
> >> >
> >> > Any comments are welcome.
> >> >
> >> > Reverting http://review.gluster.org/11499 seems better than the diff.
> >> >
> >> > diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
> >> > index 902b8e5..f3d2507 100644
> >> > --- a/xlators/features/marker/src/marker-quota.c
> >> > +++ b/xlators/features/marker/src/marker-quota.c
> >> > @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boolean_t spawn,
> >> >          }
> >> >
> >> >          if (spawn) {
> >> > -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
> >> > +                ret = synctask_new1 (this->ctx->env, 0, task,
> >> >                                       mq_synctask_cleanup, NULL, args);
> >> >                  if (ret) {
> >> >                          gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
> >> >
> >> > ----------------------------------- test steps ----------------------------------
> >> > 1. gluster volume create gvtest node1:/test/ node2:/test/
> >> > 2. gluster volume start gvtest
> >> > 3. gluster volume quota enable gvtest
> >> > 4. delete io-threads from all vols
> >> > 5. reboot node1 and node2
> >> > 6. sh quota-set.sh
> >> >
> >> > # cat quota-set.sh
> >> > gluster volume quota gvtest list
> >> > gluster volume quota gvtest limit-usage / 10GB
> >> > gluster volume quota gvtest limit-usage /1234 1GB
> >> > gluster volume quota gvtest limit-usage /hello 1GB
> >> > gluster volume quota gvtest limit-usage /test 1GB
> >> > gluster volume quota gvtest limit-usage /xyz 1GB
> >> > gluster volume quota gvtest list
> >> > gluster volume quota gvtest remove /hello
> >> > gluster volume quota gvtest remove /test
> >> > gluster volume quota gvtest list
> >> > gluster volume quota gvtest limit-usage /test 1GB
> >> > gluster volume quota gvtest remove /xyz
> >> > gluster volume quota gvtest list
> >> >
> >> > ----------------------- glusterfsd crash without the diff --------------------------------
> >> >
> >> > /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7f6e1e950af1]
> >> > /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f6e1e956943]
> >> > /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
> >> > /lib64/libc.so.6(+0x35250)[0x7f6e1d025250]
> >> > /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7]
> >> > /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8]
> >> > /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07]
> >> > /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5]
> >> > /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6]
> >> > /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[0x7f6e1e981327]
> >> > /usr/local/lib/libglusterfs.so.0(synctask_destroy+0x82)[0x7f6e1e995c20]
> >> > /usr/local/lib/libglusterfs.so.0(synctask_done+0x25)[0x7f6e1e995c47]
> >> > /usr/local/lib/libglusterfs.so.0(synctask_switchto+0xcf)[0x7f6e1e996585]
> >> > /usr/local/lib/libglusterfs.so.0(syncenv_processor+0x60)[0x7f6e1e99663d]
> >> > /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5]
> >> > /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d]
> >> >
> >> > or
> >> >
> >> > package-string: glusterfs 3.8.12
> >> > /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7fa15e623af1]
> >> > /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7fa15e629943]
> >> > /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
> >> > /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250]
> >> > /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7]
> >> > /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8]
> >> > /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07]
> >> > /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d]
> >> > /lib64/libc.so.6(__libc_calloc+0xb4)[0x7fa15cd43a14]
> >> > /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[0x7fa15e653a5f]
> >> > /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[0x7fa15e65875a]
> >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xa98c)[0x7fa153a8398c]
> >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xacbc)[0x7fa153a83cbc]
> >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xad10)[0x7fa153a83d10]
> >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xb2a7)[0x7fa153a842a7]
> >> > /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
> >> > /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
> >> > /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5]
> >> > /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d]

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel