Re: glusterfsd crash when using quota without io-threads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kinglong,

I opened https://bugzilla.redhat.com/show_bug.cgi?id=1465861 to track this. It seems to be an important issue that went undetected.
Thanks for catching this!!

In my attempts to repro this , I found that on each run some random structures where getting corrupted and running into segfault.
In order to assert that the stack was indeed growing into all the allocated space and beyond, I set a guard page in the end of the allocated stack space (so that we hit a segfault before overusing the space).
Below are the code changes.

@@ -443,6 +443,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
         struct synctask *newtask = NULL;
         xlator_t        *this    = THIS;
         int             destroymode = 0;
+        int                     r=0;
+        char                    *v;
 
         VALIDATE_OR_GOTO (env, err);
         VALIDATE_OR_GOTO (fn, err);
@@ -498,9 +500,15 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
                                             gf_common_mt_syncstack);
                 newtask->ctx.uc_stack.ss_size = env->stacksize;
         } else {
-                newtask->stack = GF_CALLOC (1, stacksize,
+               newtask->stack = GF_CALLOC (1, stacksize,
                                             gf_common_mt_syncstack);
                 newtask->ctx.uc_stack.ss_size = stacksize;
+                if (stacksize == 16*1024) {
+                        v = (unsigned long)((char *)(newtask->stack) + 4095) & (~4095);
+                        r = mprotect(v, 4096, PROT_NONE);
+                       gf_msg ("syncop", GF_LOG_ERROR, errno,
+                                LG_MSG_GETCONTEXT_FAILED, "SKU: using 16k stack starting at %p, mprotect returned %d, guard page: %p", newtask->stack, r, v);
+               }
         }
 
(gdb) where
#0  0x00007f8a92c51204 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#1  0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from /lib64/ld-linux-x86-64.so.2
#3  0x0000000000000000 in ?? ()


(gdb) info reg

rdi            0x7f8a92946188    140233141412232
rbp            0x7f8a800b4000    0x7f8a800b4000
rsp            0x7f8a800b4000    0x7f8a800b4000
r8             0x7f8a92e4ba60    140233146677856

(gdb) layout asm

  >│0x7f8a92c51204 <_dl_lookup_symbol_x+4>          push   %r15                   <== push on stack at the guarded page caused segfault

From the brick log we have,

[syncop.c:515:synctask_create] 0-syncop: SKU: using 16k stack starting at 0x7f8a800b28f0, mprotect returned 0, guard page: 0x7f8a800b3000 [No data available]

Stack grows downward from 0x7f8a800b68f0 to 0x7f8a800b28f0  and the page 0x7f8a800b3000 - 0x7f8a800b4000 is guarded , which is where the segfault hit as seen in gdb.

This confirms that the stack space is not sufficient and overflowing,
I am not sure why we don't hit this in the presence of IO threads though, It may just be that with io threads in graph, we may have some allocated and unused memory which our stack freely grows into. It may just be a silent undetected reuse of some memory.

Regards,
Sanoj



On Tue, Jun 13, 2017 at 10:49 AM, Sanoj Unnikrishnan <sunnikri@xxxxxxxxxx> wrote:
Hi Kinglong
I was reading about makecontext/swapcontext as well,
I did find an article that suggested to use mprotect and force a segfault to check if we have a stack space issue here.
here is the link. http://www.evanjones.ca/software/threading.html.

I don't think i can try this until tomorrow.
thanks and regards,
Sanoj

On Tue, Jun 13, 2017 at 5:58 AM, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
Hi Sanoj,

What's your opinion about this problem?

thanks,
Kinglong Mee

On 6/9/2017 17:20, Kinglong Mee wrote:
> Hi Sanoj,
>
> On 6/9/2017 15:48, Sanoj Unnikrishnan wrote:
>> I have not used valgrind before, so I may be wrong here.
>>
>> I think the valgrind_stack_deregister should have been after GF_FREE_STACK.
>> That may explain the instance of invalid_write during stack_destroy calls in after.log.
>
> No. I move it, but the instance of invalid_write also exist.
>
>>
>> There seems to be numerous issues reported in before.log (I am assuming, you did not have the valgrind_stack_register call in it),
>
> Yes, the before.log is test without any code change(but without io-threads).
>
>> From http://valgrind.org/docs/manual/manual-core.html <http://valgrind.org/docs/manual/manual-core.html>, looks like valgrind detects client switching stack only If a memory of > 2MB change in Stack pointer register.
>
> I test with a larger max-stackframe as,
> valgrind  --leak-check=full --max-stackframe=242293216
>
>> Is it possible that since marker is only using 16k, the stack pointer could have been in less than 2MB offset from current Stack Pointer?
>
> Maybe.
> But with io-threads (with adding valgrind_stack_deregister), the valgrind only show some
> "Invalid read/write" about __gf_mem_invalidate.
> The only reason here I think is the stack size (16K) of marker using.
>
> I have not used makecontext/swapcontext before, am i right?
> 1. without swapconext, the stack maybe (just an example)
>    --> io_stats-> quota-> marker-> io-threads ->.....
>
> 2. with swapcontext,
>    --> io_stats-> quota-> marker
>                swithto new stack -> io-threads
>
> After switchto new stack, the new stack size is 16K,
> Does it enough without io-threads?
>
> I don't know what's the behave of io-threads, does it call all sub-xlator using the 16K ? or others?
>
>> It seems unlikely to me since we are allocating the stack from heap.
>>
>> Did you try a run with the valgrind instrumentation, without changing stack size ?
>
> OK.
> The following valgrind-without-stack-change.log is test with adding valgrind_stack_deregister
> (and without io-threads).
>
> thanks,
> Kinglong Mee
>
>> None of this explains the crash though.. We had seen a memory overrun crash in same code path on netbsd earlier but did not follow up then.
>> Will look further into it.
>>
>>
>>
>> On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee <kinglongmee@xxxxxxxxx <mailto:kinglongmee@xxxxxxxxx>> wrote:
>>
>>     Maybe it's my fault, I found valgrind can't parse context switch(makecontext/swapcontext) by default.
>>     So, I test with the following patch (tells valgrind new stack by VALGRIND_STACK_DEREGISTER).
>>     With it, only some "Invalid read/write" by __gf_mem_invalidate, Does it right ??
>>     So, there is only one problem, if without io-threads, the stack size is small for marker.
>>     Am I right?
>>
>>     Ps:
>>     valgrind-before.log is the log without the following patch, the valgrind-after.log is with the patch.
>>
>>     ==35656== Invalid write of size 8
>>     ==35656==    at 0x4E8FFD4: __gf_mem_invalidate (mem-pool.c:278)
>>     ==35656==    by 0x4E90313: __gf_free (mem-pool.c:334)
>>     ==35656==    by 0x4EA4E5B: synctask_destroy (syncop.c:394)
>>     ==35656==    by 0x4EA4EDF: synctask_done (syncop.c:412)
>>     ==35656==    by 0x4EA58B3: synctask_switchto (syncop.c:673)
>>     ==35656==    by 0x4EA596B: syncenv_processor (syncop.c:704)
>>     ==35656==    by 0x60B2DC4: start_thread (in /usr/lib64/libpthread-2.17.so <http://libpthread-2.17.so>)
>>     ==35656==    by 0x67A873C: clone (in /usr/lib64/libc-2.17.so <http://libc-2.17.so>)
>>     ==35656==  Address 0x1b104931 is 2,068,017 bytes inside a block of size 2,097,224 alloc'd
>>     ==35656==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
>>     ==35656==    by 0x4E8FA5E: __gf_calloc (mem-pool.c:117)
>>     ==35656==    by 0x4EA52F5: synctask_create (syncop.c:500)
>>     ==35656==    by 0x4EA55AE: synctask_new1 (syncop.c:576)
>>     ==35656==    by 0x143AE0D7: mq_synctask1 (marker-quota.c:1078)
>>     ==35656==    by 0x143AE199: mq_synctask (marker-quota.c:1097)
>>     ==35656==    by 0x143AE6F6: _mq_create_xattrs_txn (marker-quota.c:1236)
>>     ==35656==    by 0x143AE82D: mq_create_xattrs_txn (marker-quota.c:1253)
>>     ==35656==    by 0x143B0DCB: mq_inspect_directory_xattr (marker-quota.c:2027)
>>     ==35656==    by 0x143B13A8: mq_xattr_state (marker-quota.c:2117)
>>     ==35656==    by 0x143A6E80: marker_lookup_cbk (marker.c:2961)
>>     ==35656==    by 0x141811E0: up_lookup_cbk (upcall.c:753)
>>
>>     ----------------------- valgrind ------------------------------------------
>>
>>     Don't forget install valgrind-devel.
>>
>>     diff --git a/libglusterfs/src/syncop.c b/libglusterfs/src/syncop.c
>>     index 00a9b57..97b1de1 100644
>>     --- a/libglusterfs/src/syncop.c
>>     +++ b/libglusterfs/src/syncop.c
>>     @@ -10,6 +10,7 @@
>>
>>      #include "syncop.h"
>>      #include "libglusterfs-messages.h"
>>     +#include <valgrind/valgrind.h>
>>
>>      int
>>      syncopctx_setfsuid (void *uid)
>>     @@ -388,6 +389,8 @@ synctask_destroy (struct synctask *task)
>>              if (!task)
>>                      return;
>>
>>     +VALGRIND_STACK_DEREGISTER(task->valgrind_ret);
>>     +
>>              GF_FREE (task->stack);
>>
>>              if (task->opframe)
>>     @@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, size_t stacksize, sync
>>
>>              newtask->ctx.uc_stack.ss_sp   = newtask->stack;
>>
>>     +       newtask->valgrind_ret = VALGRIND_STACK_REGISTER(newtask->stack, newtask-
>>     +
>>              makecontext (&newtask->ctx, (void (*)(void)) synctask_wrap, 2, newtask)
>>
>>              newtask->state = SYNCTASK_INIT;
>>     diff --git a/libglusterfs/src/syncop.h b/libglusterfs/src/syncop.h
>>     index c2387e6..247325b 100644
>>     --- a/libglusterfs/src/syncop.h
>>     +++ b/libglusterfs/src/syncop.h
>>     @@ -63,6 +63,7 @@ struct synctask {
>>              int                 woken;
>>              int                 slept;
>>              int                 ret;
>>     +       int                 valgrind_ret;
>>
>>              uid_t               uid;
>>              gid_t               gid;
>>     diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marke
>>     index 902b8e5..f3d2507 100644
>>     --- a/xlators/features/marker/src/marker-quota.c
>>     +++ b/xlators/features/marker/src/marker-quota.c
>>     @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
>>              }
>>
>>              if (spawn) {
>>     -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
>>     +                ret = synctask_new1 (this->ctx->env, 0, task,
>>                                            mq_synctask_cleanup, NULL, args);
>>                      if (ret) {
>>                              gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
>>
>>
>>     On 6/8/2017 19:02, Sanoj Unnikrishnan wrote:
>>     > I would still be worried about the Invalid read/write. IMO whether an illegal access causes a crash depends on whether the page is currently mapped.
>>     > So, it could so happen that there is a use after free / use outside of bounds happening in the code  and  it turns out that this location gets mapped in a different (unmapped) page when IO threads is not loaded.
>>     >
>>     > Could you please share the valgrind logs as well.
>>     >
>>     > On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee <kinglongmee@xxxxxxxxx <mailto:kinglongmee@xxxxxxxxx> <mailto:kinglongmee@xxxxxxxxx <mailto:kinglongmee@xxxxxxxxx>>> wrote:
>>     >
>>     >     After deleting io-threads from the vols, quota operates (list/set/modify) lets glusterfsd crash.
>>     >     I use it at CentOS 7 (CentOS Linux release 7.3.1611) with glusterfs 3.8.12.
>>     >     It seems the stack corrupt, when testing with the following diff, glusterfsd runs correctly.
>>     >
>>     >     There are two questions as,
>>     >     1. When using valgrind, it shows there are many "Invalid read/write" when with io-threads.
>>     >        Why glusterfsd runs correctly with io-threads? but crash without io-threads?
>>     >
>>     >     2. With the following diff, valgrind also shows many "Invalid read/write" when without io-threads?
>>     >        but no any crash.
>>     >
>>     >     Any comments are welcome.
>>     >
>>     >     Revert http://review.gluster.org/11499 <http://review.gluster.org/11499> <http://review.gluster.org/11499 <http://review.gluster.org/11499>> seems better than the diff.
>>     >
>>     >     diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marke
>>     >     index 902b8e5..f3d2507 100644
>>     >     --- a/xlators/features/marker/src/marker-quota.c
>>     >     +++ b/xlators/features/marker/src/marker-quota.c
>>     >     @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
>>     >              }
>>     >
>>     >              if (spawn) {
>>     >     -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
>>     >     +                ret = synctask_new1 (this->ctx->env, 0, task,
>>     >                                            mq_synctask_cleanup, NULL, args);
>>     >                      if (ret) {
>>     >                              gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
>>     >
>>     >     -----------------------------------test steps ----------------------------------
>>     >     1. gluster volume create gvtest node1:/test/ node2:/test/
>>     >     2. gluster volume start gvtest
>>     >     3. gluster volume quota enable gvtest
>>     >
>>     >     4. "deletes io-threads from all vols"
>>     >     5. reboot node1 and node2.
>>     >     6. sh quota-set.sh
>>     >
>>     >     # cat quota-set.sh
>>     >     gluster volume quota gvtest list
>>     >     gluster volume quota gvtest limit-usage / 10GB
>>     >     gluster volume quota gvtest limit-usage /1234 1GB
>>     >     gluster volume quota gvtest limit-usage /hello 1GB
>>     >     gluster volume quota gvtest limit-usage /test 1GB
>>     >     gluster volume quota gvtest limit-usage /xyz 1GB
>>     >     gluster volume quota gvtest list
>>     >     gluster volume quota gvtest remove /hello
>>     >     gluster volume quota gvtest remove /test
>>     >     gluster volume quota gvtest list
>>     >     gluster volume quota gvtest limit-usage /test 1GB
>>     >     gluster volume quota gvtest remove /xyz
>>     >     gluster volume quota gvtest list
>>     >
>>     >     -----------------------glusterfsd crash without the diff--------------------------------
>>     >
>>     >     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7f6e1e950af1]
>>     >     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f6e1e956943]
>>     >     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
>>     >     /lib64/libc.so.6(+0x35250)[0x7f6e1d025250]
>>     >     /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7]
>>     >     /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8]
>>     >     /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07]
>>     >     /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5]
>>     >     /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6]
>>     >     /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[0x7f6e1e981327]
>>     >     /usr/local/lib/libglusterfs.so.0(synctask_destroy+0x82)[0x7f6e1e995c20]
>>     >     /usr/local/lib/libglusterfs.so.0(synctask_done+0x25)[0x7f6e1e995c47]
>>     >     /usr/local/lib/libglusterfs.so.0(synctask_switchto+0xcf)[0x7f6e1e996585]
>>     >     /usr/local/lib/libglusterfs.so.0(syncenv_processor+0x60)[0x7f6e1e99663d]
>>     >     /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5]
>>     >     /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d]
>>     >
>>     >     or
>>     >
>>     >     package-string: glusterfs 3.8.12
>>     >     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7fa15e623af1]
>>     >     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7fa15e629943]
>>     >     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
>>     >     /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250]
>>     >     /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7]
>>     >     /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8]
>>     >     /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07]
>>     >     /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d]
>>     >     /lib64/libc.so.6(__libc_calloc+0xb4)[0x7fa15cd43a14]
>>     >     /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[0x7fa15e653a5f]
>>     >     /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[0x7fa15e65875a]
>>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xa98c)[0x7fa153a8398c]
>>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xacbc)[0x7fa153a83cbc]
>>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xad10)[0x7fa153a83d10]
>>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xb2a7)[0x7fa153a842a7]
>>     >     /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
>>     >     /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
>>     >     /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5]
>>     >     /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d]
>>     >
>>     >     _______________________________________________
>>     >     Gluster-devel mailing list
>>     >     Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@gluster.org> <mailto:Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>>
>>     >     http://lists.gluster.org/mailman/listinfo/gluster-devel <http://lists.gluster.org/mailman/listinfo/gluster-devel> <http://lists.gluster.org/mailman/listinfo/gluster-devel <http://lists.gluster.org/mailman/listinfo/gluster-devel>>
>>     >
>>     >
>>
>>
>


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux