Re: glusterfsd crash when using quota without io-threads

I have not used valgrind before, so I may be wrong here.

I think the VALGRIND_STACK_DEREGISTER call should have come after the GF_FREE (task->stack).
That may explain the invalid writes during the synctask_destroy calls in after.log.

There seem to be numerous issues reported in before.log (I am assuming you did not have the VALGRIND_STACK_REGISTER call in that run).
From http://valgrind.org/docs/manual/manual-core.html, it looks like valgrind only detects the client switching stacks when the stack pointer register changes by more than 2MB. Is it possible that, since marker uses only a 16k stack, the new stack pointer was within 2MB of the current one?
That seems unlikely to me, since we allocate the stack from the heap.

Did you try a run with the valgrind instrumentation but without changing the stack size?
None of this explains the crash, though. We had seen a memory-overrun crash in the same code path on NetBSD earlier, but did not follow up at the time.
I will look further into it.



On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
Maybe it's my fault: I found that valgrind cannot follow context switches (makecontext/swapcontext) by default.
So I tested with the following patch, which tells valgrind about each new stack via VALGRIND_STACK_REGISTER.
With it, there are only some "Invalid read/write" reports from __gf_mem_invalidate. Is that right?
If so, there is only one problem: without io-threads, the stack size is too small for marker.
Am I right?

PS:
valgrind-before.log is the log without the following patch; valgrind-after.log is with it.

==35656== Invalid write of size 8
==35656==    at 0x4E8FFD4: __gf_mem_invalidate (mem-pool.c:278)
==35656==    by 0x4E90313: __gf_free (mem-pool.c:334)
==35656==    by 0x4EA4E5B: synctask_destroy (syncop.c:394)
==35656==    by 0x4EA4EDF: synctask_done (syncop.c:412)
==35656==    by 0x4EA58B3: synctask_switchto (syncop.c:673)
==35656==    by 0x4EA596B: syncenv_processor (syncop.c:704)
==35656==    by 0x60B2DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==35656==    by 0x67A873C: clone (in /usr/lib64/libc-2.17.so)
==35656==  Address 0x1b104931 is 2,068,017 bytes inside a block of size 2,097,224 alloc'd
==35656==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
==35656==    by 0x4E8FA5E: __gf_calloc (mem-pool.c:117)
==35656==    by 0x4EA52F5: synctask_create (syncop.c:500)
==35656==    by 0x4EA55AE: synctask_new1 (syncop.c:576)
==35656==    by 0x143AE0D7: mq_synctask1 (marker-quota.c:1078)
==35656==    by 0x143AE199: mq_synctask (marker-quota.c:1097)
==35656==    by 0x143AE6F6: _mq_create_xattrs_txn (marker-quota.c:1236)
==35656==    by 0x143AE82D: mq_create_xattrs_txn (marker-quota.c:1253)
==35656==    by 0x143B0DCB: mq_inspect_directory_xattr (marker-quota.c:2027)
==35656==    by 0x143B13A8: mq_xattr_state (marker-quota.c:2117)
==35656==    by 0x143A6E80: marker_lookup_cbk (marker.c:2961)
==35656==    by 0x141811E0: up_lookup_cbk (upcall.c:753)

----------------------- valgrind ------------------------------------------

Don't forget to install valgrind-devel.

diff --git a/libglusterfs/src/syncop.c b/libglusterfs/src/syncop.c
index 00a9b57..97b1de1 100644
--- a/libglusterfs/src/syncop.c
+++ b/libglusterfs/src/syncop.c
@@ -10,6 +10,7 @@

 #include "syncop.h"
 #include "libglusterfs-messages.h"
+#include <valgrind/valgrind.h>

 int
 syncopctx_setfsuid (void *uid)
@@ -388,6 +389,8 @@ synctask_destroy (struct synctask *task)
         if (!task)
                 return;

+VALGRIND_STACK_DEREGISTER(task->valgrind_ret);
+
         GF_FREE (task->stack);

         if (task->opframe)
@@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, size_t stacksize, sync

         newtask->ctx.uc_stack.ss_sp   = newtask->stack;

+       newtask->valgrind_ret = VALGRIND_STACK_REGISTER(newtask->stack, newtask-
+
         makecontext (&newtask->ctx, (void (*)(void)) synctask_wrap, 2, newtask)

         newtask->state = SYNCTASK_INIT;
diff --git a/libglusterfs/src/syncop.h b/libglusterfs/src/syncop.h
index c2387e6..247325b 100644
--- a/libglusterfs/src/syncop.h
+++ b/libglusterfs/src/syncop.h
@@ -63,6 +63,7 @@ struct synctask {
         int                 woken;
         int                 slept;
         int                 ret;
+       int                 valgrind_ret;

         uid_t               uid;
         gid_t               gid;
diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marke
index 902b8e5..f3d2507 100644
--- a/xlators/features/marker/src/marker-quota.c
+++ b/xlators/features/marker/src/marker-quota.c
@@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
         }

         if (spawn) {
-                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
+                ret = synctask_new1 (this->ctx->env, 0, task,
                                       mq_synctask_cleanup, NULL, args);
                 if (ret) {
                         gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "


On 6/8/2017 19:02, Sanoj Unnikrishnan wrote:
> I would still be worried about the Invalid read/write reports. IMO, whether an illegal access causes a crash depends on whether the page is currently mapped.
> So it could be that there is a use-after-free or out-of-bounds access in the code, and that this location happens to fall in an unmapped page when io-threads is not loaded.
>
> Could you please share the valgrind logs as well.
>
> On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
>
>     After deleting io-threads from the vols, quota operations (list/set/modify) make glusterfsd crash.
>     I am using CentOS 7 (CentOS Linux release 7.3.1611) with glusterfs 3.8.12.
>     It seems the stack is corrupted; when testing with the following diff, glusterfsd runs correctly.
>
>     There are two questions:
>     1. When using valgrind, it shows many "Invalid read/write" errors even with io-threads.
>        Why does glusterfsd run correctly with io-threads but crash without it?
>
>     2. With the following diff, valgrind still shows many "Invalid read/write" errors without io-threads,
>        but there is no crash. Why?
>
>     Any comments are welcome.
>
>     Reverting http://review.gluster.org/11499 seems better than the diff.
>
>     diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marke
>     index 902b8e5..f3d2507 100644
>     --- a/xlators/features/marker/src/marker-quota.c
>     +++ b/xlators/features/marker/src/marker-quota.c
>     @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
>              }
>
>              if (spawn) {
>     -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
>     +                ret = synctask_new1 (this->ctx->env, 0, task,
>                                            mq_synctask_cleanup, NULL, args);
>                      if (ret) {
>                              gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
>
>     -----------------------------------test steps ----------------------------------
>     1. gluster volume create gvtest node1:/test/ node2:/test/
>     2. gluster volume start gvtest
>     3. gluster volume quota enable gvtest
>
>     4. "deletes io-threads from all vols"
>     5. reboot node1 and node2.
>     6. sh quota-set.sh
>
>     # cat quota-set.sh
>     gluster volume quota gvtest list
>     gluster volume quota gvtest limit-usage / 10GB
>     gluster volume quota gvtest limit-usage /1234 1GB
>     gluster volume quota gvtest limit-usage /hello 1GB
>     gluster volume quota gvtest limit-usage /test 1GB
>     gluster volume quota gvtest limit-usage /xyz 1GB
>     gluster volume quota gvtest list
>     gluster volume quota gvtest remove /hello
>     gluster volume quota gvtest remove /test
>     gluster volume quota gvtest list
>     gluster volume quota gvtest limit-usage /test 1GB
>     gluster volume quota gvtest remove /xyz
>     gluster volume quota gvtest list
>
>     -----------------------glusterfsd crash without the diff--------------------------------
>
>     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7f6e1e950af1]
>     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f6e1e956943]
>     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
>     /lib64/libc.so.6(+0x35250)[0x7f6e1d025250]
>     /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7]
>     /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8]
>     /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07]
>     /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5]
>     /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6]
>     /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[0x7f6e1e981327]
>     /usr/local/lib/libglusterfs.so.0(synctask_destroy+0x82)[0x7f6e1e995c20]
>     /usr/local/lib/libglusterfs.so.0(synctask_done+0x25)[0x7f6e1e995c47]
>     /usr/local/lib/libglusterfs.so.0(synctask_switchto+0xcf)[0x7f6e1e996585]
>     /usr/local/lib/libglusterfs.so.0(syncenv_processor+0x60)[0x7f6e1e99663d]
>     /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5]
>     /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d]
>
>     or
>
>     package-string: glusterfs 3.8.12
>     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7fa15e623af1]
>     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7fa15e629943]
>     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
>     /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250]
>     /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7]
>     /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8]
>     /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07]
>     /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d]
>     /lib64/libc.so.6(__libc_calloc+0xb4)[0x7fa15cd43a14]
>     /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[0x7fa15e653a5f]
>     /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[0x7fa15e65875a]
>     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xa98c)[0x7fa153a8398c]
>     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xacbc)[0x7fa153a83cbc]
>     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xad10)[0x7fa153a83d10]
>     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xb2a7)[0x7fa153a842a7]
>     /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
>     /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
>     /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5]
>     /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d]
>
>     _______________________________________________
>     Gluster-devel mailing list
>     Gluster-devel@xxxxxxxxxxx
>     http://lists.gluster.org/mailman/listinfo/gluster-devel

