[Gluster-devel] glusterfs crash caused by liblvm2app.so with BD xlator

Hi all,

We are testing the BD xlator to verify KVM running on Gluster. After some 
simple tests, we hit a core dump of glusterfs caused by liblvm2app.so. 
We hope someone here can give some advice on this issue. 

We have been debugging this for some time and found that the core dump is 
triggered by a thread-safety issue. In the core file, the top frame is _update_mda(),
called with an invalid pointer that comes from lvmcache_foreach_mda(). As we know,
glusterfsd uses several io threads to simulate async I/O, so more than one thread can
run bd_statfs_cbk() at the same time. In liblvm2app.so, _text_read() looks up an info 
item in a hash table named _pvid_hash and allocates a new one if none exists. However, 
no lock protects these operations. With multiple threads, liblvm2app.so can crash 
through the following sequence:

Thread A and thread B enter bd_statfs_cbk() at the same time:
1. A allocates a new info node, puts it into _pvid_hash, and calls lvmcache_foreach_mda().
2. B looks up the info node generated by A in _pvid_hash and passes it to lvmcache_del_mdas(), which frees the info node.
3. A keeps using the info node that B has freed.
4. Memory corruption, then a crash.
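
The interleaving above can be replayed in a single thread to make the lifetime problem visible. This is only a minimal sketch under stated assumptions: struct info_node, the one-slot cache, lookup_or_create(), delete_node() and replay_race() are hypothetical stand-ins, not liblvm2app.so code, and the node is flagged as freed instead of being passed to free() so the replay stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an lvmcache info node. */
struct info_node {
    bool freed;   /* set instead of calling free(), so the replay stays defined */
};

/* Hypothetical one-slot stand-in for the _pvid_hash table. */
static struct info_node slot;
static struct info_node *cache_entry = NULL;

/* Look up the node; create and publish it if it is missing (no locking). */
static struct info_node *lookup_or_create(void)
{
    if (cache_entry == NULL) {
        slot.freed = false;
        cache_entry = &slot;
    }
    return cache_entry;
}

/* Drop the node from the cache and mark it as freed. */
static void delete_node(struct info_node *n)
{
    n->freed = true;
    cache_entry = NULL;
}

/* Replays steps 1 to 3 in order; returns 1 if A ends up holding a freed node. */
int replay_race(void)
{
    struct info_node *a = lookup_or_create();   /* step 1: A creates and publishes */
    struct info_node *b = lookup_or_create();   /* step 2: B finds A's node ... */
    assert(a == b);                             /* both threads share one node */
    delete_node(b);                             /* ... and frees it */
    return a->freed ? 1 : 0;                    /* step 3: A still holds the pointer */
}
```

Calling replay_race() from a small main() returns 1: the pointer A holds outlives the node it refers to, which is the condition that step 4 turns into a crash.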

Steps to reproduce:
1. Create a BD volume with the BD xlator following the standard method, and mount it on a glusterfs client.

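2. Write a simple test script, crash_bd.sh:
#!/bin/bash
# Call df in bursts of 10, then pause, forever.
while :; do
    i=0
    while [ "$i" -lt 10 ]; do
        # df issues statfs on every mount, including the BD volume
        df > /dev/null
        i=$((i + 1))
    done
    sleep 10
done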

3. Start several instances of crash_bd.sh at the same time:
# ./crash_bd.sh &
# ./crash_bd.sh &
# ./crash_bd.sh &
# ./crash_bd.sh &

4. Wait a few minutes; df will report an error like this:
df: `/mnt/bd_vol': Transport endpoint is not connected
At this point glusterfs has crashed.

Note: If we set the number of io threads to one, the BD xlator runs without problems.
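
Since a single io thread avoids the crash, one possible workaround might be to serialize the calls into the library with a mutex rather than reduce the thread count. The following is a rough sketch of that idea and not a tested patch for the BD xlator: lvm_lock, unsafe_open_close() and run_serialized() are hypothetical names, and the cache here is a simulation rather than the real _pvid_hash.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for a cached info node. */
struct info_node {
    bool valid;
};

static pthread_mutex_t lvm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct info_node slot;
static struct info_node *cache_entry = NULL;   /* protected by lvm_lock */
static long stale_hits = 0;                    /* protected by lvm_lock */

/* Stand-in for one non-thread-safe library call: create, use, delete. */
static void unsafe_open_close(void)
{
    if (cache_entry == NULL) {
        slot.valid = true;
        cache_entry = &slot;
    }
    struct info_node *n = cache_entry;
    assert(n != NULL);
    if (!n->valid)
        stale_hits++;          /* would be a use-after-free in the real library */
    n->valid = false;
    cache_entry = NULL;
}

/* Each worker makes its calls one at a time under the global lock. */
static void *worker(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&lvm_lock);
        unsafe_open_close();
        pthread_mutex_unlock(&lvm_lock);
    }
    return NULL;
}

/* Runs nthreads workers; returns the number of stale accesses, or -1 on error. */
long run_serialized(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;

    stale_hits = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? stale_hits : -1;
}
```

With the lock held around the whole create, use and delete sequence, run_serialized(4, 1000) reports no stale accesses. The cost is that all LVM calls run one at a time, which is the same trade-off as the single io thread setting, limited to the library calls.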

We hope someone here can advise on this issue. Any information is appreciated!

Core details:
Core was generated by `/usr/sbin/glusterfsd -s host-005056b50a23 --volfile-id bd.host-0050'.
Program terminated with signal 11, Segmentation fault.
#0  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:328
353    format_text/text_label.c: No such file or directory.
(gdb) bt
#0  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:328
#1  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, 
    baton=baton@entry=0x7f83b1d0a6d0) at cache/lvmcache.c:1880
#2  0x00007f83b59c0e5f in _text_read (l=<optimized out>, dev=0x11ee8d8, buf=<optimized out>, label=0x7f83b1d0a958)
    at format_text/text_label.c:459
#3  0x00007f83b59c27e7 in label_read (dev=0x11ee8d8, result=result@entry=0x7f83b1d0a958, scan_sector=scan_sector@entry=0)
    at label/label.c:284
#4  0x00007f83b599dd2b in lvmcache_fmt_from_vgname (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", 
    vgid=vgid@entry=0x0, revalidate_labels=revalidate_labels@entry=1) at cache/lvmcache.c:506
#5  0x00007f83b59e1ad8 in _vg_read (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, 
    warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b1d0ab48, precommitted=precommitted@entry=0)
    at metadata/metadata.c:3143
#6  0x00007f83b59e2ecc in vg_read_internal (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", 
    vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b1d0ab48) at metadata/metadata.c:3549
#7  0x00007f83b59e30cc in _vg_lock_and_read (misc_flags=0, status_flags=0, lock_flags=33, vgid=0x0, 
    vg_name=0x11d2c50 "bd-vg", cmd=0x11d3c40) at metadata/metadata.c:4235
#8  vg_read (cmd=cmd@entry=0x11d3c40, vg_name=vg_name@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, flags=0)
    at metadata/metadata.c:4343
#9  0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r", vgname=0x11d2c50 "bd-vg", libh=0x11d3c40, 
    flags=<optimized out>) at lvm_vg.c:221
#10 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg", mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0)
    at lvm_vg.c:238
#11 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95416e4, cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0, 
    buff=0x7f83b1d0ac70, xdata=0x0) at bd.c:353
......

(gdb) f 1
#1  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, 
    baton=baton@entry=0x7f83b1d0a6d0) at cache/lvmcache.c:1899
1899    cache/lvmcache.c: No such file or directory.
(gdb) p *info
$1 = {list = {n = 0x11fa650, p = 0x11fa650}, mdas = {n = 0x11fafd0, p = 0x11fafd0}, das = {n = 0x11fb000, p = 0x11fb000}, 
  bas = {n = 0x11faf30, p = 0x11faf30}, vginfo = 0x11fa640, label = 0x11faed0, fmt = 0x11f8480, dev = 0x11ee8d8, 
  device_size = 531870253056, status = 1}
(gdb) info threads 
  Id   Target Id         Frame 
  11   Thread 0x7f83b3786700 (LWP 24272) 0x00007f7bc917cdec in _dev_close (dev=0x16421d0, immediate=immediate@entry=0) at device/dev-io.c:624
  10   Thread 0x7f83bb968700 (LWP 23306) 0x00007f83ba5e40d3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
  9    Thread 0x7f83b250c700 (LWP 24276) 0x00007f83bac822d4 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  8    Thread 0x7f83b2d0d700 (LWP 24275) 0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  7    Thread 0x7f83b350e700 (LWP 24274) 0x00007f83ba5b4bdd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
  6    Thread 0x7f83b3887700 (LWP 24271) 0x00007f83bac822d4 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  5    Thread 0x7f83b6ed7700 (LWP 23310) 0x00007f83bac858ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
  4    Thread 0x7f83b7d57700 (LWP 23309) 0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  3    Thread 0x7f83b8558700 (LWP 23308) 0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  2    Thread 0x7f83b8d59700 (LWP 23307) 0x00007f83bac85d77 in do_sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
* 1    Thread 0x7f83b1d0b700 (LWP 26000) _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:353
(gdb) thread 11
[Switching to thread 11 (Thread 0x7f83b3786700 (LWP 24272))]
#0  0x00007f83bac8578d in fsync () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f7bc917cdec in _dev_close (dev=0x16421d0, immediate=immediate@entry=0) at device/dev-io.c:624
#1  0x00007f7bc917d257 in dev_close (dev=<optimized out>) at device/dev-io.c:631
#2  0x00007f83b59c1a88 in _update_mda (mda=0x11fafd0, baton=0x7f83b37856d0) at format_text/text_label.c:361
#3  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, 
    baton=baton@entry=0x7f83b37856d0) at cache/lvmcache.c:1899
#4  0x00007f83b59c0e5f in _text_read (l=<optimized out>, dev=0x11ee8d8, buf=<optimized out>, label=0x7f83b3785958)
    at format_text/text_label.c:459
#5  0x00007f83b59c27e7 in label_read (dev=0x11ee8d8, result=result@entry=0x7f83b3785958, scan_sector=scan_sector@entry=0)
    at label/label.c:284
#6  0x00007f83b599dd2b in lvmcache_fmt_from_vgname (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", 
    vgid=vgid@entry=0x0, revalidate_labels=revalidate_labels@entry=1) at cache/lvmcache.c:506
#7  0x00007f83b59e1ad8 in _vg_read (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, 
    warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b3785b48, precommitted=precommitted@entry=0)
    at metadata/metadata.c:3143
#8  0x00007f83b59e2ecc in vg_read_internal (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", 
    vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b3785b48) at metadata/metadata.c:3549
#9  0x00007f83b59e30cc in _vg_lock_and_read (misc_flags=0, status_flags=0, lock_flags=33, vgid=0x0, 
    vg_name=0x11d2c50 "bd-vg", cmd=0x11d3c40) at metadata/metadata.c:4235
#10 vg_read (cmd=cmd@entry=0x11d3c40, vg_name=vg_name@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, flags=0)
    at metadata/metadata.c:4343
#11 0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r", vgname=0x11d2c50 "bd-vg", libh=0x11d3c40, 
    flags=<optimized out>) at lvm_vg.c:221
#12 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg", mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0)
    at lvm_vg.c:238
#13 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95412dc, cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0, 
    buff=0x7f83b3785c70, xdata=0x0) at bd.c:353
......
(gdb) f 3
#3  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>, 
    baton=baton@entry=0x7f83b37856d0) at cache/lvmcache.c:1899
1899    in cache/lvmcache.c
(gdb) p *info
$2 = {list = {n = 0x11fa650, p = 0x11fa650}, mdas = {n = 0x11fafd0, p = 0x11fafd0}, das = {n = 0x11fb000, p = 0x11fb000}, 
  bas = {n = 0x11faf30, p = 0x11faf30}, vginfo = 0x11fa640, label = 0x11faed0, fmt = 0x11f8480, dev = 0x11ee8d8, 
  device_size = 531870253056, status = 1}


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
