repeatable crash in librbd1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

we observe crashes in librbd1 on specific workloads in virtual machines
on Ubuntu 20.04 hosts with librbd1=15.2.4-1focal.

The changes in
https://github.com/ceph/ceph/commit/50694f790245ca90a3b8a644da7b128a7a148cc6
could be related, but do not easily apply against v15.2.4.

We have collected several backtraces, using a reliable local
reproducer.

(This is a resent of the message stuck in the moderation queue since yesterday.)

Best regards
Johannes


VM in libvirt with:
<pre>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' discard='unmap'/>
      <source protocol='rbd' name='pool/disk' index='4'>
        <!-- omitted -->
      </source>
      <iotune>
        <read_bytes_sec>209715200</read_bytes_sec>
        <write_bytes_sec>209715200</write_bytes_sec>
        <read_iops_sec>5000</read_iops_sec>
        <write_iops_sec>5000</write_iops_sec>
        <read_bytes_sec_max>314572800</read_bytes_sec_max>
        <write_bytes_sec_max>314572800</write_bytes_sec_max>
        <read_iops_sec_max>7500</read_iops_sec_max>
        <write_iops_sec_max>7500</write_iops_sec_max>
        <read_bytes_sec_max_length>60</read_bytes_sec_max_length>
        <write_bytes_sec_max_length>60</write_bytes_sec_max_length>
        <read_iops_sec_max_length>60</read_iops_sec_max_length>
        <write_iops_sec_max_length>60</write_iops_sec_max_length>
      </iotune>
    </disk>
</pre>

workload:
<pre>
fio --rw=write --name=test --size=10M
timeout 30s fio --rw=write --name=test --size=20G
timeout 3m fio --rw=write --name=test --size=20G --direct=1
timeout 1m fio --rw=randrw --name=test --size=20G --direct=1
timeout 10s fio --numjobs=8 --rw=randrw --name=test --size=1G --direct=1
# the backtraces are then observed while the following command is running
fio --ioengine=libaio --iodepth=16 --numjobs=8 --rw=randrw --name=test --size=1G --direct=1
</pre>


observed stack traces

three times:
<pre>
#0  librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x557f633e9400) at ./src/common/event_socket.h:32
#1  0x00007ffb9740ba34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x557f633f7600) at ./src/librbd/io/AioCompletion.cc:262
#2  0x00007ffb9740ce98 in librbd::io::AioCompletion::complete (this=0x557f633f7600) at ./src/librbd/io/AioCompletion.cc:104
#3  0x00007ffb9740d1a0 in librbd::io::AioCompletion::complete_request (this=0x557f633f7600, r=r@entry=4096) at ./src/librbd/io/AioCompletion.cc:229
#4  0x00007ffb9742fdca in librbd::io::ReadResult::C_ObjectReadRequest::finish (this=0x7ffb68364a60, r=4096) at ./src/librbd/io/ReadResult.cc:155
#5  0x00007ffb9728334d in Context::complete (this=0x7ffb68364a60, r=<optimized out>) at ./src/include/Context.h:77
#6  0x00007ffb9742c7d9 in librbd::io::ObjectDispatchSpec::C_Dispatcher::finish (this=0x7ffb6832bed0, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:32
#7  0x00007ffb9742c735 in librbd::io::ObjectDispatchSpec::C_Dispatcher::complete (this=<optimized out>, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:23
#8  0x00007ffb9755e942 in librbd::io::ObjectRequest<librbd::ImageCtx>::finish (this=this@entry=0x7ffb683394a0, r=r@entry=0) at ./src/include/Context.h:78
#9  0x00007ffb97562e3b in librbd::io::ObjectReadRequest<librbd::ImageCtx>::handle_read_object (this=0x7ffb683394a0, r=0) at ./src/log/SubsystemMap.h:72
#10 0x00007ffb970f4177 in librados::C_AioComplete::finish (this=0x7ffb68362e30, r=<optimized out>) at ./src/librados/AioCompletionImpl.h:140
#11 0x00007ffb970afd1d in Context::complete (this=0x7ffb68362e30, r=<optimized out>) at ./src/include/Context.h:77
#12 0x00007ffb8765791d in Finisher::finisher_thread_entry (this=0x557f63387a30) at ./src/common/Finisher.cc:66
#13 0x00007ffb9946f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007ffb99396103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>

once:
<pre>
#0  0x00007fe7a972169d in std::atomic<boost::lockfree::detail::tagged_ptr<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::load (__m=std::memory_order_acquire, this=<optimized out>) at /usr/include/c++/9/atomic:250
#1  boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::do_push<false> (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:311
#2  boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::push (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:280
#3  librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x5613db630440) at ./src/librbd/io/AioCompletion.cc:276
#4  0x00007fe7a9721a34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x5613dbbfa2c0) at ./src/librbd/io/AioCompletion.cc:262
#5  0x00007fe7a9722e98 in librbd::io::AioCompletion::complete (this=0x5613dbbfa2c0) at ./src/librbd/io/AioCompletion.cc:104
#6  0x00007fe7a97231a0 in librbd::io::AioCompletion::complete_request (this=0x5613dbbfa2c0, r=r@entry=4096) at ./src/librbd/io/AioCompletion.cc:229
#7  0x00007fe7a9745dca in librbd::io::ReadResult::C_ObjectReadRequest::finish (this=0x5613dc6a5c90, r=4096) at ./src/librbd/io/ReadResult.cc:155
#8  0x00007fe7a959934d in Context::complete (this=0x5613dc6a5c90, r=<optimized out>) at ./src/include/Context.h:77
#9  0x00007fe7a97427d9 in librbd::io::ObjectDispatchSpec::C_Dispatcher::finish (this=0x5613db8bae40, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:32
#10 0x00007fe7a9742735 in librbd::io::ObjectDispatchSpec::C_Dispatcher::complete (this=<optimized out>, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:23
#11 0x00007fe7a9874942 in librbd::io::ObjectRequest<librbd::ImageCtx>::finish (this=this@entry=0x5613dbbfc900, r=r@entry=0) at ./src/include/Context.h:78
#12 0x00007fe7a9878e3b in librbd::io::ObjectReadRequest<librbd::ImageCtx>::handle_read_object (this=0x5613dbbfc900, r=0) at ./src/log/SubsystemMap.h:72
#13 0x00007fe7a940a177 in librados::C_AioComplete::finish (this=0x5613db85e6d0, r=<optimized out>) at ./src/librados/AioCompletionImpl.h:140
#14 0x00007fe7a93c5d1d in Context::complete (this=0x5613db85e6d0, r=<optimized out>) at ./src/include/Context.h:77
#15 0x00007fe79b65791d in Finisher::finisher_thread_entry (this=0x5613db369ee0) at ./src/common/Finisher.cc:66
#16 0x00007fe7ab788609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007fe7ab6af103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>

twice:
<pre>
#0  boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::extract_ptr (i=<error reading variable>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:113
#1  boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::get_ptr (this=0xfffff40ee9c0) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:115
#2  boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate_impl<false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:187
#3  boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate<true, false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:168
#4  boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::construct<true, false, librbd::io::AioCompletion*, boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node*> (arg2=<optimized out>, arg1=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:100
#5  boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::do_push<false> (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:302
#6  boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::push (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:280
#7  librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x560f5e10c000) at ./src/librbd/io/AioCompletion.cc:276
#8  0x00007f8c8bcada34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x560f5edd5d30) at ./src/librbd/io/AioCompletion.cc:262
#9  0x00007f8c8bcaee98 in librbd::io::AioCompletion::complete (this=0x560f5edd5d30) at ./src/librbd/io/AioCompletion.cc:104
#10 0x00007f8c8bcaf1a0 in librbd::io::AioCompletion::complete_request (this=0x560f5edd5d30, r=r@entry=4096) at ./src/librbd/io/AioCompletion.cc:229
#11 0x00007f8c8bcd1dca in librbd::io::ReadResult::C_ObjectReadRequest::finish (this=0x7f8bc4342280, r=4096) at ./src/librbd/io/ReadResult.cc:155
#12 0x00007f8c8bb2534d in Context::complete (this=0x7f8bc4342280, r=<optimized out>) at ./src/include/Context.h:77
#13 0x00007f8c8bcce7d9 in librbd::io::ObjectDispatchSpec::C_Dispatcher::finish (this=0x7f8bc4326820, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:32
#14 0x00007f8c8bcce735 in librbd::io::ObjectDispatchSpec::C_Dispatcher::complete (this=<optimized out>, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:23
#15 0x00007f8c8be00942 in librbd::io::ObjectRequest<librbd::ImageCtx>::finish (this=this@entry=0x7f8bc42b2a30, r=r@entry=0) at ./src/include/Context.h:78
#16 0x00007f8c8be04e3b in librbd::io::ObjectReadRequest<librbd::ImageCtx>::handle_read_object (this=0x7f8bc42b2a30, r=0) at ./src/log/SubsystemMap.h:72
#17 0x00007f8c8b996177 in librados::C_AioComplete::finish (this=0x7f8bc43423b0, r=<optimized out>) at ./src/librados/AioCompletionImpl.h:140
#18 0x00007f8c8b951d1d in Context::complete (this=0x7f8bc43423b0, r=<optimized out>) at ./src/include/Context.h:77
#19 0x00007f8c82f4291d in Finisher::finisher_thread_entry (this=0x560f5f44b180) at ./src/common/Finisher.cc:66
#20 0x00007f8c91e1a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007f8c91d41103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>

once:
<pre>
#0  boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::extract_ptr (i=<error reading variable>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:113
#1  boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::get_ptr (this=0xfffff40ee9c0) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:115
#2  boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate_impl<false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:187
#3  boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate<true, false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:168
#4  boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::construct<true, false, librbd::io::AioCompletion*, boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node*> (arg2=<optimized out>, arg1=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:100
#5  boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::do_push<false> (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:302
#6  boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::push (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:280
#7  librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x55d1b89113c0) at ./src/librbd/io/AioCompletion.cc:276
#8  0x00007f275f6d7a34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x55d1b8a7a0d0) at ./src/librbd/io/AioCompletion.cc:262
#9  0x00007f275f6d8e98 in librbd::io::AioCompletion::complete (this=0x55d1b8a7a0d0) at ./src/librbd/io/AioCompletion.cc:104
#10 0x00007f275f6d91a0 in librbd::io::AioCompletion::complete_request (this=0x55d1b8a7a0d0, r=<optimized out>) at ./src/librbd/io/AioCompletion.cc:229
#11 0x00007f275f54f34d in Context::complete (this=0x7f268833f850, r=<optimized out>) at ./src/include/Context.h:77
#12 0x00007f274f6b647b in ThreadPool::worker (this=0x55d1b9d04860, wt=<optimized out>) at ./src/common/WorkQueue.cc:118
#13 0x00007f274f6b7545 in ThreadPool::WorkThread::entry (this=<optimized out>) at ./src/common/WorkQueue.h:466
#14 0x00007f276173e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f2761665103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux