Hi,

we observe crashes in librbd1 on specific workloads in virtual machines on Ubuntu 20.04 hosts with librbd1=15.2.4-1focal.

The changes in https://github.com/ceph/ceph/commit/50694f790245ca90a3b8a644da7b128a7a148cc6 could be related, but do not apply cleanly against v15.2.4.

We have collected several backtraces using a reliable local reproducer.

(This is a resend of the message that has been stuck in the moderation queue since yesterday.)

Best regards
Johannes

VM in libvirt with:
<pre>
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' discard='unmap'/>
  <source protocol='rbd' name='pool/disk' index='4'>
    <!-- omitted -->
  </source>
  <iotune>
    <read_bytes_sec>209715200</read_bytes_sec>
    <write_bytes_sec>209715200</write_bytes_sec>
    <read_iops_sec>5000</read_iops_sec>
    <write_iops_sec>5000</write_iops_sec>
    <read_bytes_sec_max>314572800</read_bytes_sec_max>
    <write_bytes_sec_max>314572800</write_bytes_sec_max>
    <read_iops_sec_max>7500</read_iops_sec_max>
    <write_iops_sec_max>7500</write_iops_sec_max>
    <read_bytes_sec_max_length>60</read_bytes_sec_max_length>
    <write_bytes_sec_max_length>60</write_bytes_sec_max_length>
    <read_iops_sec_max_length>60</read_iops_sec_max_length>
    <write_iops_sec_max_length>60</write_iops_sec_max_length>
  </iotune>
</disk>
</pre>

workload:
<pre>
fio --rw=write --name=test --size=10M
timeout 30s fio --rw=write --name=test --size=20G
timeout 3m fio --rw=write --name=test --size=20G --direct=1
timeout 1m fio --rw=randrw --name=test --size=20G --direct=1
timeout 10s fio --numjobs=8 --rw=randrw --name=test --size=1G --direct=1
# the backtraces are then observed while the following command is running
fio --ioengine=libaio --iodepth=16 --numjobs=8 --rw=randrw --name=test --size=1G --direct=1
</pre>

observed stack traces:

three times:
<pre>
#0 librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x557f633e9400) at ./src/common/event_socket.h:32
#1 0x00007ffb9740ba34 in librbd::io::AioCompletion::complete_external_callback
(this=this@entry=0x557f633f7600) at ./src/librbd/io/AioCompletion.cc:262
#2 0x00007ffb9740ce98 in librbd::io::AioCompletion::complete (this=0x557f633f7600) at ./src/librbd/io/AioCompletion.cc:104
#3 0x00007ffb9740d1a0 in librbd::io::AioCompletion::complete_request (this=0x557f633f7600, r=r@entry=4096) at ./src/librbd/io/AioCompletion.cc:229
#4 0x00007ffb9742fdca in librbd::io::ReadResult::C_ObjectReadRequest::finish (this=0x7ffb68364a60, r=4096) at ./src/librbd/io/ReadResult.cc:155
#5 0x00007ffb9728334d in Context::complete (this=0x7ffb68364a60, r=<optimized out>) at ./src/include/Context.h:77
#6 0x00007ffb9742c7d9 in librbd::io::ObjectDispatchSpec::C_Dispatcher::finish (this=0x7ffb6832bed0, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:32
#7 0x00007ffb9742c735 in librbd::io::ObjectDispatchSpec::C_Dispatcher::complete (this=<optimized out>, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:23
#8 0x00007ffb9755e942 in librbd::io::ObjectRequest<librbd::ImageCtx>::finish (this=this@entry=0x7ffb683394a0, r=r@entry=0) at ./src/include/Context.h:78
#9 0x00007ffb97562e3b in librbd::io::ObjectReadRequest<librbd::ImageCtx>::handle_read_object (this=0x7ffb683394a0, r=0) at ./src/log/SubsystemMap.h:72
#10 0x00007ffb970f4177 in librados::C_AioComplete::finish (this=0x7ffb68362e30, r=<optimized out>) at ./src/librados/AioCompletionImpl.h:140
#11 0x00007ffb970afd1d in Context::complete (this=0x7ffb68362e30, r=<optimized out>) at ./src/include/Context.h:77
#12 0x00007ffb8765791d in Finisher::finisher_thread_entry (this=0x557f63387a30) at ./src/common/Finisher.cc:66
#13 0x00007ffb9946f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007ffb99396103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>

once:
<pre>
#0 0x00007fe7a972169d in std::atomic<boost::lockfree::detail::tagged_ptr<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::load
(__m=std::memory_order_acquire, this=<optimized out>) at /usr/include/c++/9/atomic:250
#1 boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::do_push<false> (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:311
#2 boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::push (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:280
#3 librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x5613db630440) at ./src/librbd/io/AioCompletion.cc:276
#4 0x00007fe7a9721a34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x5613dbbfa2c0) at ./src/librbd/io/AioCompletion.cc:262
#5 0x00007fe7a9722e98 in librbd::io::AioCompletion::complete (this=0x5613dbbfa2c0) at ./src/librbd/io/AioCompletion.cc:104
#6 0x00007fe7a97231a0 in librbd::io::AioCompletion::complete_request (this=0x5613dbbfa2c0, r=r@entry=4096) at ./src/librbd/io/AioCompletion.cc:229
#7 0x00007fe7a9745dca in librbd::io::ReadResult::C_ObjectReadRequest::finish (this=0x5613dc6a5c90, r=4096) at ./src/librbd/io/ReadResult.cc:155
#8 0x00007fe7a959934d in Context::complete (this=0x5613dc6a5c90, r=<optimized out>) at ./src/include/Context.h:77
#9 0x00007fe7a97427d9 in librbd::io::ObjectDispatchSpec::C_Dispatcher::finish (this=0x5613db8bae40, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:32
#10 0x00007fe7a9742735 in librbd::io::ObjectDispatchSpec::C_Dispatcher::complete (this=<optimized out>, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:23
#11 0x00007fe7a9874942 in librbd::io::ObjectRequest<librbd::ImageCtx>::finish (this=this@entry=0x5613dbbfc900, r=r@entry=0) at ./src/include/Context.h:78
#12 0x00007fe7a9878e3b in librbd::io::ObjectReadRequest<librbd::ImageCtx>::handle_read_object (this=0x5613dbbfc900, r=0) at ./src/log/SubsystemMap.h:72
#13
0x00007fe7a940a177 in librados::C_AioComplete::finish (this=0x5613db85e6d0, r=<optimized out>) at ./src/librados/AioCompletionImpl.h:140
#14 0x00007fe7a93c5d1d in Context::complete (this=0x5613db85e6d0, r=<optimized out>) at ./src/include/Context.h:77
#15 0x00007fe79b65791d in Finisher::finisher_thread_entry (this=0x5613db369ee0) at ./src/common/Finisher.cc:66
#16 0x00007fe7ab788609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007fe7ab6af103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>

twice:
<pre>
#0 boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::extract_ptr (i=<error reading variable>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:113
#1 boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::get_ptr (this=0xfffff40ee9c0) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:115
#2 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate_impl<false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:187
#3 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*,
boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate<true, false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:168
#4 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::construct<true, false, librbd::io::AioCompletion*, boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node*> (arg2=<optimized out>, arg1=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:100
#5 boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::do_push<false> (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:302
#6 boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::push (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:280
#7 librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x560f5e10c000) at ./src/librbd/io/AioCompletion.cc:276
#8 0x00007f8c8bcada34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x560f5edd5d30) at ./src/librbd/io/AioCompletion.cc:262
#9 0x00007f8c8bcaee98 in librbd::io::AioCompletion::complete (this=0x560f5edd5d30) at ./src/librbd/io/AioCompletion.cc:104
#10 0x00007f8c8bcaf1a0 in librbd::io::AioCompletion::complete_request (this=0x560f5edd5d30, r=r@entry=4096) at ./src/librbd/io/AioCompletion.cc:229
#11 0x00007f8c8bcd1dca in librbd::io::ReadResult::C_ObjectReadRequest::finish
(this=0x7f8bc4342280, r=4096) at ./src/librbd/io/ReadResult.cc:155
#12 0x00007f8c8bb2534d in Context::complete (this=0x7f8bc4342280, r=<optimized out>) at ./src/include/Context.h:77
#13 0x00007f8c8bcce7d9 in librbd::io::ObjectDispatchSpec::C_Dispatcher::finish (this=0x7f8bc4326820, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:32
#14 0x00007f8c8bcce735 in librbd::io::ObjectDispatchSpec::C_Dispatcher::complete (this=<optimized out>, r=<optimized out>) at ./src/librbd/io/ObjectDispatchSpec.cc:23
#15 0x00007f8c8be00942 in librbd::io::ObjectRequest<librbd::ImageCtx>::finish (this=this@entry=0x7f8bc42b2a30, r=r@entry=0) at ./src/include/Context.h:78
#16 0x00007f8c8be04e3b in librbd::io::ObjectReadRequest<librbd::ImageCtx>::handle_read_object (this=0x7f8bc42b2a30, r=0) at ./src/log/SubsystemMap.h:72
#17 0x00007f8c8b996177 in librados::C_AioComplete::finish (this=0x7f8bc43423b0, r=<optimized out>) at ./src/librados/AioCompletionImpl.h:140
#18 0x00007f8c8b951d1d in Context::complete (this=0x7f8bc43423b0, r=<optimized out>) at ./src/include/Context.h:77
#19 0x00007f8c82f4291d in Finisher::finisher_thread_entry (this=0x560f5f44b180) at ./src/common/Finisher.cc:66
#20 0x00007f8c91e1a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007f8c91d41103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>

once:
<pre>
#0 boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::extract_ptr (i=<error reading variable>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:113
#1 boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*,
boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::freelist_node>::get_ptr (this=0xfffff40ee9c0) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/tagged_ptr_ptrcompression.hpp:115
#2 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate_impl<false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:187
#3 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::allocate<true, false> (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:168
#4 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node, std::allocator<boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node> >::construct<true, false, librbd::io::AioCompletion*, boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::node*> (arg2=<optimized out>, arg1=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/detail/freelist.hpp:100
#5 boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::do_push<false> (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:302
#6
boost::lockfree::queue<librbd::io::AioCompletion*, boost::lockfree::allocator<std::allocator<void> > >::push (t=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/lockfree/queue.hpp:280
#7 librbd::io::AioCompletion::complete_event_socket (this=this@entry=0x55d1b89113c0) at ./src/librbd/io/AioCompletion.cc:276
#8 0x00007f275f6d7a34 in librbd::io::AioCompletion::complete_external_callback (this=this@entry=0x55d1b8a7a0d0) at ./src/librbd/io/AioCompletion.cc:262
#9 0x00007f275f6d8e98 in librbd::io::AioCompletion::complete (this=0x55d1b8a7a0d0) at ./src/librbd/io/AioCompletion.cc:104
#10 0x00007f275f6d91a0 in librbd::io::AioCompletion::complete_request (this=0x55d1b8a7a0d0, r=<optimized out>) at ./src/librbd/io/AioCompletion.cc:229
#11 0x00007f275f54f34d in Context::complete (this=0x7f268833f850, r=<optimized out>) at ./src/include/Context.h:77
#12 0x00007f274f6b647b in ThreadPool::worker (this=0x55d1b9d04860, wt=<optimized out>) at ./src/common/WorkQueue.cc:118
#13 0x00007f274f6b7545 in ThreadPool::WorkThread::entry (this=<optimized out>) at ./src/common/WorkQueue.h:466
#14 0x00007f276173e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f2761665103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
</pre>
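
For triage, it may help to check mechanically which frames the backtraces above have in common. The following is only a minimal sketch and not part of the original report: `frame_functions` and `common_functions` are hypothetical helper names, and the parser assumes gdb output with one complete frame per line in the form `#N [0xADDR in] function (args) at file:line`.

```python
import re

# Start of a gdb frame line: "#N", an optional "0xADDR in ", then the
# function name up to the " (" that opens the argument list.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")


def frame_functions(backtrace):
    """Return the function names of one backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def common_functions(backtraces):
    """Return the functions present in every backtrace.

    The result follows the frame order of the first backtrace and lists
    each function only once.
    """
    per_trace = [frame_functions(bt) for bt in backtraces]
    if not per_trace:
        return []
    shared = set(per_trace[0])
    for names in per_trace[1:]:
        shared &= set(names)
    # dict.fromkeys drops repeated names while keeping first-seen order.
    return [name for name in dict.fromkeys(per_trace[0]) if name in shared]
```

Given the four traces from this message with one frame per line, such a helper should list `librbd::io::AioCompletion::complete_event_socket` among the shared frames, since that function appears in every trace.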