Hello,

Below is the full debug log covering the two minutes before the virtual machine crash. Download it from the URL below:
https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU

apport.log

Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount namespace, ignoring

kernel.log

May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip 00007f4c8c85d7ed sp 00007f4c66ffc470 error 4 in librbd.so.1.12.0[7f4c8c65a000+5cb000]
May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left promiscuous mode
May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state

syslog

May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip 00007f4c8c85d7ed sp 00007f4c66ffc470 error 4 in librbd.so.1.12.0[7f4c8c65a000+5cb000]
May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file from qemu monitor
May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link DOWN
May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost carrier
May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left promiscuous mode
May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 1(tap33511c4d-2c) entered disabled state
May 13 09:35:33
compute5 networkd-dispatcher[1614]: Failed to request link: No such device

On Fri, May 8, 2020 at 5:40 AM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer <lomlaizer@xxxxxxxxx> wrote:
> >
> > Hello,
> > On my side, the logs at the point of the VM crash are below. At the moment my debug level is 10; I will raise it to 20 for a full debug log. These crashes are random and so far happen only on very busy VMs. After downgrading the clients on the host to Nautilus, the crashes disappear.
>
> You could try adding debug_rados as well, but you may get a very large log, so keep an eye on things.
>
> > Qemu is not shutting down in general, because other VMs on the same host continue working.
>
> A process cannot reliably continue after encountering a segfault, so the qemu-kvm process must be ending, and therefore it should be possible to capture a coredump with the right configuration.
>
> In the following example, if you were to search for pid 6060 you would find it is no longer running.
>
> [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 00007f8165cc0a50 sp 00007f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]
>
> Without a backtrace at a minimum it may be very difficult to work out what's going on with certainty. If you open a tracker for the issue, though, maybe one of the devs specialising in rbd may have some feedback.
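For reference, the knobs Brad mentions live in the [client] section of ceph.conf on the hypervisor. A minimal sketch follows — the log file path and the 20/20 levels are my assumptions, not taken from this thread, and 20/20 can produce enormous logs on a busy VM. The snippet writes to a scratch copy so it is safe to run anywhere; on compute5 the target would be /etc/ceph/ceph.conf:

```shell
# Sketch only: stage librbd client debug settings in a scratch ceph.conf.
# On the hypervisor this would be appended to /etc/ceph/ceph.conf instead.
conf=./ceph.conf.demo
cat >> "$conf" <<'EOF'
[client]
    debug rbd = 20
    debug rados = 20
    log file = /var/log/ceph/qemu-guest.$pid.log
EOF
grep -q 'debug rbd' "$conf" && echo "client debug settings staged in $conf"
```

Newly started guests pick the settings up when librbd opens the image; already-running qemu processes keep their old levels until restarted.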
> > 2020-05-07T13:02:12.121+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80bfbf0 finish: got {} for [0,24576] bl 24576
> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80f9330 finish: C_ObjectReadRequest: r=0
> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80f9330 finish: got {} for [0,16384] bl 16384
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_block_image_watcher
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageWatcher: 0x7f88c400dfe0 block_notifies
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 5 librbd::Watcher: 0x7f88c400dfe0 block_notifies: blocked_count=1
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_block_image_watcher: r=0
> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_update_watchers
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_update_watchers: r=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_io_queue
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 5 librbd::io::ImageRequestWQ: 0x7f88e8001570 shut_down: shut_down: in_flight=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_io_queue: r=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_exclusive_lock
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 shut_down
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 shut_down:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 send_shutdown:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 send_shutdown_release:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 pre_release_lock_handler
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_cancel_op_requests:
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_cancel_op_requests: r=0
> > 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_block_writes:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 5 librbd::io::ImageRequestWQ: 0x7f88e8001570 block_writes: 0x5569b5e1ffd0, num=1
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_block_writes: r=0
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_wait_for_ops:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_wait_for_ops:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_invalidate_cache:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 5 librbd::io::ObjectDispatcher: 0x5569b5dab700 invalidate_cache:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_invalidate_cache: r=0
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_flush_notifies:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_flush_notifies:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_close_object_map:
> > 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::object_map::UnlockRequest: 0x7f88c807a450 send_unlock: oid=rbd_object_map.2f18f2a67fad72
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::object_map::UnlockRequest: 0x7f88c807a450 handle_unlock: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_close_object_map: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_unlock:
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 handle_shutdown_pre_release: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 send_unlock: entity=client.58292796, cookie=auto 140225447738256
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 handle_unlock: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 post_release_lock_handler: r=0 shutting_down=1
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 5 librbd::io::ImageRequestWQ: 0x7f88e8001570 unblock_writes: 0x5569b5e1ffd0, num=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 notify released lock
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 current lock owner: [0,0]
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 handle_shutdown_post_release: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 wait_for_tracked_ops: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 complete_shutdown: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_exclusive_lock: r=0
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_unregister_image_watcher
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 unregistering image watcher
> > 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::Watcher: 0x7f88c400dfe0 unregister_watch:
> > 2020-05-07T13:02:28.702+0300 7f88d57fa700 5 librbd::Watcher: 0x7f88c400dfe0 notifications_blocked: blocked=1
> > 2020-05-07T13:02:28.706+0300 7f88ceffd700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_unregister_image_watcher: r=0
> > 2020-05-07T13:02:28.706+0300 7f88ceffd700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_flush_readahead
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_readahead: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_object_dispatcher
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::io::ObjectDispatcher: 0x5569b5dab700 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::io::ObjectDispatch: 0x5569b5ee8360 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::io::SimpleSchedulerObjectDispatch: 0x7f88c4013ce0 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 5 librbd::cache::WriteAroundObjectDispatch: 0x7f88c8003780 shut_down:
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_object_dispatcher: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_flush_op_work_queue
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_op_work_queue: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_image_watcher: r=0
> > 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 handle_close: r=0
> >
> > On Fri, May 8, 2020 at 12:40 AM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> >>
> >> On Fri, May 8, 2020 at 3:42 AM Erwin Lubbers <erwin@xxxxxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Did anyone find a way to resolve the problem? I'm seeing the same on a clean Octopus Ceph installation on Ubuntu 18 with an Octopus-compiled KVM server running on CentOS 7.8. The KVM machine shows:
> >> >
> >> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 00007f8165cc0a50 sp 00007f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]
> >>
> >> Are you able to either capture a backtrace from a coredump or set up logging and hopefully capture a backtrace that way?
> >>
> >> > Ceph is healthy and stable for a few weeks, and I did not get these messages while running on KVM compiled with Luminous libraries.
> >> >
> >> > Regards,
> >> > Erwin
> >>
> >> --
> >> Cheers,
> >> Brad
>
> --
> Cheers,
> Brad

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx