Re: virtual machine crashes after upgrade to octopus

On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer <lomlaizer@xxxxxxxxx> wrote:
>
> Hello,
> On my side, at the point of the VM crash the logs below are what I see. At the moment my debug level is 10; I will raise it to 20 for full debug. These crashes are random and so far happen on very busy VMs. After downgrading the clients on the host to Nautilus, the crashes disappear.

You could try adding debug_rados as well, but you may get a very large
log, so keep an eye on things.
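
For reference, a minimal sketch of the client-side settings in
ceph.conf on the hypervisor (the log file path is illustrative; $pid
is a Ceph metavariable expanded at runtime):

    [client]
        debug rbd = 20
        debug rados = 20
        log file = /var/log/ceph/qemu-guest-$pid.log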

>
> Qemu is not shutting down in general, because other VMs on the same host continue working

A process cannot reliably continue after encountering a segfault, so
the qemu-kvm process must be exiting, and it should therefore be
possible to capture a coredump with the right configuration.

In the following example, if you were to search for pid 6060 you would
find it is no longer running.
>> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 00007f8165cc0a50 sp 00007f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]
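
As a sketch of that configuration (paths are illustrative, and whether
you use core_pattern directly or systemd-coredump depends on the host):

    # classic setup: raise the core size limit for the qemu process
    # (e.g. LimitCORE=infinity in its systemd unit) and set a pattern
    ulimit -c unlimited
    echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern

    # on hosts running systemd-coredump, retrieve the dump afterwards
    coredumpctl list qemu-kvm
    coredumpctl dump 6060 -o /var/tmp/qemu.core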

Without at least a backtrace it may be very difficult to work out
what's going on with any certainty. If you open a tracker for the
issue, though, one of the devs specialising in rbd may have some
feedback.
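
If you do get a core file, extracting the backtrace would look roughly
like this (the qemu binary path is an assumption for CentOS, adjust
for your package; installing the debuginfo packages first gets you a
symbolised trace):

    debuginfo-install qemu-kvm librbd1       # CentOS 7 / yum-utils
    gdb /usr/libexec/qemu-kvm /var/tmp/core.qemu-kvm.6060
    (gdb) thread apply all bt full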

>
> 2020-05-07T13:02:12.121+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80bfbf0 finish:  got {} for [0,24576] bl 24576
> 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80f9330 finish: C_ObjectReadRequest: r=0
> 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 0x7f88c80f9330 finish:  got {} for [0,16384] bl 16384
> 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_block_image_watcher
> 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageWatcher: 0x7f88c400dfe0 block_notifies
> 2020-05-07T13:02:28.694+0300 7f890ba90500  5 librbd::Watcher: 0x7f88c400dfe0 block_notifies: blocked_count=1
> 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_block_image_watcher: r=0
> 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_update_watchers
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_update_watchers: r=0
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_io_queue
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700  5 librbd::io::ImageRequestWQ: 0x7f88e8001570 shut_down: shut_down: in_flight=0
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_io_queue: r=0
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_exclusive_lock
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 shut_down
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 shut_down:
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 send_shutdown:
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 send_shutdown_release:
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 pre_release_lock_handler
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_cancel_op_requests:
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_cancel_op_requests: r=0
> 2020-05-07T13:02:28.694+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_block_writes:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700  5 librbd::io::ImageRequestWQ: 0x7f88e8001570 block_writes: 0x5569b5e1ffd0, num=1
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_block_writes: r=0
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_wait_for_ops:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_wait_for_ops:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_invalidate_cache:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700  5 librbd::io::ObjectDispatcher: 0x5569b5dab700 invalidate_cache:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_invalidate_cache: r=0
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_flush_notifies:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_flush_notifies:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_close_object_map:
> 2020-05-07T13:02:28.698+0300 7f88d4ff9700 10 librbd::object_map::UnlockRequest: 0x7f88c807a450 send_unlock: oid=rbd_object_map.2f18f2a67fad72
> 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::object_map::UnlockRequest: 0x7f88c807a450 handle_unlock: r=0
> 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 handle_close_object_map: r=0
> 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::exclusive_lock::PreReleaseRequest: 0x7f88c80b6020 send_unlock:
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 handle_shutdown_pre_release: r=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 send_unlock: entity=client.58292796, cookie=auto 140225447738256
> 2020-05-07T13:02:28.702+0300 7f88d57fa700 10 librbd::managed_lock::ReleaseRequest: 0x7f88c80b68a0 handle_unlock: r=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ExclusiveLock: 0x7f88c4011ba0 post_release_lock_handler: r=0 shutting_down=1
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700  5 librbd::io::ImageRequestWQ: 0x7f88e8001570 unblock_writes: 0x5569b5e1ffd0, num=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 notify released lock
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 current lock owner: [0,0]
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 handle_shutdown_post_release: r=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 wait_for_tracked_ops: r=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ManagedLock: 0x7f88c4011bb8 complete_shutdown: r=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_exclusive_lock: r=0
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_unregister_image_watcher
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::ImageWatcher: 0x7f88c400dfe0 unregistering image watcher
> 2020-05-07T13:02:28.702+0300 7f88d4ff9700 10 librbd::Watcher: 0x7f88c400dfe0 unregister_watch:
> 2020-05-07T13:02:28.702+0300 7f88d57fa700  5 librbd::Watcher: 0x7f88c400dfe0 notifications_blocked: blocked=1
> 2020-05-07T13:02:28.706+0300 7f88ceffd700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_unregister_image_watcher: r=0
> 2020-05-07T13:02:28.706+0300 7f88ceffd700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_flush_readahead
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_readahead: r=0
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_shut_down_object_dispatcher
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700  5 librbd::io::ObjectDispatcher: 0x5569b5dab700 shut_down:
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700  5 librbd::io::ObjectDispatch: 0x5569b5ee8360 shut_down:
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700  5 librbd::io::SimpleSchedulerObjectDispatch: 0x7f88c4013ce0 shut_down:
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700  5 librbd::cache::WriteAroundObjectDispatch: 0x7f88c8003780 shut_down:
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_shut_down_object_dispatcher: r=0
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 send_flush_op_work_queue
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_op_work_queue: r=0
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::image::CloseRequest: 0x7f88c8175fd0 handle_flush_image_watcher: r=0
> 2020-05-07T13:02:28.706+0300 7f88d4ff9700 10 librbd::ImageState: 0x5569b5da9bb0 0x5569b5da9bb0 handle_close: r=0
>
> On Fri, May 8, 2020 at 12:40 AM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Fri, May 8, 2020 at 3:42 AM Erwin Lubbers <erwin@xxxxxxxxxxx> wrote:
>> >
>> > Hi,
>> >
>> > Did anyone find a way to resolve the problem? I'm seeing the same on a clean Octopus Ceph installation on Ubuntu 18, with a KVM server compiled against Octopus running on CentOS 7.8. The KVM machine shows:
>> >
>> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 00007f8165cc0a50 sp 00007f81397f6490 error 4 in librbd.so.1.12.0[7f8165ab4000+537000]
>>
>> Are you able to either capture a backtrace from a coredump or set up
>> logging and hopefully capture a backtrace that way?
>>
>> >
>> > Ceph has been healthy and stable for a few weeks, and I did not get these messages while running KVM compiled with the Luminous libraries.
>> >
>> > Regards,
>> > Erwin
>>
>>
>> --
>> Cheers,
>> Brad



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


