Any clue what Windows is doing issuing a discard against an extent that has an in-flight read? If this is repeatable, can you add "debug rbd = 20" and "debug objectcacher = 20" to your hypervisor's ceph.conf and attach the Ceph log to a tracker ticket?

On Tue, May 8, 2018 at 8:09 PM, Richard Bade <hitrich@xxxxxxxxx> wrote:
> Hi Everyone,
> We run some hosts with Proxmox 4.4 connected to our Ceph cluster for
> RBD storage. Occasionally a VM suddenly stops with no real
> explanation. The last time this happened to one particular VM, I
> turned on some qemu logging via the Proxmox Monitor tab and got
> this dump when the VM stopped again:
>
> osdc/ObjectCacher.cc: In function 'void
> ObjectCacher::Object::discard(loff_t, loff_t)' thread 7f1c6ebfd700
> time 2018-05-08 07:00:47.816114
> osdc/ObjectCacher.cc: 533: FAILED assert(bh->waitfor_read.empty())
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
> 1: (()+0x2d0712) [0x7f1c8e093712]
> 2: (()+0x52c107) [0x7f1c8e2ef107]
> 3: (()+0x52c45f) [0x7f1c8e2ef45f]
> 4: (()+0x82107) [0x7f1c8de45107]
> 5: (()+0x83388) [0x7f1c8de46388]
> 6: (()+0x80e74) [0x7f1c8de43e74]
> 7: (()+0x86db0) [0x7f1c8de49db0]
> 8: (()+0x2c0ddf) [0x7f1c8e083ddf]
> 9: (()+0x2c1d00) [0x7f1c8e084d00]
> 10: (()+0x8064) [0x7f1c804e0064]
> 11: (clone()+0x6d) [0x7f1c8021562d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> We're using virtio-scsi for the disk with the discard option and
> writeback cache enabled. The VM is Win2012r2.
>
> Has anyone seen this before? Is there a resolution?
> I couldn't find any mention of this while googling for various key
> words from the dump.
>
> Regards,
> Richard
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Jason
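
[Editor's note: a minimal sketch of the debug settings Jason suggests, as they might appear in the hypervisor's ceph.conf. The [client] section placement and the log file path are assumptions for illustration, not taken from the thread.]

```ini
# Sketch: raise librbd and objectcacher verbosity as suggested above.
# Assumes the QEMU/librbd client reads the [client] section.
[client]
debug rbd = 20
debug objectcacher = 20
# Hypothetical log destination; $pid is expanded per client process.
log file = /var/log/ceph/client.$pid.log
```

These settings are read when the librbd client starts, so the affected VM would need to be restarted (or its disk reattached) before the extra logging appears.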