Thanks for the detailed feedback! Comments inline below:

----- Original Message -----
> From: "Loris Cuoghi" <lc@xxxxxxxxxxxxxxxxx>
> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Tuesday, March 8, 2016 5:57:30 AM
> Subject: Re: Can I rebuild object maps while VMs are running ?
>
> On 07/03/2016 17:58, Jason Dillaman wrote:
> > Documentation of these new RBD features is definitely lacking and
> > I've opened a tracker ticket to improve it [1].
> >
> > [1] http://tracker.ceph.com/issues/15000
>
> Hey, thank you Jason! :)
>
> > That's disheartening to hear that your RBD images were corrupted --
> > do you have any more detail as to what happened? Enabling the object
> > map is designed to flag the object map as invalid, so it won't be used
> > as a reference for any IO ops until it is successfully rebuilt.
>
> No problem at all, as it was just a test. As I saw the VM losing its
> disk (the file system wasn't available anymore even for a remount), the
> VM went all "IO error" on every disk access, but the shell was still there.
>
> Infernalis 9.2.0 (it was the day just before the 9.2.1 bugfix release :) ).
> KVM + QEMU + libvirt with RBD disks on the QEMU side.
>
> The RBDs only had the layering feature enabled.
>
> With the VMs powered on, I enabled the exclusive-lock, the object map
> and fast-diff features. I then proceeded rebuilding the object-map in
> order to test the "rbd du" command... and get rid of some pesky
> invalid-object-map-related warnings. Yes, some warning showed up when I
> typed "rbd info XXX" or "rbd du XX" before successfully compiling the
> object-map.
>
> The object-map rebuild process started with more of these "warnings",
> ran for a while showing the completion percentage as it's supposed to,
> but it presented me some more warnings at the end.
> Meanwhile, in VM-land, the disk was inaccessible from the kernel point
> of view, ext4 presented me its best "goodbye cruel world" testament,
> while, naturally, the shell reacted with a nice IO error at every
> attempt of running external programs. Clearly, with any IO interrupted
> until next reboot, for us any on-disk data is considered as corrupted.
>
> Did the "rbd rebuild" process take the exclusive-lock on the RBD ?

The rebuild process should only acquire the exclusive lock if it's not
already owned by another process (qemu). It's possible that quickly
enabling exclusive lock followed by rebuilding the object map might
result in the rbd CLI acquiring the lock before the qemu process if
there wasn't a steady stream of writes to the VM image. In such a case,
qemu would attempt to acquire the lock on its next write attempt. We had
some bugs related to the transition of the exclusive lock from one
process to another that have since been cleared up.

> I'm re-testing this while I'm typing, on infernalis 9.2.1.
>
> Added the exclusive-lock, object-map and fast-diff features, a lot of
> sweating on my side, but the guest kernel didn't blink.
>
> Started rebuilding the object-map aaaaand...
>
> Object Map Rebuild: 99% complete...failed.
> rbd: rebuilding object map failed: (16) Device or resource busy

Definitely not expected. I've opened a new tracker ticket for this
issue [1].

> The VM continues working without interruption.
>
> Non-intensive but continuous IO on the guest filesystem was ongoing the
> whole time.
>
> Stopped any IO, launched the rebuild, same "busy" message.
>
> Stopped the VM, rebuild.
>
> Object Map Rebuild: 100% complete...done.
>
> That's a lot better than the previous experience. :)
>
> Thanks
>
> Loris

[1] http://tracker.ceph.com/issues/15007

Thanks,

--
Jason Dillaman

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
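
For anyone wanting to reproduce the sequence discussed in this thread, here is a rough sketch of the rbd invocations involved, wrapped in a small Python helper. The image spec "rbd/vm-disk-1" and the helper names are hypothetical, and the script only prints the commands unless dry_run is turned off. Per the thread, on 9.2.1 the rebuild step may fail with "(16) Device or resource busy" while the VM is running (see tracker issue 15007), so treat this as an illustration rather than a recommended procedure.

```python
import subprocess


def object_map_commands(image_spec):
    """Build the rbd CLI calls for the sequence described in the thread.

    image_spec is a "pool/image" string; the value used below is made up.
    Features are enabled one at a time, in dependency order:
    object-map needs exclusive-lock, and fast-diff needs object-map.
    """
    return [
        ["rbd", "feature", "enable", image_spec, "exclusive-lock"],
        ["rbd", "feature", "enable", image_spec, "object-map"],
        ["rbd", "feature", "enable", image_spec, "fast-diff"],
        # The object map is flagged invalid until this step succeeds.
        ["rbd", "object-map", "rebuild", image_spec],
        ["rbd", "info", image_spec],
        ["rbd", "du", image_spec],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step (e.g. EBUSY on rebuild).
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(object_map_commands("rbd/vm-disk-1"))
```

Running it as-is only prints the six commands; pass dry_run=False on a test image to actually execute them.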