Hi everyone,

apologies in advance; this will be long. It has also been through a bunch of edits and rewrites, so I don't know how well I'm expressing myself at this stage. Please holler if anything is unclear and I'll be happy to clarify.

I am currently investigating the behavior of OpenStack Nova instances when they are snapshotted and suspended, in conjunction with qemu-guest-agent (qemu-ga). I realize that RBD-backed Nova/libvirt instances are expected to behave differently from file-backed ones, but I have reason to believe that the RBD-backed ones are actually behaving incorrectly, and I'd like to verify that.

First up, for comparison, let's recap how a Nova/libvirt/KVM instance behaves when it is *not* backed by RBD (i.e. it uses a qcow2 file that lives on a Nova compute node under /var/lib/nova/instances), is booted from an image with the hw_qemu_guest_agent=yes metadata property set, and runs qemu-guest-agent within the guest:

- User issues "nova suspend" or "openstack server suspend".
- If nova-compute on the compute node decides that the instance has qemu-guest-agent running (which is the case if it's qemu or kvm, and its image has hw_qemu_guest_agent=yes), it sends a guest-sync command over the guest agent VirtIO serial port. This command registers in the qemu-ga log file in the guest.
- nova-compute on the compute node sends a libvirt managed-save command.
- Nova reports the instance as suspended.
- User issues "nova resume" or "openstack server resume".
- nova-compute on the compute node sends a libvirt start command.
- Again, if nova-compute knows that the instance has qemu-guest-agent running, it sends another command over the serial port, namely guest-set-time. This, too, registers in the guest's qemu-ga log.
- Nova reports the instance as active (running normally) again.

When I instead use a Nova environment that is fully RBD-backed, I see exactly the same behavior as described above. So I know that, in principle, nova-compute/qemu-ga communication works in both an RBD-backed and a non-RBD-backed environment.

However, things get very different when it comes to snapshots. Again, starting with a file-backed environment:

- User issues "nova image-create" or "openstack server image create".
- If nova-compute on the compute node decides that the instance can be quiesced (which is the case if it's qemu or kvm, and its image has hw_qemu_guest_agent=yes), it sends a "guest-fsfreeze-freeze" command over the guest agent VirtIO serial port.
- The guest agent inside the guest loops over all mounted filesystems and issues the FIFREEZE ioctl (which maps to the kernel freeze_super() function) on each. This can be seen in the qemu-ga log file in the guest, and it is also verifiable by running ftrace against the qemu-ga PID and checking for the freeze_super() call.
- nova-compute then takes a live snapshot of the instance.
- Once that is complete, the guest gets a "guest-fsfreeze-thaw" command, and again I can see this in the qemu-ga log and with ftrace.

And now with RBD:

- User issues "nova image-create" or "openstack server image create".
- The guest-fsfreeze-freeze agent command never happens.
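(In case anyone wants to double-check that observation on their own systems: the agent side can be verified roughly as follows, from the compute node and from inside the guest. This is only a sketch, not necessarily exactly what I ran; the domain name "myinstance" is a placeholder, and it assumes virsh can reach the agent socket and that debugfs is mounted in the guest.)

  # On the compute node: ask qemu-ga for its current freeze state via libvirt.
  # Expect {"return":"frozen"} while a freeze is in effect, {"return":"thawed"} otherwise.
  virsh qemu-agent-command myinstance '{"execute":"guest-fsfreeze-status"}'

  # Inside the guest: use ftrace to watch for freeze_super() calls made by the qemu-ga process.
  cd /sys/kernel/debug/tracing
  echo "$(pidof qemu-ga)" > set_ftrace_pid
  echo freeze_super > set_ftrace_filter
  echo function > current_tracer
  cat trace_pipe    # freeze_super() hits appear here while a snapshot quiesce is in progress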
What I do get is the info message from https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2048 in my nova-compute log, which confirms that we're attempting a live snapshot. What I do *not* see is the warning from https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2068, so it looks like the direct_snapshot() call from https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2058 succeeds. That method is defined in https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/imagebackend.py#L1055 and uses RBD functionality only. Importantly, it never interacts with qemu-ga, so it does not appear to concern itself with freezing the filesystem at all. (Which does seem to contradict https://docs.ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid#image-properties, by the way, so that may be a documentation bug.)

Now here's another interesting part. If the direct snapshot were to fail, then, if I read https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2081 and https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/driver.py#L2144 correctly, the fallback behavior would be as follows: the domain would first be "suspended" (note, again, that this is Nova suspend, which maps to libvirt managed-save per https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac92bfe75dc59/nova/virt/libvirt/guest.py#L504), then snapshotted using a libvirt call, and resumed again post-snapshot. In that case there would at least be a guest-sync call on suspend.

And it's this part that has me a bit worried. If an RBD-backed instance, on a successful snapshot, never freezes its filesystems *and* never does any kind of sync either, doesn't that mean that such an instance cannot be made to produce consistent snapshots? (Particularly in the case of write-back caching, which is recommended and normally safe for RBD/virtio devices.) Or is there some magic within the QEMU RBD storage driver that I am unaware of, which makes any such contortions unnecessary?

Thanks in advance for your insights!

Cheers,
Florian
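P.S.: To illustrate what I would naively expect to be required around the RBD snapshot if nothing else in the stack provides the quiesce, here is a rough manual equivalent of what the file-backed path does via qemu-ga. Again, just a sketch; "myinstance" and the pool/image/snapshot names are placeholders, not necessarily what Nova actually uses:

  # Freeze all guest filesystems through qemu-ga, snapshot the RBD image, then thaw.
  virsh domfsfreeze myinstance
  rbd snap create vms/myinstance_disk@manually-quiesced-snap
  virsh domfsthaw myinstance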