On Mon, Jan 30, 2023 at 12:53:22PM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 09:30:40PM +0900, Damien Le Moal wrote:
> > On 1/30/23 21:21, Daniel P. Berrangé wrote:
> > > On Wed, Jan 11, 2023 at 10:24:30AM -0500, Stefan Hajnoczi wrote:
> > >> On Tue, Jan 10, 2023 at 03:29:47PM +0000, Daniel P. Berrangé wrote:
> > >>> On Tue, Jan 10, 2023 at 10:19:51AM -0500, Stefan Hajnoczi wrote:
> > >>>> Hi Peter,
> > >>>> Zoned storage support
> > >>>> (https://zonedstorage.io/docs/introduction/zoned-storage) is being added
> > >>>> to QEMU. Given a zoned host block device, the QEMU syntax will look like
> > >>>> this:
> > >>>>
> > >>>> --blockdev zoned_host_device,node-name=drive0,filename=/dev/$BDEV,...
> > >>>> --device virtio-blk-pci,drive=drive0
> > >>>>
> > >>>> Note that regular --blockdev host_device will not work.
> > >>>>
> > >>>> For now the virtio-blk device is the only one that supports zoned
> > >>>> blockdevs.
> > >>>
> > >>> Does the virtio-blk device's exposed guest ABI differ at all
> > >>> when connected to zoned_host_device instead of host_device?
> > >>
> > >> Yes. There is a VIRTIO feature bit, some configuration space fields,
> > >> etc. virtio-blk-pci detects when the blockdev is zoned and enables the
> > >> feature bit.
> > >
> > > I get a general sense of unease when frontend device ABI-sensitive
> > > features get secretly toggled based on features exposed by the
> > > backend.
> > >
> > > When trying to validate ABI compatibility of guest configs, libvirt
> > > would generally compare frontend properties to look for differences.
> > >
> > > There are a small set of cases where backends affect frontend
> > > features, but it is not that common to see.
> > >
> > > Consider what happens if we have a guest running on zoned storage,
> > > and we need to evacuate the host to a machine without zoned
> > > storage available. Could we replace the storage backend on the
> > > target host with a raw/qcow2 backend but keep pretending it is
> > > zoned storage to the guest? The guest would continue batching its
> > > I/O ops for the zoned storage, which would be redundant for
> > > raw/qcow2, but presumably should still work. If this is possible,
> > > it would suggest the need to have explicit settings for zoned storage
> > > on the virtio-blk frontend. QEMU would "merely" validate that these
> > > settings are turned on if the host storage is zoned too.
> > >
> > >>>> This brings to mind a few questions:
> > >>>>
> > >>>> 1. Does libvirt need domain XML syntax for zoned storage? Alternatively,
> > >>>> it could probe /sys/block/$BDEV/queue/zoned and generate the correct
> > >>>> QEMU command-line arguments for zoned devices when the contents of
> > >>>> the file are not "none".
> > >>>>
> > >>>> 2. Should QEMU --blockdev host_device detect zoned devices so that
> > >>>> --blockdev zoned_host_device is not necessary? That way libvirt would
> > >>>> automatically support zoned storage without any domain XML syntax or
> > >>>> libvirt code changes.
> > >>>>
> > >>>> The drawbacks I see when QEMU detects zoned storage automatically:
> > >>>> - You can't easily tell if a blockdev is zoned from the command-line.
> > >>>> - It's possible to mismatch zoned and non-zoned devices across live
> > >>>> migration.
> > >>>
> > >>> What happens with existing QEMU impls if you use --blockdev host_device
> > >>> pointing to a /dev/$BDEV that is a zoned device? If it succeeds and
> > >>> works correctly, then we likely need to continue to support that.
> > >>> This would push towards needing a new XML element.
> > >>
> > >> Pointing host_device at a zoned device doesn't result in useful behavior
> > >> because the guest is unaware that this is a zoned device. The guest
> > >> won't be able to access the device correctly (i.e. sequential writes
> > >> only). Write requests will fail eventually.
> > >>
> > >> I would consider zoned devices totally unsupported in QEMU today and we
> > >> don't need to worry about preserving any kind of backwards compatibility
> > >> with --blockdev host_device,filename=/dev/my_zoned_device.
> > >
> > > So I guess I'm not so worried about host_device vs zoned_host_device,
> > > if we have explicit settings for controlled zoned behaviour on the
> > > virtio-blk frontend.
> > >
> > > I feel like we should have something explicit somewhere though, as this
> > > is a pretty significant difference in the storage stack that I think
> > > mgmt apps should be aware of, as it has implications for how you manage
> > > the VMs on an ongoing basis.
> > >
> > > We could still have it "do what I mean" by default though, e.g. the
> > > virtio-blk setting defaults could imply "match the host", so we get
> > > effectively a tri-state (zoned=on/off/auto).
> >
> > What would zoned=on mean? If the backend is not zoned, virtio will expose a
> > regular block device to the guest as it should.
>
> Sorry, I should have expanded further, I didn't mean that alone. It would
> also need to expose the related settings of the virtio-blk device:
>
> > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > +                     bs->bl.zone_size / 512);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > +                     bs->bl.max_active_zones);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > +                     bs->bl.max_open_zones);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > +                     bs->bl.max_append_sectors);
>
> so e.g.
>
> -device virtio-blk,zoned=on,zone_sectors=NN,max_active_zones=NN,max_open_zones=NN....
>
> So the guest would be honouring these zone constraints, even though they
> are not required by a raw/qcow2 file.
>
> In this world
>
>   -device virtio-blk,zoned=on
>
> would be a shorthand to say get the rest of the tunables from the backend
> device, or error if the backend doesn't support them.
>
>   -device virtio-blk,zoned=auto
>
> would be a shorthand to say "do the right thing" regardless of whether the
> backend is zoned or non-zoned.
>
> > For zoned=auto, same, I am not sure what that would achieve. If the backend is
> > zoned, it will be seen as zoned by the guest. If the backend is a regular disk,
> > it will be exposed as a regular disk. So what would this option achieve?
> >
> > And for zoned=off, I guess you would want to ignore a backend drive if it is zoned?
>
> It would explicitly report an error, since IIUC from Stefan's reply, this
> scenario would eventually end in I/O failures.

What you've described sounds good to me:

1. By default it exposes the device, no questions asked.

2. Management tools like libvirt can explicitly request zoned=on/off,
   zone_sectors=..., etc. to prevent misconfiguration.

Best of both worlds.

Stefan
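As an illustration of the probing approach from question 1 above, here is a
minimal C sketch (not existing libvirt code; the helper name is made up) that
reads /sys/block/$BDEV/queue/zoned and treats any value other than "none"
(i.e. "host-aware" or "host-managed") as a zoned device:

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Returns true if /sys/block/<bdev>/queue/zoned reports anything other
 * than "none". Missing attribute is treated as not zoned. */
static bool bdev_is_zoned(const char *bdev)
{
    char path[256];
    char model[32] = "";
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", bdev);
    f = fopen(path, "r");
    if (!f) {
        return false;
    }
    if (fgets(model, sizeof(model), f)) {
        model[strcspn(model, "\n")] = '\0';   /* strip trailing newline */
    }
    fclose(f);
    return model[0] != '\0' && strcmp(model, "none") != 0;
}

int main(int argc, char **argv)
{
    const char *bdev = argc > 1 ? argv[1] : "nvme0n1";

    printf("%s zoned: %s\n", bdev, bdev_is_zoned(bdev) ? "yes" : "no");
    return 0;
}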
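And a rough sketch of the zoned=on validation idea, assuming hypothetical
structures and function names that do not exist in QEMU: the explicitly
configured virtio-blk frontend settings are compared against the limits the
backend reports, and the device would refuse to realize on a mismatch:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the zoned tunables discussed above;
 * this is not an existing QEMU structure. */
typedef struct ZonedParams {
    uint32_t zone_sectors;
    uint32_t max_active_zones;
    uint32_t max_open_zones;
    uint32_t max_append_sectors;
} ZonedParams;

/* With zoned=on, the user-supplied frontend values must match the limits
 * probed from the backend; otherwise report an error and fail. */
static bool zoned_params_validate(const ZonedParams *frontend,
                                  const ZonedParams *backend,
                                  char *errbuf, size_t errlen)
{
    if (frontend->zone_sectors != backend->zone_sectors) {
        snprintf(errbuf, errlen,
                 "zone_sectors mismatch: %" PRIu32 " (frontend) vs %" PRIu32 " (backend)",
                 frontend->zone_sectors, backend->zone_sectors);
        return false;
    }
    if (frontend->max_active_zones != backend->max_active_zones ||
        frontend->max_open_zones != backend->max_open_zones ||
        frontend->max_append_sectors != backend->max_append_sectors) {
        snprintf(errbuf, errlen, "zone limits differ between frontend and backend");
        return false;
    }
    return true;
}

int main(void)
{
    /* Example values only: 256 MiB zones expressed in 512-byte sectors. */
    ZonedParams backend  = { 524288, 14, 14, 1024 };
    ZonedParams frontend = { 524288, 14, 14, 1024 };
    char err[128];

    if (!zoned_params_validate(&frontend, &backend, err, sizeof(err))) {
        fprintf(stderr, "zoned=on rejected: %s\n", err);
        return 1;
    }
    printf("zoned=on settings accepted\n");
    return 0;
}

Failing loudly at configuration time keeps a zoned/non-zoned mismatch visible
to management tools instead of surfacing later as guest I/O errors.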