Re: virtio-blk/ext4 error handling for host-side ENOSPC

Keiichi Watanabe <keiichiw@xxxxxxxxxxxx> · Fri, 28 Jun 2024 12:29:05 +0900

Hi Stefan,

Thanks for sharing QEMU's approach!
We also have a similar early notification mechanism to avoid low-disk
conditions.
However, the approach I would like to propose is to avoid pausing
the guest by allowing the guest to retry requests after a while.

On Wed, Jun 19, 2024 at 10:57 PM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
>
> > What do you think of this idea? Also, has anything similar been attempted yet?
>
> Hi Keiichi,
> Yes, there is an existing approach that is related but not identical to
> what you are exploring:
>
> QEMU has an option to pause the guest and raise a notification to the
> management tool that ENOSPC has been reached. The guest is unable to
> resolve ENOSPC itself and guest applications are likely to fail if the
> disk becomes unavailable, hence the guest is simply paused.
>
> In systems that expect to hit this condition, this pause behavior can be
> combined with an early notification when a free space watermark is hit.
> This way guests are almost never paused because free space can be added
> before ENOSPC is reached. QEMU has a write watermark feature that works
> well on top of qcow2 images (they grow incrementally so it's trivial to
> monitor how much space is being consumed).
>
> I wanted to share this existing approach in case you think it would work
> nicely for your use case.
>
> The other thought I had was: how does the new ENOSPC error fit into the
> block device model? Hopefully this behavior is not virtio-blk-specific
> behavior but rather something general that other storage protocols like
> NVMe and SCSI support too. That way file systems can handle this in a
> generic fashion.
>
> The place I would check is Logical Block Provisioning in SCSI and NVMe.
> Perhaps there are features in these protocols for reporting low
> resources? (Sorry, I didn't have time to check.)

For SCSI, THIN_PROVISIONING_SOFT_THRESHOLD_REACHED looks like the
relevant one.
For NVMe, NVME_SC_CAPACITY_EXCEEDED looks like the equivalent.

I guess we can add a new error state in the ext4 layer. Let's say it's
"HOST_NOSPACE" in ext4. This should be used when virtio-blk returns
ENOSPC or virtio-scsi returns
THIN_PROVISIONING_SOFT_THRESHOLD_REACHED. I'm not sure if there is a
case where NVME_SC_CAPACITY_EXCEEDED is translated to this state
because we don't have virtio-nvme.
If ext4 is in the state of HOST_NOSPACE, ext4 will periodically try to
write to the disk (= virtio-blk or virtio-scsi) several times. If this
fails a certain number of times, the guest will report a disk error.
What do you think?
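
To make the idea a bit more concrete, here is a rough user-space model
of the policy in Python. It is only a sketch of the logic, not kernel
code. All names in it (BlockError, FsState, write_with_retry,
max_retries, retry_interval_s) are made up for illustration, and the
retry count and interval are arbitrary placeholders.

```python
import time
from enum import Enum, auto


class BlockError(Enum):
    """Hypothetical results a write could get from the block layer."""
    OK = auto()
    VIRTIO_BLK_ENOSPC = auto()           # proposed new virtio-blk status
    SCSI_THIN_SOFT_THRESHOLD = auto()    # THIN_PROVISIONING_SOFT_THRESHOLD_REACHED
    IO_ERROR = auto()                    # any other failure


class FsState(Enum):
    """Hypothetical filesystem states for this sketch."""
    NORMAL = auto()
    HOST_NOSPACE = auto()
    ERROR = auto()


# Errors that would put the filesystem into HOST_NOSPACE (assumption).
HOST_NOSPACE_ERRORS = {
    BlockError.VIRTIO_BLK_ENOSPC,
    BlockError.SCSI_THIN_SOFT_THRESHOLD,
}


def write_with_retry(submit, max_retries=3, retry_interval_s=5.0,
                     sleep=time.sleep):
    """Submit a write; on host-side ENOSPC, retry periodically.

    `submit` is a callable that returns a BlockError. Returns the final
    FsState: NORMAL if a write eventually succeeds, ERROR otherwise.
    """
    err = submit()
    if err is BlockError.OK:
        return FsState.NORMAL
    if err not in HOST_NOSPACE_ERRORS:
        return FsState.ERROR  # unrelated failure: report immediately

    # The filesystem is now conceptually in HOST_NOSPACE.
    for _ in range(max_retries):
        sleep(retry_interval_s)  # give the host time to free up space
        err = submit()
        if err is BlockError.OK:
            return FsState.NORMAL
        if err not in HOST_NOSPACE_ERRORS:
            return FsState.ERROR

    # Still out of space after max_retries retries: report a disk error.
    return FsState.ERROR
```

In this model the guest keeps running while the host is out of space,
and only reports a disk error if the host has not recovered after
max_retries further attempts.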

Best,
Keiichi

>
> Stefan