Re: virtio-blk/ext4 error handling for host-side ENOSPC

On Fri, Jun 28, 2024 at 12:29:05PM +0900, Keiichi Watanabe wrote:
> Hi Stefan,
> 
> Thanks for sharing QEMU's approach!
> We also have a similar early-notification mechanism to avoid
> low-disk-space conditions.
> However, the approach I would like to propose is to avoid pausing
> the guest by allowing the guest to retry requests after a while.
> 
> On Wed, Jun 19, 2024 at 10:57 PM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
> >
> > > What do you think of this idea? Also, has anything similar been attempted yet?
> >
> > Hi Keiichi,
> > Yes, there is an existing approach that is related but not identical to
> > what you are exploring:
> >
> > QEMU has an option to pause the guest and raise a notification to the
> > management tool that ENOSPC has been reached. The guest is unable to
> > resolve ENOSPC itself, and guest applications are likely to fail once
> > the disk becomes unavailable, hence the guest is simply paused.
> >
> > In systems that expect to hit this condition, this pause behavior can be
> > combined with an early notification when a free space watermark is hit.
> > This way guests are almost never paused because free space can be added
> > before ENOSPC is reached. QEMU has a write watermark feature that works
> > well on top of qcow2 images (they grow incrementally so it's trivial to
> > monitor how much space is being consumed).
> >
> > I wanted to share this existing approach in case you think it would work
> > nicely for your use case.
> >
> > The other thought I had was: how does the new ENOSPC error fit into the
> > block device model? Hopefully this is not virtio-blk-specific behavior
> > but rather something general that other storage protocols like
> > NVMe and SCSI support too. That way file systems can handle this in a
> > generic fashion.
> >
> > The place I would check is Logical Block Provisioning in SCSI and NVMe.
> > Perhaps there are features in these protocols for reporting low
> > resources? (Sorry, I didn't have time to check.)
> 
> For SCSI, THIN_PROVISIONING_SOFT_THRESHOLD_REACHED looks like the one.
> For NVMe, NVME_SC_CAPACITY_EXCEEDED looks like it.
> 
> I guess we can add a new error state in the ext4 layer. Let's say it's
> "HOST_NOSPACE" in ext4. This state should be used when virtio-blk
> returns ENOSPC or virtio-scsi returns
> THIN_PROVISIONING_SOFT_THRESHOLD_REACHED. I'm not sure if there is a
> case where NVME_SC_CAPACITY_EXCEEDED is translated to this state
> because we don't have virtio-nvme.
> While ext4 is in the HOST_NOSPACE state, it will periodically retry
> writing to the disk (i.e., virtio-blk or virtio-scsi). If the retries
> fail a certain number of times, the guest will report a disk error.
> What do you think?
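
To make the mapping concrete, here is a rough sketch of how a guest
block driver could funnel those transport-specific statuses into a
single block-layer status that a HOST_NOSPACE state in ext4 could key
off. VIRTIO_BLK_S_ENOSPC and BLK_STS_HOST_NOSPC are made up for
illustration; today the driver only knows OK, UNSUPP, and IOERR:

/*
 * Hypothetical sketch -- VIRTIO_BLK_S_ENOSPC and BLK_STS_HOST_NOSPC
 * do not exist in the kernel or the virtio spec today.
 */
static blk_status_t virtblk_result(u8 status)
{
	switch (status) {
	case VIRTIO_BLK_S_OK:
		return BLK_STS_OK;
	case VIRTIO_BLK_S_UNSUPP:
		return BLK_STS_NOTSUPP;
	case VIRTIO_BLK_S_ENOSPC:	/* proposed: host backing store full */
		/*
		 * Retryable by design: the host may add or free space,
		 * so report a distinct status rather than a hard I/O
		 * error and let the file system decide when to give up.
		 */
		return BLK_STS_HOST_NOSPC;
	default:
		return BLK_STS_IOERR;
	}
}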

I'm sure virtio-blk can be extended if you can work with the file system
maintainers to introduce the concept of logical block exhaustion. There
might be complications for fsync and memory pressure if pages cannot be
written back to exhausted devices, but I'm not an expert.
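
For what it's worth, the bounded-retry policy itself could be small. A
sketch with made-up names (none of the fields or constants below exist
in ext4 today), assuming the writeback path calls it whenever a request
completes with the new host-no-space status:

#define EXT4_HOST_NOSPC_MAX_RETRIES	5
#define EXT4_HOST_NOSPC_RETRY_DELAY	(30 * HZ)

static int ext4_host_nospc_failed(struct ext4_sb_info *sbi)
{
	if (atomic_inc_return(&sbi->s_host_nospc_retries) <
	    EXT4_HOST_NOSPC_MAX_RETRIES) {
		/* Back off and let writeback retry the request later. */
		schedule_delayed_work(&sbi->s_host_nospc_work,
				      EXT4_HOST_NOSPC_RETRY_DELAY);
		return -EAGAIN;
	}
	/* The host never freed space: escalate to a hard disk error. */
	return -EIO;
}

Bounding the retries matters for exactly the writeback reason above:
every retry keeps dirty pages pinned in the guest, so an unbounded loop
would only convert host-side ENOSPC into guest memory pressure.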

Stefan
