Hi Stefan,

Thanks for sharing QEMU's approach! We also have a similar early notification mechanism to avoid low-disk conditions. However, the approach I would like to propose avoids pausing the guest by allowing the guest to retry requests after a while.

On Wed, Jun 19, 2024 at 10:57 PM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
>
> > What do you think of this idea? Also, has anything similar been attempted yet?
>
> Hi Keiichi,
> Yes, there is an existing approach that is related but not identical to
> what you are exploring:
>
> QEMU has an option to pause the guest and raise a notification to the
> management tool that ENOSPC has been reached. The guest is unable to
> resolve ENOSPC itself and guest applications are likely to fail the disk
> becomes unavailable, hence the guest is simply paused.
>
> In systems that expect to hit this condition, this pause behavior can be
> combined with an early notification when a free space watermark is hit.
> This way guest are almost never paused because free space can be added
> before ENOSPC is reached. QEMU has a write watermark feature that works
> well on top of qcow2 images (they grow incrementally so it's trivial to
> monitor how much space is being consumed).
>
> I wanted to share this existing approach in case you think it would work
> nicely for your use case.
>
> The other thought I had was: how does the new ENOSPC error fit into the
> block device model? Hopefully this behavior is not virtio-blk-specific
> behavior but rather something general that other storage protocols like
> NVMe and SCSI support too. That way file systems can handle this in a
> generic fashion.
>
> The place I would check is Logical Block Provisioning in SCSI and NVMe.
> Perhaps there are features in these protocols for reporting low
> resources? (Sorry, I didn't have time to check.)

For SCSI, THIN_PROVISIONING_SOFT_THRESHOLD_REACHED looks like the one.
For NVMe, NVME_SC_CAPACITY_EXCEEDED looks like the counterpart.
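To make the watermark idea in Stefan's description concrete, here is a toy Python model of a grow-on-write image. It is not QEMU code: the class `ThinImage` and its fields are hypothetical. A write that pushes the highest written offset past the watermark fires a notification, and a write that would exceed the host's capacity pauses the guest instead of failing. In QEMU the real knobs are the QMP command `block-set-write-threshold` and the `BLOCK_WRITE_THRESHOLD` event; the one-shot disarming below is modeled on that feature's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThinImage:
    """Toy model of a grow-on-write image (hypothetical, not QEMU code)."""
    capacity: int                        # bytes the host can actually provide
    write_threshold: int                 # watermark offset; 0 means disarmed
    on_threshold: Callable[[int], None]  # notifies the management tool
    end: int = 0                         # highest byte offset written so far
    paused: bool = False

    def write(self, offset: int, length: int) -> bool:
        new_end = max(self.end, offset + length)
        if new_end > self.capacity:
            # Out of host space: pause the guest rather than fail the request.
            self.paused = True
            return False
        self.end = new_end
        if self.write_threshold and self.end > self.write_threshold:
            # Report how far past the watermark the image has grown.
            self.on_threshold(self.end - self.write_threshold)
            self.write_threshold = 0  # one-shot: the tool must re-arm it
        return True
```

With `capacity=100` and `write_threshold=80`, a write that grows the image to 90 bytes reports an overshoot of 10, giving the management tool time to add space; only a write that would go past 100 bytes pauses the guest. Once the tool has raised `capacity`, it clears `paused` and the guest resumes.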
I guess we could add a new error state to the ext4 layer. Let's call it "HOST_NOSPACE". ext4 would enter this state when virtio-blk returns ENOSPC or virtio-scsi returns THIN_PROVISIONING_SOFT_THRESHOLD_REACHED. I'm not sure whether NVME_SC_CAPACITY_EXCEEDED would ever be translated to this state, since there is no virtio-nvme. While in the HOST_NOSPACE state, ext4 would periodically retry writing to the disk (virtio-blk or virtio-scsi). If the write still fails after a certain number of attempts, the guest reports a disk error.

What do you think?

Best,
Keiichi

>
> Stefan
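The proposed HOST_NOSPACE handling can be sketched as a bounded retry loop. This is a minimal Python model of the control flow only, assuming a fixed retry interval and a maximum retry count; `WriteResult`, `classify`, `write_with_retry`, `max_retries`, and `interval` are all hypothetical names, and device completion codes are represented as plain strings. NVME_SC_CAPACITY_EXCEEDED is deliberately left unmapped, matching the open question in the proposal.

```python
import time
from enum import Enum, auto
from typing import Callable


class WriteResult(Enum):
    OK = auto()
    HOST_NOSPACE = auto()  # proposed ext4 state: host is out of space, retry later
    IO_ERROR = auto()      # surfaced to the guest as a disk error


# Device-level conditions that would map to HOST_NOSPACE under the proposal.
# NVME_SC_CAPACITY_EXCEEDED is left out: there is no virtio-nvme to carry it.
HOST_NOSPACE_CODES = {"ENOSPC", "THIN_PROVISIONING_SOFT_THRESHOLD_REACHED"}


def classify(code: str) -> WriteResult:
    """Map a completion code from virtio-blk/virtio-scsi to a result."""
    if code == "OK":
        return WriteResult.OK
    if code in HOST_NOSPACE_CODES:
        return WriteResult.HOST_NOSPACE
    return WriteResult.IO_ERROR


def write_with_retry(
    submit: Callable[[], str],
    max_retries: int,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> WriteResult:
    """Retry a write while the host reports no space, at most max_retries times.

    Returns OK on success, or IO_ERROR if the write fails for another
    reason or still hits HOST_NOSPACE after the last retry.
    """
    for attempt in range(max_retries + 1):
        result = classify(submit())
        if result is not WriteResult.HOST_NOSPACE:
            return result
        if attempt < max_retries:
            sleep(interval)  # give the host time to add free space
    return WriteResult.IO_ERROR
```

Passing a recording function as `sleep` makes the behavior easy to check: if the device answers ENOSPC, ENOSPC, then OK with `max_retries=3`, the call returns `WriteResult.OK` after two sleeps; if every attempt returns ENOSPC, it gives up after `max_retries + 1` attempts and returns `WriteResult.IO_ERROR`. A fixed interval is the simplest policy; exponential backoff would be a natural alternative.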