On Mon, Apr 03 2017, Jeff Layton wrote:

> On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
>> On Mon, Apr 03, 2017 at 01:47:37PM -0400, Jeff Layton wrote:
>> > > I wonder whether it's even worth supporting both EIO and ENOSPC for a
>> > > writeback problem. If I understand correctly, at the time of write(),
>> > > filesystems check to see if they have enough blocks to satisfy the
>> > > request, so ENOSPC only comes up in the writeback context for thinly
>> > > provisioned devices.
>> >
>> > No, ENOSPC on writeback can certainly happen with network filesystems.
>> > NFS and CIFS have no way to reserve space. You wouldn't want to have to
>> > do an extra RPC on every buffered write. :)
>>
>> Aaah, yes, network filesystems. I would indeed not want to do an extra
>> RPC on every write to a hole (it's a hole vs non-hole question, rather
>> than a buffered/unbuffered question ... unless you're WAFLing and not
>> reclaiming quickly enough, I suppose).
>>
>> So, OK, that makes sense, we should keep allowing filesystems to report
>> ENOSPC as a writeback error. But I think much of the argument below
>> still holds, and we should continue to have a prior EIO to be reported
>> over a new ENOSPC (even if the program has already consumed the EIO).
>>
>
> I'm fine with that (though I'd like Neil's thoughts before we decide
> anything) there.

I'd like there to be a well defined time when old errors are forgotten.
It does make sense for EIO to persist even if ENOSPC or EDQUOT is
received, but not forever.
Clearing the remembered errors when put_write_access() causes
i_writecount to reach zero is one option (as suggested), but I'm not
sure I'm happy with it.

Local filesystems, or network filesystems which receive strong write
delegations, should only ever return EIO to fsync. We should
concentrate on them first, I think. As there is only one possible
error, the seq counter is sufficient to "clear" it once it has been
reported to fsync() (or write()?).

Other network filesystems could return a whole host of errors:
ENOSPC EDQUOT ESTALE EPERM EFBIG ...
Do we want to limit exactly which errors are allowed in generic code, or
do we just support EIO generically and expect the filesystem to sort out
the details for anything else?

One possible approach a filesystem could take is just to allow a single
async writeback error. After that error, all subsequent write()
system calls become synchronous. As write() or fsync() is called on each
file descriptor (which could possibly have sent the write which caused
the error), an error is returned and that fact is counted. Once we have
returned as many errors as there are open file descriptors
(i_writecount?), and have seen a successful write, the filesystem
forgets all recorded errors and switches back to async writes (for that
inode).

NFS does this switch-to-sync-on-error. See nfs_need_check_write().

The "which could possibly have sent the write which caused the error" is
an explicit reference to NFS. NFS doesn't use the AS_EIO/AS_ENOSPC
flags to return async errors. It allocates an nfs_open_context for each
user who opens a given inode, and stores an error in there. Each dirty
page is associated with one of these, so errors are sure to go to the
correct user, though not necessarily the correct fd at present.

When we specify the new behaviour we should be careful to be as vague as
possible while still saying what we need. This allows filesystems some
flexibility.

  If an error happens during writeback, the next write() or fsync() (or
  ....) on the file descriptor to which data was written will return -1
  with errno set to EIO or some other relevant error. Other file
  descriptors open on the same file may receive EIO or some other error
  on a subsequent appropriate system call.
  It should not be assumed that close() will return an error. fsync()
  must be called before close() if writeback errors are important to the
  application.

Thanks,
NeilBrown
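
For illustration only, here is a minimal userspace sketch of what the proposed wording asks of an application: check write(), call fsync() before close(), and do not rely on close() to report a writeback error. The helper name write_durably() and its return convention (0 on success, negative errno on failure) are assumptions made for this example and are not part of the thread or of any kernel interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write len bytes to path and
 * return 0 only after fsync() has succeeded. Returns -errno on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int err;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -errno;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n <= 0) {
            /* write() may also surface an earlier writeback error. */
            err = (n < 0) ? errno : EIO;
            close(fd);
            return -err;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask for writeback errors explicitly before the fd goes away. */
    if (fsync(fd) < 0) {
        err = errno;                /* EIO, ENOSPC, EDQUOT, ... */
        close(fd);
        return -err;
    }

    /* close() is still checked, but it is not relied on for writeback errors. */
    if (close(fd) < 0)
        return -errno;
    return 0;
}
```

A caller would treat any nonzero return as "the data may not have reached stable storage". What to do next (retry, rewrite, give up) is application policy and is not something the text above tries to specify.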