On Tue, 2017-04-04 at 10:09 -0700, Matthew Wilcox wrote:
> On Tue, Apr 04, 2017 at 12:25:46PM -0400, Jeff Layton wrote:
> > That said, I think giving more specific errors where we can is useful.
> > When your program is erroring out and writing 'I/O error' to the logs,
> > then how much time will your admins burn before they figure out that it
> > really failed because the filesystem was full?
>
> df is one of the first things I check ... a few years ago, I also learned
> to check df -i ... ;-)
>
> Anyway, given the decision to simply report the last error lets us do this
> implementation:
>
> void filemap_set_wb_error(struct address_space *mapping, int err)
> {
> 	struct inode *inode = mapping->host;
> 	unsigned int wb_err;
>
> 	if (!err)
> 		return;
> 	/*
> 	 * This should be called with the error code that we want to return
> 	 * on fsync. Thus, it should always be <= 0.
> 	 */
> 	WARN_ON(err > 0 || err < -MAX_ERRNO);
>
> 	spin_lock(&inode->i_lock);
> 	wb_err = ((mapping->wb_err & ~MAX_ERRNO) + (1 << 12)) | -err;
> 	WRITE_ONCE(mapping->wb_err, wb_err);
> 	spin_unlock(&inode->i_lock);
> }
>

I like this idea of being able to store arbitrary error codes there.
That should be used judiciously of course, but we already allow
returning arbitrary errors via the ->fsync op anyway. I'll plan to
incorporate something like that into the next set (with judicious
comments and constants).

One question...is the i_lock the right way to protect this? I think we
could do this locklessly too (cmpxchg in a loop, for instance). I'm not
worried about performance here -- it's just nice to be able to call
simple stuff like this without worrying about locking.

> int filemap_report_wb_error(struct file *file)
> {
> 	struct inode *inode = file_inode(file);
> 	unsigned int wb_err = READ_ONCE(mapping->wb_err);
>
> 	if (file->f_wb_err == wb_err)
> 		return 0;
> 	return -(wb_err & 4095);
> }
>
> That only gives us 20 bits of counter, but I think that's enough.

2^20 is 1048576, which seems a little small to me.
We may end up bumping the counter on every failed I/O. How fast can we
generate 1M failed I/Os? :)

2^52 however is 4503599627370496 (4Pi I/Os or so) ... that might take a
little longer to overflow. Is it worth the cost here to ensure that this
won't occur?

Actually...we could put this field in the inode instead of the mapping.
I know we've traditionally tracked this in the mapping, but is that
required here?

If we put this field in the inode then perhaps we can union it with
something and mitigate the cost of a larger counter...maybe in the
i_pipe union? I don't think S_ISREG inodes use anything in there, do
they?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
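
For illustration, the lockless variant mentioned above (cmpxchg in a
loop instead of taking i_lock) could look roughly like the sketch below.
It is a userspace approximation using C11 atomics, not kernel code: the
global wb_err stands in for mapping->wb_err, assert() stands in for
WARN_ON(), and the names set_wb_error(), report_wb_error() and
WB_ERR_SHIFT are hypothetical. The 12-bit errno / 20-bit counter layout
is taken from the quoted proposal.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_ERRNO	4095u	/* low 12 bits hold the errno */
#define WB_ERR_SHIFT	12	/* hypothetical name: counter lives above this bit */

/* Hypothetical userspace stand-in for the mapping->wb_err field. */
static _Atomic unsigned int wb_err;

/* Record a writeback error (a negative errno) without taking a lock. */
void set_wb_error(int err)
{
	unsigned int old, new;

	if (!err)
		return;
	/* Stand-in for the kernel's WARN_ON(): err must be a negative errno. */
	assert(err < 0 && err >= -(int)MAX_ERRNO);

	old = atomic_load(&wb_err);
	do {
		/* Bump the counter in the upper bits, replace the errno bits. */
		new = ((old & ~MAX_ERRNO) + (1u << WB_ERR_SHIFT)) |
		      (unsigned int)-err;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&wb_err, &old, new));
}

/*
 * Return the stored error if the field changed since 'since' was
 * sampled (the role file->f_wb_err plays in the quoted code), else 0.
 */
int report_wb_error(unsigned int since)
{
	unsigned int cur = atomic_load(&wb_err);

	if (cur == since)
		return 0;
	return -(int)(cur & MAX_ERRNO);
}
```

A caller would sample wb_err once (as at open time), and a later
report_wb_error() with that sample returns the most recent errno only if
set_wb_error() ran in between. With a 20-bit counter the value repeats
after 2^20 updates that store the same errno, which is the overflow
concern raised above.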