On Tue, Sep 04, 2018 at 11:44:20AM -0400, Jeff Layton wrote:
> On Tue, 2018-09-04 at 22:56 +0800, 焦晓冬 wrote:
> > A practical and concrete example may be a disk cleaner program that
> > first searches for garbage files that won't be used anymore, saves
> > the list in a file (open()-write()-close()), and waits for the user
> > to confirm the list of files to be removed. A writeback error occurs
> > and the related page/inode/address_space gets evicted while the user
> > is taking a long thought about it. Finally, the user hits enter and
> > the cleaner begins to open() and read() the list again. But what
> > gets removed is the old list of files that was generated several
> > months ago...
> >
> > Another example may be an email editor and a busy mail sender. A
> > well-written mail to my boss is composed in this email editor and is
> > saved in a file (open()-write()-close()). The mail sender gets
> > notified with the path of the mail file to queue it and send it
> > later. A writeback error occurs and the related
> > page/inode/address_space gets evicted while the mail is still
> > waiting in the queue of the mail sender. Finally, the mail file is
> > open()ed and read() by the sender, but what is sent is the mail to
> > my girlfriend that was composed yesterday...
> >
> > In both cases, the files are not meant to be persisted onto the
> > disk, so fsync() is not likely to be called.
>
> So at what point are you going to give up on keeping the data? The
> fundamental problem here is an open-ended commitment. We (justifiably)
> avoid those in kernel development because it might leave the system
> without a way out of a resource crunch.

Well, I think the point was that in the above examples you'd prefer that
the read just fail--no need to keep the data. A bit marking the file (or
even the entire filesystem) unreadable would satisfy POSIX, I guess.
Whether that's practical, I don't know.

> > - If the following read() could be served by a page in memory, just
> > return the data. If the following read() could not be served by a
> > page in memory and the inode/address_space has a writeback error
> > mark, return EIO. If there is a writeback error on the file, and the
> > requested data could not be served by a page in memory, it means we
> > are reading a (partially) corrupted (out-of-date) file. Receiving an
> > EIO is expected.
>
> No, an error on read is not expected there. Consider this:
>
> Suppose the backend filesystem (maybe an NFSv3 export) is really r/o,
> but was mounted r/w. An application queues up a bunch of writes that
> of course can't be written back (they get EROFS or something when
> they're flushed back to the server), but that application never calls
> fsync.
>
> A completely unrelated application is running as a user that can open
> the file for read, but not r/w. It then goes to open and read the file
> and then gets EIO back, or maybe even EROFS.
>
> Why should that application (which did zero writes) have any reason to
> think that the error was due to a prior writeback failure by a
> completely separate process? Does EROFS make sense when you're
> attempting to do a read anyway?
>
> Moreover, what is that application's remedy in this case? It just
> wants to read the file, but may not be able to even open it for write
> to issue an fsync to "clear" the error. How do we get things moving
> again so it can do what it wants?
>
> I think your suggestion would open the floodgates for local DoS
> attacks.
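To make that scenario concrete, here is a minimal sketch of Jeff's
unrelated reader (the path name is hypothetical): it has read permission
only, has done zero writes, yet under the proposed semantics its read()
fails because of someone else's unflushed writes.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	/* Reader with read permission only; it has done zero writes. */
	int fd = open("shared-file", O_RDONLY);	/* hypothetical path */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Under the proposed semantics, this read() fails with EIO if
	 * another process's writeback failed and the page was evicted.
	 * This process cannot clear the error: it may not even be able
	 * to open the file for write to issue an fsync.
	 */
	n = read(fd, buf, sizeof(buf));
	if (n < 0 && errno == EIO)
		fprintf(stderr, "read failed due to another process's writeback error\n");

	close(fd);
	return 0;
}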
Do we really care about processes with write permissions (even only
local client-side write permissions) being able to DoS readers? In
general, readers kinda have to trust writers.

--b.
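For comparison, the save pattern in the original examples omits the one
call that actually reports a writeback error to the process that did
the writing. Below is a sketch of the same open()-write()-close()
sequence with an fsync() added (the file name is hypothetical); whether
applications can be expected to do this for files they never intend to
persist is exactly the question the examples raise.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of a durable save: the disk cleaner's open()-write()-close()
 * pattern, plus the fsync() it omits.  A writeback failure is reported
 * here, to the writer, rather than to some later, unrelated reader.
 */
static int save_list(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			close(fd);
			return -1;
		}
		buf += n;
		len -= n;
	}

	/* fsync() forces writeback and returns its error, if any. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* "garbage-list" is a hypothetical path for illustration. */
	const char *data = "/tmp/old-cache-file\n";

	if (save_list("garbage-list", data, strlen(data)) < 0) {
		perror("save_list");
		return 1;
	}
	return 0;
}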