Hi,

I only just now noticed this work after Dave Chinner pointed towards the
feature in the email leading to
https://lore.kernel.org/linux-fsdevel/20200211000405.5fohxgpt554gmnhu@xxxxxxxxxxxxxxxxx/

On 2020-01-15 12:10:32 -0800, Linus Torvalds wrote:
> So I no longer hate the implementation, but I do want to see the
> actual user space users come out of the woodwork and try this out for
> their use cases.

Postgres has been looking for something roughly like this, fwiw (or
well, been forced to).

While it's better than it used to be (due to b4678df184b3), we still
have trouble reliably detecting buffered IO errors, especially when the
IO is done across multiple processes. We can't easily keep an fd open
that predates all writes to a file, and ensure that fsyncs will happen
only on that fd. The primary reasons for that are:

1) every connection (& some internal jobs) is a process, and we neither
   want to fsync each touched file in short-lived connections, nor is it
   desirable to add the complication of transferring fds between
   processes just to reliably get an error in fsync().

2) we have to cope with having more files open than allowed, so we have
   a layer that limits the number of OS level FDs open at the same
   time. We don't want to fsync whenever we have to juggle open fds
   though, as that'd be too costly.

So it'd be good to have a way to *reliably* know when writeback IO
failed, so we can abort a checkpoint if necessary, and instead perform
journal replay.

For our purposes we'd probably want errors on the fs/superblock level,
rather than block devices. It's not always easy to map between blockdevs
and relevant filesystems, there are errors above the block layer, and we
definitely don't want to crash & restart a database just because
somebody pulled a USB storage device that didn't have any of the
database's data on it.

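For illustration, the checkpoint decision described above might be
sketched as follows. This is a minimal sketch under stated assumptions,
not Postgres code: check_for_errors() is a hypothetical placeholder for
draining whatever notification queue the kernel ends up providing (here
it only reads a flag), and fsync_path(), durable_rename() and
finish_checkpoint() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for "a writeback error notification arrived".
 * A real consumer would drain the kernel notification queue instead. */
static bool writeback_error_seen = false;

static bool check_for_errors(void)
{
    return writeback_error_seen;
}

/* Open a path, fsync it, close it. Returns 0 on success, -1 on error. */
static int fsync_path(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    int saved_errno = errno;   /* close() may clobber errno */
    close(fd);
    errno = saved_errno;
    return rc;
}

/* Flush the source file, rename it, then flush the containing directory
 * so that the rename itself is persistent. */
static int durable_rename(const char *from, const char *to, const char *dir)
{
    if (fsync_path(from, O_RDWR) != 0)
        return -1;
    if (rename(from, to) != 0)
        return -1;
    return fsync_path(dir, O_RDONLY);
}

/* Returns 0 if the checkpoint was published, -1 if it has to be aborted
 * (the caller would then fall back to journal replay). */
static int finish_checkpoint(const char *data, const char *journal,
                             const char *state_tmp, const char *state,
                             const char *dir)
{
    if (fsync_path(data, O_RDWR) != 0)
        return -1;
    if (fsync_path(journal, O_RDWR) != 0)
        return -1;

    /* Only publish the checkpoint if no writeback error was reported
     * for IO that may have been issued by any other process. */
    if (check_for_errors())
        return -1;

    return durable_rename(state_tmp, state, dir);
}
```

The point of the ordering is that the error check sits between the
fsyncs and the rename: the checkpoint state only becomes visible once
both the local fsyncs and the system-wide error check have passed.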
An earlier version of this patchset had some support for that, albeit
perhaps not fully implemented (no errors raised, afaict?):
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=notifications&id=338eec77a0cb29a7d2ae9350066c1990408ae58e

Is the plan to pick this up again once the basic feature is in?

A few notes from the email referenced above (that actually seem to
belong in this thread more than the other):

1) From the angle of needing to reliably detect writeback errors, I find
   it somewhat concerning that an LSM may end up entirely filtering away
   error notifications, without a consumer being able to detect that:

+void __post_watch_notification(struct watch_list *wlist,
+			       struct watch_notification *n,
+			       const struct cred *cred,
+			       u64 id)
+{
...
+		if (security_post_notification(watch->cred, cred, n) < 0)
+			continue;

   It's an unpleasant thought that an overly restrictive (or overly
   restrictively configured) LSM could lead to silently swallowing data
   integrity errors.

2) It'd be good if there were documentation, aimed at userland consumers
   of this, explaining what the delivery guarantees are. To be useful
   for us, it needs to be guaranteed that consuming all notifications
   ensures that there are no pending notifications queued up somewhere
   (so we can do fsync(data); fsync(journal); check_for_errors();
   durable_rename(checkpoint_state.tmp, checkpoint_state);).

3) What will the permission model for accessing the notifications be?
   It seems that currently anyone, even within a container/namespace or
   something, will see blockdev errors from everywhere? The earlier
   superblock support (I'm not sure I like that name btw, it's hard to
   understand for us userspace folks) seems to have required exec
   permission, but nothing else.

Greetings,

Andres Freund