On Mon, Aug 26, 2019 at 03:48:38AM +0100, Al Viro wrote: > We might be able to paper over that mess by doing what /dev/st does - > checking that file_count(file) == 1 in ->flush() instance and doing commit > there in such case. It's not entirely reliable, though, and it's definitely > not something I'd like to see spreading. This "not entirely reliable" turns out to be an understatement. If you have /proc/*/fdinfo/* being read from at the time of final close(2), you'll get file_count(file) > 1 the last time ->flush() is called. In other words, we'd get the data not committed at all. And that problem is shared with /dev/st*, unfortunately ;-/ We could somewhat mitigate that by having fs/proc/fd.c:seq_show() call ->flush() before fput(), but that would still hide errors from close(2) (and still have close(2) return before the data is flushed). read() on /proc/*/fdinfo/* does the following: find the task_struct grab its descriptor table, drop task_struct lock the table, pick struct file out of it bump struct file refcount, unlock the table seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\n", (long long)file->f_pos, f_flags, real_mount(file->f_path.mnt)->mnt_id); show_fd_locks(m, file, files); if (seq_has_overflowed(m)) goto out; if (file->f_op->show_fdinfo) file->f_op->show_fdinfo(m, file); drop the file reference (with fput()). Before "procfs: Convert /proc/pid/fdinfo/ handling routines to seq-file v2" (in 2012), we did just snprintf() while under the lock on descriptor table. That commit moved the printf part from under the lock, at the cost of grabbing and dropping file reference. Shortly after that "procfs: add ability to plug in auxiliary fdinfo providers" has added ->show_fdinfo() there, making it impossible to call under the descriptor table lock - that method can block (and does so for eventpoll, idiotify, etc.) We really want ->show_fdinfo() to happen before __fput() gets anywhere near ->release(). And even the non-blocking cases can be too costly to do under the descriptor table lock. OTOH, it can very well be done after or during ->flush(); the only problematic case right now is /dev/st* that has its ->flush() do nothing in case if file_count(file) is greater than 1. One kludgy way to handle that would be to have something like FMODE_SUCKY_FLUSH that would have fs/proc/fd.c:seq_show() just do the damn thing still under descriptor table lock and skip the rest of it - /dev/st* has nothing in ->show_fdinfo(), and show_fd_locks() is not too terribly costly. Still best avoided in default case, but... Another possibility is to have a secondary counter, with __fput() waiting for it to go down to zero and fdinfo reads bumping (and then dropping) that instead of the primary counter. Not sure which approach is better - adding extra logics in __fput() for the sake of one (and not terribly common) device is not nice, but another variant really is an ugly kludge ;-/ OTOH, this kind of "take a secondary reference, ->release() will block until you drop it" interface can breed deadlocks; procfs situation, AFAICS, allows to use it safely, but it's begging to be abused... Ideas? I don't like either approach, to put it very mildly, so any cleaner suggestions would be very welcome. PS: just dropping the check in st_flush() is probably a bad idea - as it is, it can't overlap with st_write() and after such change it will...