On Fri, Feb 02, 2024 at 04:23:46PM +0000, Al Viro wrote:
> On Fri, Feb 02, 2024 at 11:22:15AM +0000, David Howells wrote:
> > Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > > Just making inode_lock() interruptible would break everything.
> >
> > Why?  Obviously, you'd need to check the result of the inode_lock(), which I
> > didn't put in my very rough example code, but why would taking the lock at the
> > front of a vfs op like mkdir be a problem?
>
> Plenty of new failure exits to maintain?

I don't currently see a reason to go around converting existing
uninterruptible sleeps; the main benefit of the proposal as I see it
would be that we could mark sleeps as either interruptible or killable
correctly, since that really depends on what syscall we're in and what
userspace is expecting. If kernel code can correctly do one it can do
both, so this is a pretty straightforward change.

But it is an interesting idea, I'd be curious to see what comes out of
playing around with some refactorings.

There are some other wait_event()-related ideas kicking around too...

Willy and Dave and I were talking about the "asynchronous waits" that
io_uring is wanting to do - I believe this is currently just done in an
ad-hoc way for waiting on a folio lock. It seemed like it might be
possible to do this in a more generic way by simply dynamically
allocating the waitlist entry, and signalling via task_struct that the
wait/wakeup should be delivered to a kiocb, instead of to a thread.

Another thing I've been wanting to do is embed a sequence number in
wait_queue_head_t, which would be incremented on wakeup. This would
change prepare_to_wait() to "read current sequence number", then later
we sleep until the sequence number has changed from what we initially
read.

This would let us fix double expansion of the wait condition in the
wait_event() macros, and it would also mean we're not flipping task
state before running the cond expression...
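
To make the sequence number idea a bit more concrete, here is a rough
userspace sketch using pthreads. It is only an analogue, not kernel code:
struct seq_waitq and the seq_wq_*() helpers are made-up names for
illustration, and a mutex plus condition variable stands in for the real
wait_queue_head_t and scheduler machinery.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical analogue of a wait queue head with a wakeup counter. */
struct seq_waitq {
	pthread_mutex_t	lock;
	pthread_cond_t	cond;
	unsigned long	seq;	/* incremented on every wakeup */
};

void seq_wq_init(struct seq_waitq *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->seq = 0;
}

/* Analogue of prepare_to_wait(): only snapshot the sequence number. */
unsigned long seq_wq_prepare(struct seq_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	unsigned long seq = wq->seq;
	pthread_mutex_unlock(&wq->lock);
	return seq;
}

/* Sleep until the sequence number differs from the snapshot. */
void seq_wq_wait(struct seq_waitq *wq, unsigned long seq)
{
	pthread_mutex_lock(&wq->lock);
	while (wq->seq == seq)
		pthread_cond_wait(&wq->cond, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

/* Analogue of wake_up(): bump the counter, then wake all sleepers. */
void seq_wq_wake(struct seq_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->seq++;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/*
 * Wait until cond(arg) returns true. The condition is a plain function
 * call that appears once, and it runs before we commit to sleeping.
 */
void seq_wq_wait_event(struct seq_waitq *wq, bool (*cond)(void *), void *arg)
{
	for (;;) {
		unsigned long seq = seq_wq_prepare(wq);

		if (cond(arg))
			return;
		seq_wq_wait(wq, seq);
	}
}

/* Usage sketch: a wakeup between snapshot and sleep is not lost. */
unsigned long seq_wq_demo(void)
{
	struct seq_waitq wq;

	seq_wq_init(&wq);
	unsigned long seq = seq_wq_prepare(&wq);

	seq_wq_wake(&wq);	/* wakeup arrives before we sleep */
	seq_wq_wait(&wq, seq);	/* returns at once, counter already moved */
	assert(wq.seq == seq + 1);
	return wq.seq;
}
```

In this sketch the waker is assumed to update whatever state cond() reads
before calling seq_wq_wake(), and cond() is assumed to read that state
safely (e.g. atomically or under its own lock). Whether the same shape
would carry over to the kernel's wait_event() macros is exactly the part
that would need playing around with.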