On Thu, Jun 20, 2019 at 12:00 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Thu, 2019-06-20 at 11:51 -0700, Patrick Donnelly wrote: > > On Thu, Jun 20, 2019 at 11:18 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > On Thu, 2019-06-20 at 10:19 -0700, Patrick Donnelly wrote: > > > > On Mon, Jun 17, 2019 at 1:45 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > Again, I'd like to see SIGLOST sent to the application here. Are there > > > > > > any objections to reviving whatever patchset was in flight to add > > > > > > that? Or just writeup a new one? > > > > > > > > > > > > > > > > I think SIGLOST's utility is somewhat questionable. Applications will > > > > > need to be custom-written to handle it. If you're going to do that, then > > > > > it may be better to consider other async notification mechanisms. > > > > > inotify or fanotify, perhaps? Those may be simpler to implement and get > > > > > merged. > > > > > > > > The utility of SIGLOST is not well understood from the viewpoint of a > > > > local file system. The problem uniquely applies to distributed file > > > > systems. There simply is no way to recover from a lost lock for an > > > > application through POSIX mechanisms. We really need a new signal to > > > > just kill the application (by default) because recovery cannot be > > > > automatically performed even through system call errors. I don't see > > > > how inotify/fanotify (not POSIX interfaces!) helps here as it assumes > > > > the application will even use those system calls to monitor for lost > > > > locks when POSIX has no provision for that to happen. > > > > > > > > > > (cc'ing Anna in case she has an opinion) > > > > > > SIGLOST isn't defined in POSIX either, so I'm not sure that argument > > > carries much weight. The _only_ way you'd be able to add SIGLOST is if > > > it defaults to SIG_IGN, such that only applications that are watching > > > for it will react to it. Given that, you're already looking at code > > > modifications. > > > > Why does the default need to be SIG_IGN? Is that existing convention > > for new signals in Linux? > > > > No, it's just that if you don't do that, and locks are lost, then you'll > have a bunch of applications suddenly crash. That sounds scary. > > > > So, the real question is: what's the best method to watch for lost > > > locks? I don't have a terribly strong opinion about any of these > > > notification methods, tbh. I only suggested inotify/fanotify because > > > they'd likely be much simpler to implement. > > > > > > Adding signals is a non-trivial affair as we're running out of bits in > > > that space. The lower 32 bits are all taken and the upper ones are > > > reserved for realtime signals. My signal.h has a commented out SIGLOST: > > > > > > #define SIGIO 29 > > > #define SIGPOLL SIGIO > > > /* > > > #define SIGLOST 29 > > > */ > > > > > > Sharing the value with SIGIO/SIGPOLL makes it distinctly less useful. I > > > think it'd probably need its own value. Maybe there is some way to have > > > the application ask that one of the realtime signals be set up for this? > > > > Well, SIGPOLL is on its way out according to the standards. So SIGIO > > looks like what Linux uses instead. Looking at the man page for > > signal.h, I wonder if we could use SIGIO with si_code==POLL_LOST (a > > new code); si_band==POLL_MSG; and si_fd==<locked fd>. Then the inotify > > interface could then be used to process the event? > > > > The one nit here is that we would be generating SIGIO for regular > > files (and directories?) which would be new. It makes sense with what > > we want to do though. Also, SIGIO default behavior is to terminate the > > process. > > > > That sounds like it could have unintended side-effects. A systemwide > event that causes a ton of userland processes to suddenly catch a fatal > signal seems rather nasty. To be clear: that's only if the mount is configured in the most conservative way. Killing only userland processes with file locks would be an intermediate option (and maybe a default). > It's also not clear to me how you'll identify recipients for this > signal. What tasks will get a SIGLOST when this occurs? Going from file > descriptors or inodes to tasks that are associated with them is not > straightforward. We could start with file locks (which do have owners?) and table the idea of killing all tasks that have any kind of file descriptor open. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D