Re: [PATCH 8/8] ceph: return -EIO if read/write against filp that lost file locks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2019-06-20 at 12:50 -0700, Patrick Donnelly wrote:
> On Thu, Jun 20, 2019 at 12:00 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Thu, 2019-06-20 at 11:51 -0700, Patrick Donnelly wrote:
> > > On Thu, Jun 20, 2019 at 11:18 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > On Thu, 2019-06-20 at 10:19 -0700, Patrick Donnelly wrote:
> > > > > On Mon, Jun 17, 2019 at 1:45 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > > > Again, I'd like to see SIGLOST sent to the application here. Are there
> > > > > > > any objections to reviving whatever patchset was in flight to add
> > > > > > > that? Or just writeup a new one?
> > > > > > > 
> > > > > > 
> > > > > > I think SIGLOST's utility is somewhat questionable. Applications will
> > > > > > need to be custom-written to handle it. If you're going to do that, then
> > > > > > it may be better to consider other async notification mechanisms.
> > > > > > inotify or fanotify, perhaps? Those may be simpler to implement and get
> > > > > > merged.
> > > > > 
> > > > > The utility of SIGLOST is not well understood from the viewpoint of a
> > > > > local file system. The problem uniquely applies to distributed file
> > > > > systems. There simply is no way to recover from a lost lock for an
> > > > > application through POSIX mechanisms. We really need a new signal to
> > > > > just kill the application (by default) because recovery cannot be
> > > > > automatically performed even through system call errors. I don't see
> > > > > how inotify/fanotify (not POSIX interfaces!) helps here as it assumes
> > > > > the application will even use those system calls to monitor for lost
> > > > > locks when POSIX has no provision for that to happen.
> > > > > 
> > > > 
> > > > (cc'ing Anna in case she has an opinion)
> > > > 
> > > > SIGLOST isn't defined in POSIX either, so I'm not sure that argument
> > > > carries much weight. The _only_ way you'd be able to add SIGLOST is if
> > > > it defaults to SIG_IGN, such that only applications that are watching
> > > > for it will react to it. Given that, you're already looking at code
> > > > modifications.
> > > 
> > > Why does the default need to be SIG_IGN? Is that existing convention
> > > for new signals in Linux?
> > > 
> > 
> > No, it's just that if you don't do that, and locks are lost, then you'll
> > have a bunch of applications suddenly crash. That sounds scary.
> > 
> > > > So, the real question is: what's the best method to watch for lost
> > > > locks? I don't have a terribly strong opinion about any of these
> > > > notification methods, tbh. I only suggested inotify/fanotify because
> > > > they'd likely be much simpler to implement.
> > > > 
> > > > Adding signals is a non-trivial affair as we're running out of bits in
> > > > that space. The lower 32 bits are all taken and the upper ones are
> > > > reserved for realtime signals. My signal.h has a commented out SIGLOST:
> > > > 
> > > > #define SIGIO           29
> > > > #define SIGPOLL         SIGIO
> > > > /*
> > > > #define SIGLOST         29
> > > > */
> > > > 
> > > > Sharing the value with SIGIO/SIGPOLL makes it distinctly less useful. I
> > > > think it'd probably need its own value. Maybe there is some way to have
> > > > the application ask that one of the realtime signals be set up for this?
> > > 
> > > Well, SIGPOLL is on its way out according to the standards. So SIGIO
> > > looks like what Linux uses instead. Looking at the man page for
> > > signal.h, I wonder if we could use SIGIO with si_code==POLL_LOST (a
> > > new code); si_band==POLL_MSG; and si_fd==<locked fd>. Then the inotify
> > > interface could then be used to process the event?
> > > 
> > > The one nit here is that we would be generating SIGIO for regular
> > > files (and directories?) which would be new. It makes sense with what
> > > we want to do though. Also, SIGIO default behavior is to terminate the
> > > process.
> > > 
> > 
> > That sounds like it could have unintended side-effects. A systemwide
> > event that causes a ton of userland processes to suddenly catch a fatal
> > signal seems rather nasty.
> 
> To be clear: that's only if the mount is configured in the most
> conservative way. Killing only userland processes with file locks
> would be an intermediate option (and maybe a default).
> 

A disable switch for this behavior would be a minimum requirement, I
think.

> > It's also not clear to me how you'll identify recipients for this
> > signal. What tasks will get a SIGLOST when this occurs? Going from file
> > descriptors or inodes to tasks that are associated with them is not
> > straightforward.
> 
> We could start with file locks (which do have owners?) and table the
> idea of killing all tasks that have any kind of file descriptor open.

Well...we do track current->tgid for l_pid, so you could probably go by
that for traditional POSIX locks.

For flock() and OFD locks though, the tasks are owned by the file
description and those can be shared between processes. So, even if you
kill the tgid that acquired the lock, there's no guarantee other
processes that might be using that lock will get the signal. Not that
that's a real argument against doing this, but this approach could have
significant gaps. 

OTOH, reporting a lost lock via fanotify would be quite straightforward
(and not even that difficult). Then, any process that really cares could
watch for these events.

Again, I really think I'm missing the big picture on the problem you're
attempting to solve with this.

For instance, I was operating under the assumption that you wanted this
to be an opt-in thing, but it sounds like you're aiming for something
that is more draconian. I'm not convinced that that's a good idea --
especially not initially. Enabling this by default could be a very
unwelcome surprise in some environments.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux