On Tue, 2019-06-04 at 11:37 +0200, Ilya Dryomov wrote:
> On Mon, Jun 3, 2019 at 11:05 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> > On Mon, Jun 3, 2019 at 1:24 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > > On Mon, Jun 3, 2019 at 1:07 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > > Can we also discuss how useful is allowing to recover a mount after it
> > > > has been blacklisted? After we fail everything with EIO and throw out
> > > > all dirty state, how many applications would continue working without
> > > > some kind of restart? And if you are restarting your application, why
> > > > not get a new mount?
> > > >
> > > > IOW what is the use case for introducing a new debugfs knob that isn't
> > > > that much different from umount+mount?
> > >
> > > People don't like it when their filesystem refuses to umount, which is
> > > what happens when the kernel client can't reconnect to the MDS right
> > > now. I'm not sure there's a practical way to deal with that besides
> > > some kind of computer admin intervention.
> >
> > Furthermore, there are often many applications using the mount (even
> > with containers) and it's not a sustainable position that any
> > network/client/cephfs hiccup requires a remount. Also, an application
>
> Well, it's not just any hiccup. It's one that lead to blacklisting...
>
> > that fails because of EIO is easy to deal with a layer above but a
> > remount usually requires grump admin intervention.
>
> I feel like I'm missing something here. Would figuring out $ID,
> obtaining root and echoing to /sys/kernel/debug/$ID/control make the
> admin less grumpy, especially when containers are involved?
>
> Doing the force_reconnect thing would retain the mount point, but how
> much use would it be? Would using existing (i.e. pre-blacklist) file
> descriptors be allowed? I assumed it wouldn't be (permanent EIO or
> something of that sort), so maybe that is the piece I'm missing...
>

I agree with Ilya here.
I don't see how applications can just pick up where they left off after
being blacklisted. Remounting in some fashion is really the only
recourse here.

To be clear, what happens to stateful objects (open files, byte-range
locks, etc.) in this scenario? Were you planning to just re-open files
and re-request locks that you held before being blacklisted? If so, that
sounds like a great way to cause some silent data corruption...

--
Jeff Layton <jlayton@xxxxxxxxxx>