On Mon, Jun 3, 2019 at 11:05 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> On Mon, Jun 3, 2019 at 1:24 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 3, 2019 at 1:07 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > Can we also discuss how useful it is to allow recovering a mount
> > > after it has been blacklisted?  After we fail everything with EIO
> > > and throw out all dirty state, how many applications would continue
> > > working without some kind of restart?  And if you are restarting
> > > your application, why not get a new mount?
> > >
> > > IOW, what is the use case for introducing a new debugfs knob that
> > > isn't that much different from umount+mount?
> >
> > People don't like it when their filesystem refuses to umount, which
> > is what happens when the kernel client can't reconnect to the MDS
> > right now.  I'm not sure there's a practical way to deal with that
> > besides some kind of admin intervention.
>
> Furthermore, there are often many applications using the mount (even
> with containers) and it's not a sustainable position that any
> network/client/cephfs hiccup requires a remount.  Also, an application

Well, it's not just any hiccup.  It's one that led to blacklisting...

> that fails because of EIO is easy to deal with a layer above but a
> remount usually requires grumpy admin intervention.

I feel like I'm missing something here.  Would figuring out $ID,
obtaining root and echoing to /sys/kernel/debug/$ID/control make the
admin less grumpy, especially when containers are involved?

Doing the force_reconnect thing would retain the mount point, but how
much use would it be?  Would using existing (i.e. pre-blacklist) file
descriptors be allowed?  I assumed it wouldn't be (permanent EIO or
something of that sort), so maybe that is the piece I'm missing...

Thanks,

                Ilya
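
For concreteness, a minimal sketch of what the admin-side recovery flow
being debated might look like.  It assumes the real per-client debugfs
directory layout (/sys/kernel/debug/ceph/<fsid>.client<id>/) plus the
*proposed* "control" file accepting a "force_reconnect" command; neither
the file nor the command exists today, they are just the idea under
discussion:

#!/usr/bin/env python3
# Hypothetical sketch: find each CephFS kernel client's debugfs
# directory and ask it to reconnect after being blacklisted.  The
# "control" file and the "force_reconnect" command are assumptions
# from this thread, not existing kernel API.  Needs root and a
# mounted debugfs.
import glob
import sys

DEBUGFS_CEPH = "/sys/kernel/debug/ceph"

def main():
    # Each kernel client instance gets a <fsid>.client<id> directory.
    clients = glob.glob(DEBUGFS_CEPH + "/*.client*")
    if not clients:
        sys.exit("no ceph kernel clients under %s" % DEBUGFS_CEPH)
    for client in clients:
        try:
            # Proposed knob from this thread; does not exist today.
            with open(client + "/control", "w") as f:
                f.write("force_reconnect\n")
            print("asked %s to reconnect" % client)
        except OSError as e:
            print("failed for %s: %s" % (client, e))

if __name__ == "__main__":
    main()

Even with something like this, the admin still has to locate the right
client instance and get root inside whatever container namespace owns
the mount, which is the "less grumpy?" question above.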
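
And a sketch of the "deal with EIO a layer above" approach Patrick
mentions: a wrapper that, when a read on an old descriptor fails with
EIO (what a blacklisted client would return), drops the descriptor and
retries with a fresh open.  Purely illustrative, and it assumes the
mount itself has already been recovered, which is exactly the open
question about pre-blacklist file descriptors:

import errno
import os

class ReopeningFile:
    # Holds a path and a file descriptor; on EIO, replaces the
    # (possibly pre-blacklist) descriptor with a fresh open and
    # retries the read once.
    def __init__(self, path):
        self.path = path
        self.fd = os.open(path, os.O_RDONLY)

    def pread(self, length, offset):
        try:
            return os.pread(self.fd, length, offset)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            # The old descriptor is dead; a fresh open can acquire
            # new caps once the client has reconnected.
            os.close(self.fd)
            self.fd = os.open(self.path, os.O_RDONLY)
            return os.pread(self.fd, length, offset)

    def close(self):
        os.close(self.fd)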