Re: "umount" of ceph filesystem that has become unavailable hangs forever


 



Did you try "umount -l" (lazy umount)? It should just 
disconnect the fs. As I've experienced with other network 
filesystems like NFS or Gluster, you may always run into 
difficulties with any of them, so "-l" helps me. Not sure 
about Ceph though.

On Friday 23 July 2010, Sébastien Paolacci wrote:
> Hello Sage,
> 
> I would like to emphasize that this issue is somewhat
> annoying, even for experimentation purposes: I fully
> expect my test server to behave unsafely, crash, burn,
> or whatever, but a client-side impact so deep that a
> (hard) reboot is needed to recover from a hung ceph
> mount really prevents me from testing with real-life
> payloads.
> 
> I understand that it's not an easy problem, but a lot
> of my colleagues are not really willing to sacrifice
> even their dev workstations to play around in their
> spare time... sad world ;)
> 
> Sebastien
> 
> On Wed, 16 Jun 2010, Peter Niemayer wrote:
> > Hi,
> > 
> > trying to "umount" a formerly mounted ceph filesystem
> > that has become unavailable (osd crashed, then mds/mon
> > were shut down using /etc/init.d/ceph stop) results in
> > "umount" hanging forever in "D" state.
> > 
> > Strangely, "umount -f" started from another terminal
> > reports the ceph filesystem as not being mounted
> > anymore, which is consistent with what the mount-table
> > says.
> > 
> > The kernel keeps emitting the following messages from
> > time to time:
> > > Jun 16 17:25:29 gitega kernel: ceph:  tid 211912
> > > timed out on osd0, will reset osd
> > > Jun 16 17:25:35 gitega kernel: ceph: mon0
> > > 10.166.166.1:6789 connection failed
> > > Jun 16 17:26:15 gitega last message repeated 4 times
> > 
> > I would have expected the "umount" to terminate at
> > least after some generous timeout.
> > 
> > Ceph should probably support something like the
> > "soft,intr" options of NFS, because if the only
> > supported way of mounting is one where a client is
> > more or less stuck-until-reboot when the service
> > fails, many potential test-configurations involving
> > Ceph are way too dangerous to try...
> 
> Yeah, being able to force it to shut down when servers
> are unresponsive is definitely the intent.  'umount -f'
> should work.  It sounds like the problem is related to
> the initial 'umount' (which doesn't time out) followed
> by 'umount -f'.
> 
> I'm hesitant to add a blanket umount timeout, as that
> could prevent proper writeout of cached data/metadata in
> some cases.  So I think the goal should be that if a
> normal umount hangs for some reason, you should be able
> to intervene to add the 'force' if things don't go well.
> 
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe
> ceph-devel" in the body of a message to
> majordomo@xxxxxxxxxxxxxxx More majordomo info at 
> http://vger.kernel.org/majordomo-info.html
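Sage's intended recovery path above (a normal umount first, then an
escalation to "umount -f" from another terminal if it hangs) can be
sketched as follows; "/mnt/ceph" is again a placeholder mount point
and the commands typically need root:

```shell
# Terminal 1: a normal umount may block in "D" state when the
# servers are unreachable, since it waits to write out cached
# data/metadata before detaching.
umount /mnt/ceph || echo "normal umount did not complete cleanly"

# Terminal 2 (while terminal 1 hangs): force the unmount, which
# aborts in-flight requests instead of waiting for the servers.
umount -f /mnt/ceph || echo "forced umount failed (nothing mounted here?)"
```

This matches the reported symptom: after the forced unmount, the
mount table no longer lists the filesystem even though the first
umount process is still stuck.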


