On Tue, 30 Jul 2019 at 23:02, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Tue, Jul 30, 2019 at 11:20 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> >
> > Hello Ilya,
> >
> > On Mon, 29 Jul 2019 at 16:42, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jul 26, 2019 at 11:23 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> > > >
> > > > Some additional information is provided below:
> > > >
> > > > I tried to restart the active MDS, and after the standby MDS took
> > > > over, there was no client session recorded in the output of `ceph
> > > > daemon mds.xxx session ls`.  When I restarted the OSD.13 daemon, the
> > > > stuck write op finished immediately.  Thanks.
> > >
> > > So it happened again with the same OSD?  Did you see this with other
> > > OSDs?
> >
> > Yes.  From previous experience, the issue always happened on the same
> > OSD.  However, it did happen with other OSD on other node from the
> > Cephfs kernel client's point of view.
>
> Hi Jerry,
>
> I'm not sure what you mean by "it did happen with other OSD on other
> node from the Cephfs kernel client's point of view".
>

Hi Ilya,

Sorry, it simply means that when encountering the issue, I had only
seen OSDs on Node2 and Node3 shown in the osdc debug output, but I
didn't see any stuck write op sent to an OSD on Node1.  So, in the
beginning, I thought that there might be some network connection
issues among the nodes.

Node1 (where the kernel client umount got stuck)
  - OSD.0
  - OSD.1
  - ...
Node2
  - OSD.5
  - OSD.6
  - ...
Node3
  - OSD.10
  - OSD.11
  - ...

> >
> > > Try enabling some logging on osd.13 since this seems to be a recurring
> > > issue.  At least "debug ms = 1" so we can see whether it ever sends the
> > > reply to the original op (i.e. prior to restart).
> >
> > Got it, I will raise the debug level to retrieve more logs for further
> > investigation.
> >
> > > Also, take note of the epoch in osdc output:
> > >
> > >   36      osd13   ... e327 ...
> > >
> > > Does "ceph osd dump" show the same epoch when things are stuck?
> > >
> >
> > Unfortunately, the environment was gone.  But from the logs captured
> > before, the epoch seems to be consistent between the client and the
> > ceph cluster when things are stuck, right?
> >
> > 2019-07-26 12:24:08.475 7f06efebc700  0 log_channel(cluster) log [DBG]
> > : osdmap e306: 15 total, 15 up, 15 in
> >
> > BTW, logs of OSD.13 and dynamic debug kernel logs of libceph captured
> > on the stuck node are provided in
> > https://drive.google.com/drive/folders/1gYksDbCecisWtP05HEoSxevDK8sywKv6?usp=sharing.
>
> The libceph log confirms that it had two laggy requests, but it ends
> before you restarted the OSD.  The OSD log is useless: we really need
> to see individual ops coming in and replies going out.  It appears that
> the OSD simply dropped those ops expecting the kernel to resend them,
> but that didn't happen for some reason.

Thanks for the analysis.  I will raise the debug level and hope more
clues can be captured.

- Jerry

>
> Thanks,
>
>                 Ilya
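
P.S. In case it helps anyone hitting the same issue, this is roughly
how I plan to collect the extra data next time (osd.13 is just the OSD
from this thread; adjust the OSD id and paths to your environment, and
this assumes the admin socket is reachable on the OSD node and debugfs
is mounted on the client node):

  # raise messenger debugging on the OSD (run on the node hosting osd.13)
  ceph daemon osd.13 config set debug_ms 1
  # or, from any node with admin credentials
  ceph tell osd.13 injectargs '--debug_ms 1'

  # cluster-side osdmap epoch, to compare with the "e..." field in osdc
  ceph osd dump | head -n 1

  # kernel-client side: in-flight requests and libceph dynamic debug
  cat /sys/kernel/debug/ceph/*/osdc
  echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control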