On Tue, Jul 30, 2019 at 11:20 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
>
> Hello Ilya,
>
> On Mon, 29 Jul 2019 at 16:42, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Fri, Jul 26, 2019 at 11:23 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> > >
> > > Some additional information is provided below:
> > >
> > > I tried to restart the active MDS, and after the standby MDS took
> > > over, there is no client session recorded in the output of `ceph
> > > daemon mds.xxx session ls`. When I restarted the OSD.13 daemon, the
> > > stuck write op finished immediately. Thanks.
> >
> > So it happened again with the same OSD? Did you see this with other
> > OSDs?
>
> Yes. The issue always happened on the same OSD from previous
> experience. However, it did happen with other OSD on other node from
> the Cephfs kernel client's point of view.

Hi Jerry,

I'm not sure what you mean by "it did happen with other OSD on other
node from the Cephfs kernel client's point of view".

> >
> > Try enabling some logging on osd.13 since this seems to be a recurring
> > issue. At least "debug ms = 1" so we can see whether it ever sends the
> > reply to the original op (i.e. prior to restart).
>
> Got it, I will raise the debug level to retrieve more logs for further
> investigation.
>
> >
> > Also, take note of the epoch in osdc output:
> >
> > 36 osd13 ... e327 ...
> >
> > Does "ceph osd dump" show the same epoch when things are stuck?
> >
>
> Unfortunately, the environment was gone. But from the logs captured
> before, the epoch seems to be consistent between client and ceph
> cluster when things are stuck, right?
>
> 2019-07-26 12:24:08.475 7f06efebc700 0 log_channel(cluster) log [DBG]
> : osdmap e306: 15 total, 15 up, 15 in
>
> BTW, logs of OSD.13 and dynamic debug kernel logs of libceph captured
> on the stuck node are provided in
> https://drive.google.com/drive/folders/1gYksDbCecisWtP05HEoSxevDK8sywKv6?usp=sharing.

The libceph log confirms that it had two laggy requests, but it ends
before you restarted the OSD. The OSD log is useless: we really need to
see individual ops coming in and replies going out. It appears that
the OSD simply dropped those ops, expecting the kernel to resend them,
but that didn't happen for some reason.

Thanks,

                Ilya
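
If this reproduces, the epoch comparison discussed above could be scripted so both values are captured at the same moment. The following is only a rough sketch in Python. It assumes the osdc request line carries the epoch as a standalone `e<number>` token (as in the `36 osd13 ... e327 ...` line quoted above) and that the `ceph osd dump` output contains a line of the form `epoch <number>`. The function names are made up for illustration, and the sample strings stand in for real output.

```python
import re


def osdc_epoch(line):
    """Return the epoch from an osdc request line, or None if no
    standalone e<number> token is present (assumed format)."""
    m = re.search(r"(?<!\S)e(\d+)(?!\S)", line)
    return int(m.group(1)) if m else None


def osd_dump_epoch(dump_text):
    """Return the epoch from the 'epoch <number>' line of
    `ceph osd dump` output, or None if that line is missing."""
    m = re.search(r"^epoch\s+(\d+)\s*$", dump_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def epochs_match(osdc_line, dump_text):
    """True only if both epochs were found and are equal."""
    client = osdc_epoch(osdc_line)
    cluster = osd_dump_epoch(dump_text)
    return client is not None and client == cluster


if __name__ == "__main__":
    # Placeholder inputs, not real captured output.
    osdc_line = "36 osd13 ... e327 ..."
    dump_text = "epoch 327\n"
    print(osdc_epoch(osdc_line), osd_dump_epoch(dump_text),
          epochs_match(osdc_line, dump_text))
```

A mismatch here would suggest the kernel client is working from a different osdmap than the cluster, which is worth noting alongside the "debug ms = 1" OSD log.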