Hi Ilya,

The kernel version is 3.10.106. The part of dmesg related to ceph:

[7349718.004905] libceph: osd297 down
[7349718.005190] libceph: osd299 down
[7349785.671015] libceph: osd295 down
[7350006.357509] libceph: osd291 weight 0x0 (out)
[7350006.357795] libceph: osd292 weight 0x0 (out)
[7350006.358075] libceph: osd293 weight 0x0 (out)
[7350006.358356] libceph: osd294 weight 0x0 (out)
[7350013.312399] libceph: osd289 weight 0x0 (out)
[7350013.312683] libceph: osd290 weight 0x0 (out)
[7350013.312964] libceph: osd296 weight 0x0 (out)
[7350013.313244] libceph: osd298 weight 0x0 (out)
[7350023.322571] libceph: osd288 weight 0x0 (out)
[7350038.338217] libceph: osd297 weight 0x0 (out)
[7350038.338501] libceph: osd299 weight 0x0 (out)
[7350115.364496] libceph: osd295 weight 0x0 (out)
[7350179.683200] libceph: osd294 weight 0x10000 (in)
[7350179.683495] libceph: osd294 up
[7350193.654197] libceph: osd293 weight 0x10000 (in)
[7350193.654486] libceph: osd297 weight 0x10000 (in)
[7350193.654769] libceph: osd293 up
[7350193.655046] libceph: osd297 up
[7350228.750112] libceph: osd299 weight 0x10000 (in)
[7350228.750399] libceph: osd299 up
[7350255.739415] libceph: osd289 weight 0x10000 (in)
[7350255.739700] libceph: osd289 up
[7350268.578031] libceph: osd288 weight 0x10000 (in)
[7350268.578315] libceph: osd288 up
[7383411.866068] libceph: osd299 down
[7383558.405675] libceph: osd299 up
[7387106.574308] libceph: osd291 weight 0x10000 (in)
[7387106.574593] libceph: osd291 up
[7387124.168198] libceph: osd296 weight 0x10000 (in)
[7387124.168492] libceph: osd296 up
[7387131.732934] libceph: osd292 weight 0x10000 (in)
[7387131.733218] libceph: osd292 up
[7387131.741277] libceph: osd290 weight 0x10000 (in)
[7387131.741558] libceph: osd290 up
[7387149.788781] libceph: osd298 weight 0x10000 (in)
[7387149.789066] libceph: osd298 up

A node of OSDs was restarted some days before.
And after evicting the session:

[7679890.147116] libceph: mds0 x.x.x.x:6800 socket closed (con state OPEN)
[7679890.491439] libceph: mds0 x.x.x.x:6800 connection reset
[7679890.491727] libceph: reset on mds0
[7679890.492006] ceph: mds0 closed our session
[7679890.492286] ceph: mds0 reconnect start
[7679910.479911] ceph: mds0 caps stale
[7679927.886621] ceph: mds0 reconnect denied

We had to restart the machine to recover it. I will send you an email if it happens again.

Thanks for your reply.

-----Original Message-----
From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
Sent: 2017-11-13 17:30
To: 周 威 <choury@xxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Re: Where can I find the fix commit of #3370 ?

On Mon, Nov 13, 2017 at 10:18 AM, 周 威 <choury@xxxxxx> wrote:
> Hi, Ilya
>
> I'm using the kernel of CentOS 7, which should be 3.10. I checked the
> patch, and it appears in my kernel source.
> We got the same stack as #3370; the process is hung in sleep_on_page_killable.
> The debugfs file ceph/osdc shows there is a read request waiting for a
> response, while the command `ceph daemon osd.x ops` shows nothing.
> Evicting the session from the mds does not help.
> The version of the ceph cluster is 10.2.9.

I don't think it's related to that ticket.

Which version of centos 7?  Can you provide dmesg?  Is it reproducible?

A debug ms = 1 log for that OSD would help with narrowing this down.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
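[For anyone following the thread: Ilya's `debug ms = 1` request can be satisfied with a ceph.conf fragment like the sketch below. The section name `osd.299` is only an example standing in for whichever OSD holds the stuck request; this is a minimal sketch, not an exact recipe from this thread.]

```
# ceph.conf fragment on the OSD host; osd.299 is a hypothetical id,
# substitute the OSD that the hung read in debugfs osdc points at.
# Takes effect after restarting the daemon; alternatively it can be
# injected into a running daemon with:
#   ceph tell osd.299 injectargs '--debug-ms 1'
[osd.299]
    debug ms = 1
```

Remember to set it back to `debug ms = 0` afterwards, since messenger logging at level 1 is verbose and grows the OSD log quickly.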