On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll <rleimens@xxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> We recently encountered an issue where our CephFS filesystem was unexpectedly set to read-only. Looking at some of the logs from the daemons, I can see the following:
>
> On the MDS:
> ...
> 2019-05-18 16:34:24.341 7fb3bd610700 -1 mds.0.89098 unhandled write error (90) Message too long, force readonly...
> 2019-05-18 16:34:24.341 7fb3bd610700 1 mds.0.cache force file system read-only
> 2019-05-18 16:34:24.341 7fb3bd610700 0 log_channel(cluster) log [WRN] : force file system read-only
> 2019-05-18 16:34:41.289 7fb3c0616700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> 2019-05-18 16:34:41.289 7fb3c0616700 0 mds.beacon.objmds00 Skipping beacon heartbeat to monitors (last acked 4.00101s ago); MDS internal heartbeat is not healthy!
> ...
>
> On one of the OSDs it was most likely targeting:
> ...
> 2019-05-18 16:34:24.140 7f8134e6c700 -1 osd.602 pg_epoch: 682796 pg[49.20b( v 682796'15706523 (682693'15703449,682796'15706523] local-lis/les=673041/673042 n=10524 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682796'15706523 lcod 682796'15706522 mlcod 682796'15706522 active+clean] do_op msg data len 95146005 > osd_max_write_size 94371840 on osd_op(mds.0.89098:48609421 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals] snapc 0=[] ondisk+write+known_if_redirected+full_force e682796) v8
> 2019-05-18 17:10:33.695 7f813466b700 0 log_channel(cluster) log [DBG] : 49.31c scrub starts
> 2019-05-18 17:10:34.980 7f813466b700 0 log_channel(cluster) log [DBG] : 49.31c scrub ok
> 2019-05-18 22:17:37.320 7f8134e6c700 -1 osd.602 pg_epoch: 683434 pg[49.20b( v 682861'15706526 (682693'15703449,682861'15706526] local-lis/les=673041/673042 n=10525 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682861'15706526 lcod 682859'15706525 mlcod 682859'15706525 active+clean] do_op msg data len 95903764 > osd_max_write_size 94371840 on osd_op(mds.0.91565:357877 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected+full_force e683434) v8
> …
>
> During this time there were some health concerns with the cluster. Significantly, since the error above seems to be related to the SessionMap, we had a client with a few requests that had been blocked for over 35948 secs (it's a member of a compute cluster, so we let the node drain/finish jobs before rebooting). We have also had some issues with certain OSDs on older hardware staying up and responding to heartbeats in time after upgrading to Nautilus, although that seems to be an iowait/load issue that we are actively working to mitigate separately.
>

This prevents the MDS from trimming completed requests recorded in the client's session, which results in a very large entry in the SessionMap. To recover, blacklist the client that has the blocked requests, then restart the MDS; see the command sketch at the end of this message.

> We are running Nautilus 14.2.1 on RHEL7.6. There is only one MDS rank, with an active/standby setup between two MDS nodes. Clients mount CephFS using the RHEL7.6 kernel driver.
>
> My read here would be that the MDS is sending too large a message to the OSD; however, my understanding was that the MDS should be using osd_max_write_size to determine the size of that message [0]. Is this maybe a bug in how this is calculated on the MDS side?
>
>
> Thanks!
> Ryan Leimenstoll
> rleimens@xxxxxxxxxxxxxx
> University of Maryland Institute for Advanced Computer Studies
>
>
> [0] https://www.spinics.net/lists/ceph-devel/msg11951.html
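
For reference, a rough sketch of that recovery sequence on Nautilus, assuming the active MDS daemon is the objmds00 from your log and using placeholder values for the client id and address (look those up with "session ls" first; the 12345 and 10.0.0.21:0/123456789 below are not real values from your cluster):

    # list sessions on the active MDS and find the one with the old blocked requests
    ceph daemon mds.objmds00 session ls

    # evict that client; by default eviction also blacklists it on the OSDs
    ceph tell mds.0 client evict id=12345

    # or blacklist the client address directly if eviction does not complete
    ceph osd blacklist add 10.0.0.21:0/123456789

    # then fail the active MDS so the standby (or a restarted daemon) replays
    # the journal and clears the forced read-only state
    ceph mds fail 0

To compare the rejected message size against the OSD-side limit, on the host running osd.602:

    ceph daemon osd.602 config get osd_max_write_size

Raising that limit would only paper over the oversized SessionMap write, though; getting rid of the stuck client session is the real fix.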