On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll <rleimens@xxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> We recently encountered an issue where our CephFS filesystem was unexpectedly set to read-only. Looking at some of the logs from the daemons, I can see the following:
>
> On the MDS:
> ...
> 2019-05-18 16:34:24.341 7fb3bd610700 -1 mds.0.89098 unhandled write error (90) Message too long, force readonly...
> 2019-05-18 16:34:24.341 7fb3bd610700 1 mds.0.cache force file system read-only
> 2019-05-18 16:34:24.341 7fb3bd610700 0 log_channel(cluster) log [WRN] : force file system read-only
> 2019-05-18 16:34:41.289 7fb3c0616700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> 2019-05-18 16:34:41.289 7fb3c0616700 0 mds.beacon.objmds00 Skipping beacon heartbeat to monitors (last acked 4.00101s ago); MDS internal heartbeat is not healthy!
> ...
>
> On one of the OSDs it was most likely targeting:
> ...
> 2019-05-18 16:34:24.140 7f8134e6c700 -1 osd.602 pg_epoch: 682796 pg[49.20b( v 682796'15706523 (682693'15703449,682796'15706523] local-lis/les=673041/673042 n=10524 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682796'15706523 lcod 682796'15706522 mlcod 682796'15706522 active+clean] do_op msg data len 95146005 > osd_max_write_size 94371840 on osd_op(mds.0.89098:48609421 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals] snapc 0=[] ondisk+write+known_if_redirected+full_force e682796) v8
> 2019-05-18 17:10:33.695 7f813466b700 0 log_channel(cluster) log [DBG] : 49.31c scrub starts
> 2019-05-18 17:10:34.980 7f813466b700 0 log_channel(cluster) log [DBG] : 49.31c scrub ok
> 2019-05-18 22:17:37.320 7f8134e6c700 -1 osd.602 pg_epoch: 683434 pg[49.20b( v 682861'15706526 (682693'15703449,682861'15706526] local-lis/les=673041/673042 n=10525 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682861'15706526 lcod 682859'15706525 mlcod 682859'15706525 active+clean] do_op msg data len 95903764 > osd_max_write_size 94371840 on osd_op(mds.0.91565:357877 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected+full_force e683434) v8
> …
>
> During this time there were some health concerns with the cluster. Significantly, since the error above seems to be related to the SessionMap, we had a client with a few requests that had been blocked for over 35948 secs (it's a member of a compute cluster, so we let the node drain/finish jobs before rebooting). We have also had some issues with certain OSDs on older hardware staying up and responding to heartbeats in time after upgrading to Nautilus, although that seems to be an iowait/load issue that we are actively working to mitigate separately.
>

This prevents the MDS from trimming completed requests recorded in the client's session, which results in a very large entry in the SessionMap. To recover, blacklist the client that has the blocked requests, then restart the MDS; see the command sketch at the end of this message.

> We are running Nautilus 14.2.1 on RHEL7.6. There is only one MDS rank, with an active/standby setup between two MDS nodes. Clients mount CephFS using the RHEL7.6 kernel driver.
>
> My read here would be that the MDS is sending too large a message to the OSD; however, my understanding was that the MDS should be using osd_max_write_size to determine the size of that message [0]. Is this maybe a bug in how this is calculated on the MDS side?
>
>
> Thanks!
> Ryan Leimenstoll
> rleimens@xxxxxxxxxxxxxx
> University of Maryland Institute for Advanced Computer Studies
>
>
> [0] https://www.spinics.net/lists/ceph-devel/msg11951.html
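
For reference, a rough sketch of that recovery sequence on Nautilus, assuming the active MDS daemon is the objmds00 from your log and using placeholder values for the client id and address (look those up with "session ls" first; the 12345 and 10.0.0.21:0/123456789 below are not real values from your cluster):

    # list sessions on the active MDS and find the one with the old blocked requests
    ceph daemon mds.objmds00 session ls

    # evict that client; by default eviction also blacklists it on the OSDs
    ceph tell mds.0 client evict id=12345

    # or blacklist the client address directly if eviction does not complete
    ceph osd blacklist add 10.0.0.21:0/123456789

    # then fail the active MDS so the standby (or a restarted daemon) replays
    # the journal and clears the forced read-only state
    ceph mds fail 0

To compare the rejected message size against the OSD-side limit, on the host running osd.602:

    ceph daemon osd.602 config get osd_max_write_size

Raising that limit would only paper over the oversized SessionMap write, though; getting rid of the stuck client session is the real fix.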