Splitting PGs is one of the most intensive and disruptive things you can, and should, do to a cluster. Tweaking recovery sleep, max backfills, and heartbeat grace should help with this. Heartbeat grace can be set high enough to stop OSDs from flapping, which slows things down through extra peering and recovery, while still being low enough to detect OSDs that really do fail and go down. Recovery sleep and max backfills are the settings to look at for mitigating slow requests. I generally tweak those while watching iostat on a few OSDs and ceph -s, to make sure I'm not giving recovery operations so much priority that client IO can't still happen.
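As a rough sketch of the tuning described above, the following Python snippet builds (but does not run) a `ceph tell osd.* injectargs` command for the three settings. The option names osd_max_backfills, osd_recovery_sleep and osd_heartbeat_grace are the standard Ceph config options these settings are assumed to map to. The values and the helper name build_injectargs are illustrative assumptions, not recommendations from this thread. Injected values do not survive an OSD restart, and osd_heartbeat_grace is also read by the monitors, so it may need to go in the [global] section of ceph.conf as well.

```python
import shlex

# Illustrative starting values only (assumptions, not tested recommendations).
# Adjust them while watching iostat on some OSDs and `ceph -s`.
THROTTLE_SETTINGS = {
    "osd_max_backfills": 1,      # fewer concurrent backfills per OSD
    "osd_recovery_sleep": 0.1,   # seconds to sleep between recovery ops
    "osd_heartbeat_grace": 60,   # seconds before a silent OSD is reported down
}


def build_injectargs(settings, target="osd.*"):
    """Return the argv list for `ceph tell <target> injectargs '<flags>'`."""
    flags = " ".join(f"--{name} {value}" for name, value in settings.items())
    return ["ceph", "tell", target, "injectargs", flags]


if __name__ == "__main__":
    # Print the shell-quoted command so it can be reviewed before running it.
    print(shlex.join(build_injectargs(THROTTLE_SETTINGS)))
```

Printing the command instead of executing it keeps the change deliberate: apply it, watch client IO and recovery for a while, then adjust one value at a time.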
On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengineer@xxxxxxxxx> wrote:
Hi All

I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals on NVME. Cluster primarily used for CephFS, ~20M objects.

I'm seeing some OSDs getting marked down, it appears to be related to PG splitting, e.g:

2018-02-26 10:27:27.935489 7f140dbe2700 1 _created [C,D] has 5121 objects, starting split.

Followed by:

2018-02-26 10:27:58.242551 7f141cc3f700 0 log_channel(cluster) log [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
2018-02-26 10:27:58.242563 7f141cc3f700 0 log_channel(cluster) log [WRN] : slow request 30.151105 seconds old, received at 2018-02-26 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently commit_sent
2018-02-26 10:27:58.242569 7f141cc3f700 0 log_channel(cluster) log [WRN] : slow request 30.133441 seconds old, received at 2018-02-26 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently commit_sent
2018-02-26 10:27:58.242574 7f141cc3f700 0 log_channel(cluster) log [WRN] : slow request 30.083401 seconds old, received at 2018-02-26 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[] ondisk+read+rwordered+known_if_redirected+full_force e13994) currently waiting for rw locks
2018-02-26 10:27:58.242579 7f141cc3f700 0 log_channel(cluster) log [WRN] : slow request 30.072310 seconds old, received at 2018-02-26 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently waiting for rw locks
2018-02-26 10:27:58.242584 7f141cc3f700 0 log_channel(cluster) log [WRN] : slow request 30.308128 seconds old, received at 2018-02-26 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently commit_sent
2018-02-26 10:27:59.242768 7f141cc3f700 0 log_channel(cluster) log [WRN] : 47 slow requests, 5 included below; oldest blocked for > 31.308410 secs
2018-02-26 10:27:59.242776 7f141cc3f700 0 log_channel(cluster) log [WRN] : slow request 30.349575 seconds old, received at 2018-02-26 10:27:28.893124:

I'm also experiencing some MDS crash issues which I think could be related.

Is there anything I can do to mitigate the slow requests problem? The rest of the time the cluster is performing pretty well.

Thanks,
David
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com