Hi folks,

I am fighting a bit with odd deep-scrub behavior on HDDs and have discovered a likely cause of why the distribution of last_deep_scrub_stamps is so weird. I wrote a small script to extract a histogram of scrubs by "days not scrubbed" (more precisely, intervals not scrubbed; see code) to find out how (deep-)scrub times are distributed. Output below.

What I expected is along the lines that HDD OSDs try to scrub every 1-3 days, while they try to deep-scrub every 7-14 days. In other words, OSDs that have been deep-scrubbed within the last 7 days would *never* be in scrubbing+deep state. However, what I see is completely different. There seems to be no distinction between scrub and deep-scrub start times. This is really unexpected, as nobody would try to deep-scrub HDDs every day. Weekly to bi-weekly is normal, specifically for large drives.

Is there a way to configure something like osd_deep_scrub_min_interval (no, I don't want to run cron jobs for scrubbing yet)? In the output below, I would like to be able to configure a minimum period of 1-2 weeks before the next deep-scrub happens. How can I do that?

The observed behavior is very unusual for RAID systems (if it's not a bug in the report script). With this behavior it's not surprising that people complain about "not deep-scrubbed in time" messages and too high deep-scrub IO load when such a large percentage of OSDs is needlessly deep-scrubbed again after only 1-6 days.
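To make "intervals not scrubbed" concrete: the report rounds the age of each stamp up to whole intervals (6h for scrub, 24h for deep-scrub). Below is a minimal stand-alone sketch of that arithmetic in plain bash. The helper name intervals_since and the fixed timestamps are made up for illustration and are not part of the report script, which does the same rounding with jq.

```shell
#!/bin/bash
# Sketch only: intervals_since is a hypothetical helper, not part of scrub-report.
# Usage: intervals_since NOW_EPOCH STAMP_EPOCH INTERVAL_SECONDS
# Prints the age of the stamp in whole intervals, rounded up.
intervals_since() {
    local now="$1" stamp="$2" interval="$3"
    local diff=$(( now - stamp ))
    # Integer ceiling of diff/interval (valid for diff >= 0).
    echo $(( (diff + interval - 1) / interval ))
}

now=1700000000                  # fixed "now" so the example is reproducible
stamp=$(( now - 30*60*60 ))     # assumed stamp: last (deep-)scrub 30 hours ago

scrub_bucket=$(intervals_since "$now" "$stamp" $(( 6*60*60 )))
deep_bucket=$(intervals_since "$now" "$stamp" $(( 24*60*60 )))

echo "scrub bucket: $scrub_bucket, deep-scrub bucket: $deep_bucket"
```

A PG last scrubbed 30 hours ago therefore lands in scrub bucket 5 and deep-scrub bucket 2, so the first deep-scrub line of the report counts PGs deep-scrubbed within the last 24 hours.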
Sample output:

# scrub-report
dumped pgs

Scrub report:
   4121 PGs not scrubbed since  1 intervals (6h)
   3831 PGs not scrubbed since  2 intervals (6h)
   4012 PGs not scrubbed since  3 intervals (6h)
   3986 PGs not scrubbed since  4 intervals (6h)
   2998 PGs not scrubbed since  5 intervals (6h)
   1488 PGs not scrubbed since  6 intervals (6h)
    909 PGs not scrubbed since  7 intervals (6h)
    771 PGs not scrubbed since  8 intervals (6h)
    582 PGs not scrubbed since  9 intervals (6h) 2 scrubbing
    431 PGs not scrubbed since 10 intervals (6h)
    333 PGs not scrubbed since 11 intervals (6h) 1 scrubbing
    265 PGs not scrubbed since 12 intervals (6h)
    195 PGs not scrubbed since 13 intervals (6h)
    116 PGs not scrubbed since 14 intervals (6h)
     78 PGs not scrubbed since 15 intervals (6h) 1 scrubbing
     72 PGs not scrubbed since 16 intervals (6h)
     37 PGs not scrubbed since 17 intervals (6h)
      5 PGs not scrubbed since 18 intervals (6h) 14.237* 19.5cd* 19.12cc* 19.1233* 14.40e*
     33 PGs not scrubbed since 20 intervals (6h)
     23 PGs not scrubbed since 21 intervals (6h)
     16 PGs not scrubbed since 22 intervals (6h)
     12 PGs not scrubbed since 23 intervals (6h)
      8 PGs not scrubbed since 24 intervals (6h)
      2 PGs not scrubbed since 25 intervals (6h) 19.eef* 19.bb3*
      4 PGs not scrubbed since 26 intervals (6h) 19.b4c* 19.10b8* 19.f13* 14.1ed*
      5 PGs not scrubbed since 27 intervals (6h) 19.43f* 19.231* 19.1dbe* 19.1788* 19.16c0*
      6 PGs not scrubbed since 28 intervals (6h)
      2 PGs not scrubbed since 30 intervals (6h) 19.10f6* 14.9d*
      3 PGs not scrubbed since 31 intervals (6h) 19.1322* 19.1318* 8.a*
      1 PGs not scrubbed since 32 intervals (6h) 19.133f*
      1 PGs not scrubbed since 33 intervals (6h) 19.1103*
      3 PGs not scrubbed since 36 intervals (6h) 19.19cc* 19.12f4* 19.248*
      1 PGs not scrubbed since 39 intervals (6h) 19.1984*
      1 PGs not scrubbed since 41 intervals (6h) 14.449*
      1 PGs not scrubbed since 44 intervals (6h) 19.179f*

Deep-scrub report:
   3723 PGs not deep-scrubbed since  1 intervals (24h)
   4621 PGs not deep-scrubbed since  2 intervals (24h) 8 scrubbing+deep
   3588 PGs not deep-scrubbed since  3 intervals (24h) 8 scrubbing+deep
   2929 PGs not deep-scrubbed since  4 intervals (24h) 3 scrubbing+deep
   1705 PGs not deep-scrubbed since  5 intervals (24h) 4 scrubbing+deep
   1904 PGs not deep-scrubbed since  6 intervals (24h) 5 scrubbing+deep
   1540 PGs not deep-scrubbed since  7 intervals (24h) 7 scrubbing+deep
   1304 PGs not deep-scrubbed since  8 intervals (24h) 7 scrubbing+deep
    923 PGs not deep-scrubbed since  9 intervals (24h) 5 scrubbing+deep
    557 PGs not deep-scrubbed since 10 intervals (24h) 7 scrubbing+deep
    501 PGs not deep-scrubbed since 11 intervals (24h) 2 scrubbing+deep
    363 PGs not deep-scrubbed since 12 intervals (24h) 2 scrubbing+deep
    377 PGs not deep-scrubbed since 13 intervals (24h) 1 scrubbing+deep
    383 PGs not deep-scrubbed since 14 intervals (24h) 2 scrubbing+deep
    252 PGs not deep-scrubbed since 15 intervals (24h) 2 scrubbing+deep
    116 PGs not deep-scrubbed since 16 intervals (24h) 5 scrubbing+deep
     47 PGs not deep-scrubbed since 17 intervals (24h) 2 scrubbing+deep
     10 PGs not deep-scrubbed since 18 intervals (24h)
      2 PGs not deep-scrubbed since 19 intervals (24h) 19.1c6c* 19.a01*
      1 PGs not deep-scrubbed since 20 intervals (24h) 14.1ed*
      2 PGs not deep-scrubbed since 21 intervals (24h) 19.1322* 19.10f6*
      1 PGs not deep-scrubbed since 23 intervals (24h) 19.19cc*
      1 PGs not deep-scrubbed since 24 intervals (24h) 19.179f*

PGs marked with a * are on busy OSDs and not eligible for scrubbing.

The script (pasted here because attaching doesn't work):

# cat bin/scrub-report
#!/bin/bash

# Compute last scrub interval count. Scrub interval 6h, deep-scrub interval 24h.
# Print how many PGs have not been (deep-)scrubbed since #intervals.

# With this redirect order the JSON goes to the file, while the
# "dumped pgs" message on stderr still shows up on the terminal.
ceph -f json pg dump pgs 2>&1 > /root/.cache/ceph/pgs_dump.json
echo ""

T0="$(date +%s)"

# One TSV row per PG: pgid, scrub age in 6h intervals, deep-scrub age in
# 24h intervals (both rounded up), state, acting OSDs.
scrub_info="$(jq --arg T0 "$T0" -rc '.pg_stats[] | [
    .pgid,
    (.last_scrub_stamp[:19]+"Z" | (($T0|tonumber) - fromdateiso8601)/(60*60*6)|ceil),
    (.last_deep_scrub_stamp[:19]+"Z" | (($T0|tonumber) - fromdateiso8601)/(60*60*24)|ceil),
    .state,
    (.acting | join(" "))
  ] | @tsv
' /root/.cache/ceph/pgs_dump.json)"

# less <<<"$scrub_info"

#   1      2           3                4       5..NF
#   pg_id  scrub-ints  deep-scrub-ints  status  acting[]
awk <<<"$scrub_info" '{
    for(i=5; i<=NF; ++i) pg_osds[$1]=pg_osds[$1] " " $i
    if($4 == "active+clean") {
        si_mx=si_mx<$2 ? $2 : si_mx
        dsi_mx=dsi_mx<$3 ? $3 : dsi_mx
        pg_sn[$2]++
        pg_sn_ids[$2]=pg_sn_ids[$2] " " $1
        pg_dsn[$3]++
        pg_dsn_ids[$3]=pg_dsn_ids[$3] " " $1
    } else if($4 ~ /scrubbing\+deep/) {
        deep_scrubbing[$3]++
        for(i=5; i<=NF; ++i) osd[$i]="busy"
    } else if($4 ~ /scrubbing/) {
        scrubbing[$2]++
        for(i=5; i<=NF; ++i) osd[$i]="busy"
    } else {
        unclean[$2]++
        unclean_d[$3]++
        si_mx=si_mx<$2 ? $2 : si_mx
        dsi_mx=dsi_mx<$3 ? $3 : dsi_mx
        pg_sn[$2]++
        pg_sn_ids[$2]=pg_sn_ids[$2] " " $1
        pg_dsn[$3]++
        pg_dsn_ids[$3]=pg_dsn_ids[$3] " " $1
        for(i=5; i<=NF; ++i) osd[$i]="busy"
    }
}
END {
    print "Scrub report:"
    for(si=1; si<=si_mx; ++si) {
        if(pg_sn[si]==0 && scrubbing[si]==0 && unclean[si]==0) continue;
        printf("%7d PGs not scrubbed since %2d intervals (6h)", pg_sn[si], si)
        if(scrubbing[si]) printf(" %d scrubbing", scrubbing[si])
        if(unclean[si])   printf(" %d unclean",   unclean[si])
        if(pg_sn[si]<=5) {
            split(pg_sn_ids[si], pgs)
            for(pg in pgs) {
                # Reset per PG, otherwise one busy PG marks all following ones.
                osds_busy=0
                split(pg_osds[pgs[pg]], osds)
                for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1
                if(osds_busy) printf(" %s*", pgs[pg])
                if(!osds_busy) printf(" %s", pgs[pg])
            }
        }
        printf("\n")
    }
    print ""
    print "Deep-scrub report:"
    for(dsi=1; dsi<=dsi_mx; ++dsi) {
        if(pg_dsn[dsi]==0 && deep_scrubbing[dsi]==0 && unclean_d[dsi]==0) continue;
        printf("%7d PGs not deep-scrubbed since %2d intervals (24h)", pg_dsn[dsi], dsi)
        if(deep_scrubbing[dsi]) printf(" %d scrubbing+deep", deep_scrubbing[dsi])
        if(unclean_d[dsi])      printf(" %d unclean", unclean_d[dsi])
        if(pg_dsn[dsi]<=5) {
            split(pg_dsn_ids[dsi], pgs)
            for(pg in pgs) {
                # Reset per PG, otherwise one busy PG marks all following ones.
                osds_busy=0
                split(pg_osds[pgs[pg]], osds)
                for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1
                if(osds_busy) printf(" %s*", pgs[pg])
                if(!osds_busy) printf(" %s", pgs[pg])
            }
        }
        printf("\n")
    }
    print ""
    print "PGs marked with a * are on busy OSDs and not eligible for scrubbing."
}
'

Don't forget the last "'" when copy-pasting.

Thanks for any pointers.

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14