Hello,

I would like to ask about osd_scrub_max_preemptions in 14.2.20 for large OSDs (mine are 12TB) and/or large k+m EC pools (mine are 8+2). I searched the archives for this list, but I did not find any reference to it.

Symptoms: Over the past 2 or 3 weeks I have been seeing a behavior in my cluster where, for no apparent reason, there are suddenly slow ops, followed by a brief OSD down, massive but brief degraded/activating/peering activity, and then a return to normal. I had thought this might be related to backfill activity from a recently failed OSD (as in down and out, and the process wouldn't start), but all of that is now over and the cluster is mostly back to HEALTH_OK. Thinking this might be something introduced between 14.2.9 and 14.2.16, I upgraded to 14.2.20 this morning. However, I have just seen the same kind of event happen twice more. At the time, the only non-client activity was a single deep-scrub.

Question: The description of osd_scrub_max_preemptions indicates that a deep-scrub process will allow itself to be preempted a fixed number of times by client I/O and will then block client I/O until it finishes. Although I don't fully understand the deep-scrub process, it seems that either the size of the HDD or the k+m count of the EC pool could increase the time needed to complete a deep scrub, and thus increase the likelihood that more than the default 5 preemptions will occur. Please tell me if my understanding is correct. If so, is there any guideline for increasing osd_scrub_max_preemptions just enough to balance scrub progress against client responsiveness? Or are there other scrub attributes that should be tuned instead?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
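
For concreteness, a minimal sketch of how such a change could be inspected and applied at runtime on a Nautilus cluster, assuming the standard ceph CLI; osd.0 and the value 10 are only illustrative examples, not recommendations:

    # Check the value currently in effect on one OSD (osd.0 is just an example ID)
    ceph config show osd.0 osd_scrub_max_preemptions

    # Raise the limit cluster-wide via the monitor config database
    ceph config set osd osd_scrub_max_preemptions 10

    # Or inject it into the running OSDs without restarting them
    ceph tell osd.* injectargs '--osd_scrub_max_preemptions=10'

    # Other scrub-related options that are sometimes tuned instead
    # (osd_scrub_sleep, osd_max_scrubs, osd_scrub_chunk_max,
    # osd_deep_scrub_stride); their running values can be checked the same way:
    ceph config show osd.0 osd_scrub_sleep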