Hello,

I recently added 2 OSD nodes to my Nautilus cluster, increasing the OSD count from 32 to 48 - all 12TB HDDs with NVMe for DB. I generally keep an ssh session open where I can run 'watch ceph -s', and my observations are mostly based on what I saw from watching this. Even with 10Gb networking, rebalancing 529 PGs took 10 days, during which there were always a few PGs undersized+degraded, frequent flashes of slow ops, occasional OSD restarts, and a steadily growing scrub and deep-scrub backlog. When the backfills completed I had 24 missed deep-scrubs and 10 missed scrubs.

I suspect that this is because of some settings I had fiddled with, so this post may be an advertisement for what not to do to your cluster. However, I'd like to know if my understanding is accurate. In short, I think I had my config set up so that there was contention from too many processes trying to do things to some OSDs all at once:

- osd_scrub_during_recovery: I think I had this set to true for the first 9 days, but set it to false when I started to realize that it might be causing contention.
- osd_max_scrubs: I had this set high (global:30, osd:10). At some earlier time, when I had a scrub backlog, I thought these were counts for simultaneous scrubs across all OSDs rather than per OSD. Now I see why the default is 1 (walking these back to the defaults is sketched in the P.S. below).
  - Assumption: on an HDD, multiple competing scrubs cause excessive seeking and thus compound the impact on scrub progress.
- osd_max_backfills: I had bumped this up as well (global:30, osd:10), thinking it would speed up the rebalancing of my PGs onto the new OSDs.
  - Same thinking as for osd_max_scrubs: compounding contention, further compounded by the scrub activity that would have been inhibited if osd_scrub_during_recovery had been false.

I believe that all of this also resulted in my EC PGs (8+2) becoming degraded. My assumption here is that collisions between deep-scrubs and backfills sometimes locked the backfill process out of a piece of an EC PG, causing backfill to rebuild instead of copy.

The good news is that I haven't lost any data and, other than the scrub backlog, things seem to be working smoothly. It seems like, with 1 or 2 scrubs (deep or regular) running, each scrub takes about 2 hours, and as the scrubs progress more scrub deadlines are missed, so it's not a steady march to zero (see the P.P.S. below for listing and manually kicking the overdue PGs).

Please feel free to comment. I'd be glad to know if I'm on the right track, as we expect the cluster to double in size over the next 12 to 18 months.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
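
P.S. In case it's useful to anyone reading the archive: the overrides above can be walked back to the defaults through the Nautilus 'ceph config' interface. This is only a sketch based on the settings I described (osd.0 in the last command is just an example daemon):

    # Remove the global and per-osd overrides so the built-in defaults apply
    # again (defaults: osd_max_scrubs=1, osd_max_backfills=1,
    # osd_scrub_during_recovery=false)
    ceph config rm global osd_max_scrubs
    ceph config rm osd osd_max_scrubs
    ceph config rm global osd_max_backfills
    ceph config rm osd osd_max_backfills
    ceph config rm global osd_scrub_during_recovery

    # Confirm what is left in the config database
    ceph config dump | grep -E 'scrub|backfill'

    # Spot-check what a running daemon actually sees
    # (run on the host where osd.0 lives)
    ceph daemon osd.0 config get osd_max_scrubs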
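
P.P.S. For chewing through the backlog itself, 'ceph health detail' lists the PGs that are behind, and individual PGs can be queued by hand. Again just a sketch; the PG id is a made-up example:

    # List the PGs flagged as overdue for scrub / deep scrub
    ceph health detail | grep -E 'not (deep-)?scrubbed since'

    # Manually queue a deep scrub (or a regular scrub) on one PG, e.g. 7.1a
    ceph pg deep-scrub 7.1a
    ceph pg scrub 7.1a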