Hi everyone,

I had an issue last night while bringing back online some OSDs that I was rebuilding. When the OSDs were created and came online, 15 PGs got stuck in activating. The first OSD (osd.112) seemed to come online OK, but the second one (osd.113) triggered the issue. All of the stuck PGs included osd.112 in their mapping, and I resolved it by using pg-upmap-items to map each PG back from osd.112 to the OSD it was currently on, but it was painful having around 10 minutes of stuck I/O on an RBD pool with VMs running.

Some details about the cluster: Pacific 16.2.15, upgraded fairly recently from Nautilus, and from Luminous further back. All OSDs were rebuilt on BlueStore under Nautilus, as were the mons. The disks in question are Intel DC P4510 8TB NVMe. I'm rebuilding them because I previously ran 4x 2TB OSDs per disk and now want to consolidate down to one OSD per disk.

There are around 300 OSDs in the pool, which has 16384 PGs, so the 2TB OSDs carried about 157 PGs each. That means the 8TB OSDs end up with about 615 PGs each, and I'm wondering whether that is the cause of the problem. There are no warnings about too many PGs per OSD in the logs or in ceph status. I have the default value of 250 for mon_max_pg_per_osd and the default of 3.0 for osd_max_pg_per_osd_hard_ratio.

My plan is to reduce the number of PGs in the pool, but I want to understand and prove what happened here. Is it likely I've hit PG overdose protection? If I have, how would I tell, as I can't see anything about it in the cluster logs?

Thanks,
Rich
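P.S. In case it's useful context, this is roughly how I'm sanity-checking the per-OSD PG counts against the overdose-protection limit. It's only a sketch on my side: it assumes "ceph osd df -f json" returns a "nodes" list with a "pgs" count per OSD (field names may differ between versions), and it hard-codes the defaults of 250 and 3.0 I mentioned above.

#!/usr/bin/env python3
# Rough sketch: compare per-OSD PG counts from "ceph osd df -f json"
# against mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio, which as I
# understand it is the point where an OSD refuses additional PG mappings.
import json
import subprocess

MON_MAX_PG_PER_OSD = 250   # default, per my config
HARD_RATIO = 3.0           # osd_max_pg_per_osd_hard_ratio default
hard_limit = int(MON_MAX_PG_PER_OSD * HARD_RATIO)   # 750 on defaults

# Assumption: output looks like {"nodes": [{"name": "osd.N", "pgs": ...}, ...], ...}
raw = subprocess.run(["ceph", "osd", "df", "-f", "json"],
                     check=True, capture_output=True, text=True).stdout
nodes = json.loads(raw)["nodes"]

for n in sorted(nodes, key=lambda n: n["pgs"], reverse=True):
    flag = ""
    if n["pgs"] >= hard_limit:
        flag = "  <-- at/over the hard limit"
    elif n["pgs"] >= MON_MAX_PG_PER_OSD:
        flag = "  (over mon_max_pg_per_osd, but under the hard ratio)"
    print(f'{n["name"]:>10} pgs={n["pgs"]:4d}{flag}')

print(f"hard limit = {MON_MAX_PG_PER_OSD} * {HARD_RATIO} = {hard_limit}")

If I'm reading the options right, the hard limit works out to 250 * 3.0 = 750, so the steady-state 615 PGs on the 8TB OSDs is under it; what I can't rule out from this alone is a transient spike above that while osd.112 and osd.113 were peering.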