Anthony is correct; this is what I was getting at as well when seeing your ceph -s output. More details are in the Ceph docs here if you want to understand why you need to balance your nodes: https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/

But you need to get your core RADOS cluster healthy first; then you can pivot to bringing back your gateway services.

Sent from Bloomberg Professional for iPhone

----- Original Message -----
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
To: nguyenvandiep@xxxxxxxxxxxxxx
CC: ceph-users@xxxxxxx
At: 02/24/24 15:46:24 UTC

Your recovery is stuck because there are no OSDs that have enough space to accept data. Your second OSD host appears to have only 9 OSDs currently, so you should be able to add a 10TB OSD there without removing anything. That will enable data to move to all three of your 10TB OSDs.

> On Feb 24, 2024, at 10:41 AM, nguyenvandiep@xxxxxxxxxxxxxx wrote:
>
> HolySh***
>
> First, we changed mon_max_pg_per_osd to 1000.
>
> About adding a disk to cephosd02: what is TO, sir? I'll discuss it with my boss. To be honest, I'm worried that the recovery process will run into problems...

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
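
For anyone following along, the balance and health checks referenced at the top of the thread can be run from any admin node. A minimal sketch using standard Ceph CLI commands; OSD names and utilization numbers will of course differ per cluster:

    # Per-OSD utilization laid out by CRUSH hierarchy; nearfull or
    # full OSDs are what block recovery and backfill from proceeding.
    ceph osd df tree

    # Explains each health warning in detail, including PGs stuck in
    # states such as backfill_toofull.
    ceph health detail

    # Lists PGs that have not reached active+clean.
    ceph pg dump_stuck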
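If the cluster is managed by cephadm (an assumption; the thread does not say how it was deployed), adding the extra 10TB drive on cephosd02 could look like the sketch below. /dev/sdX is a hypothetical device path, not one taken from this thread:

    # Confirm the new drive is visible to the orchestrator and
    # reported as available on that host.
    ceph orch device ls cephosd02

    # Create an OSD on the new device (replace /dev/sdX with the
    # real device path on cephosd02).
    ceph orch daemon add osd cephosd02:/dev/sdX

Once the new OSD is up and in, CRUSH will begin moving data onto it and the stuck backfill should resume on its own.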
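The mon_max_pg_per_osd change mentioned in the quoted reply can be applied cluster-wide through the monitor config database. A sketch, using the 1000 value from the thread (a deliberately loose limit intended only to unblock recovery, not a long-term setting):

    # Raise the per-OSD PG limit so PGs are allowed to activate
    # while the cluster is unbalanced.
    ceph config set global mon_max_pg_per_osd 1000

    # Verify the value the monitors are now using.
    ceph config get mon mon_max_pg_per_osd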