Re: Balancer in HEALTH_ERR


 



Great – what does ceph health detail show? I’m guessing you most likely need to remove the OSDs on CEPH006 (as well as the CEPH006 host bucket itself) to get Ceph to move the data where it needs to be. The OSD removal process is here: https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
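For reference, a rough sketch of the manual removal steps from that page (with {osd-id} standing in for each OSD on CEPH006):

    ceph osd out {osd-id}
    # wait for the data to finish migrating off the OSD, then on CEPH006:
    systemctl stop ceph-osd@{osd-id}
    ceph osd crush remove osd.{osd-id}
    ceph auth del osd.{osd-id}
    ceph osd rm {osd-id}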

 

Eric

 

From: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
Sent: Thursday, August 1, 2019 4:04 PM
To: Smith, Eric <Eric.Smith@xxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: RE: [ceph-users] Balancer in HEALTH_ERR

 

Hi Eric,

 

CEPH006 is the node we’re evacuating; for that task we added CEPH005.

 

Thanks

 

From: Smith, Eric <Eric.Smith@xxxxxxxx>
Sent: Thursday, August 1, 2019 8:12 PM
To: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Balancer in HEALTH_ERR

 

From your pastebin data – it appears you need to change the crush weight of the OSDs on CEPH006. They all have a crush weight of 0, while the other OSDs seem to have a crush weight of 10.91309. You might look into the ceph osd crush reweight-subtree command.
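A minimal sketch of that approach, assuming CEPH006 is the host bucket name in the CRUSH map and 10.91309 is the intended per-OSD weight:

    ceph osd crush reweight-subtree CEPH006 10.91309

That resets every OSD under the CEPH006 bucket in one step rather than reweighting each osd.X individually.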

 

Eric

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
Date: Thursday, August 1, 2019 at 1:52 PM
To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: [ceph-users] Balancer in HEALTH_ERR

 

Hi ,

 

Two weeks ago, we started a data migration from an old Ceph node to a new one.

For that task we added a 120 TB host to the cluster and evacuated the old one with ceph osd crush reweight osd.X 0.0, which moves roughly 15 TB per day.
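As a rough sketch, the evacuation was along these lines (the OSD IDs 10-13 are hypothetical placeholders for the OSDs on the old node):

    for id in 10 11 12 13; do
        ceph osd crush reweight osd.$id 0.0
    done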

 

After a week and a few days we found that the balancer module does not work well in this situation: it does not distribute data between OSDs if the cluster is not in a HEALTH_OK state.

 

The current situation: some OSDs are at 96% and others at 75%, causing some pools to be very nearfull (99%).

 

I read several posts saying the balancer only works when the cluster is HEALTHY, and that’s the problem: Ceph does not distribute data evenly between OSDs on its own, which causes huge problems in an “evacuate + add” scenario.
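For context, this is roughly how we check and drive the balancer module (upmap mode here is only an example, not necessarily what the cluster runs):

    ceph balancer status
    ceph balancer mode upmap
    ceph balancer on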

 

Info: https://pastebin.com/HuEt5Ukn

 

Right now, to work around it, we are manually changing the weight of the most-used OSDs.
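As a sketch of that workaround (osd.42 and 0.85 are hypothetical example values):

    ceph osd reweight 42 0.85

Note that ceph osd reweight changes the temporary override weight (between 0.0 and 1.0), while ceph osd crush reweight changes the permanent crush weight.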

 

Has anyone else run into this problem?

 

Regards

 

Manuel

 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
