Re: Ceph Balancer code

Hi,


On 8/18/19 12:06 AM, EDH - Manuel Rios Fernandez wrote:

 

Hi,

 

What's the reason for not allowing the balancer to work on PGs if objects are inactive/misplaced, at least in Nautilus 14.2.2?

https://github.com/ceph/ceph/blob/master/src/pybind/mgr/balancer/module.py#L874


*snipsnap*


 

We can understand that the balancer can't work with unknown PG states and inactive states. But… missing and misplaced…


The degraded state indicates that some data is missing within the PG, or that one replica is not up to date. See below for an example.

 

Hope some developer can clarify that. These lines cause a lot of problems, at least in Nautilus 14.2.2.

 

Case example:

  • Pool size 1, upgraded to size 2. The cluster goes into a warning state with misplaced and degraded objects. Some objects don't recover from the degraded state due to "backfill_toofull": the OSDs became full instead of being evenly distributed and balanced, because the balancer code excludes this case.

Updating to size 2 requires all PGs to have two replicas. After changing the size setting, the PGs will be undersized+degraded, since only one instance exists (-> undersized). The second replica is created during backfilling, and once the complete content of the PG has been copied, the state changes to active+clean.
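
As a rough illustration of the transition described above, here is a toy model in Python. It is not Ceph's peering logic; the function name and its parameters are made up for this example and only mirror the size 1 -> size 2 case.

```python
def pg_flags(pool_size: int, acting: int, complete: int) -> str:
    """Toy model: derive a PG state string from replica counts.

    pool_size: configured number of replicas for the pool
    acting:    number of instances currently serving the PG (hypothetical input)
    complete:  number of instances holding the full PG content (hypothetical input)
    """
    flags = ["active"]
    if acting < pool_size:
        # Fewer instances than the pool size asks for.
        flags.append("undersized")
    if complete < pool_size:
        # Not every required replica has all the data yet.
        flags.append("degraded")
    if len(flags) == 1:
        # Nothing is missing: the replication requirement is met.
        flags.append("clean")
    return "+".join(flags)


# Pool with size 1, one complete instance.
print(pg_flags(pool_size=1, acting=1, complete=1))  # active+clean
# Size raised to 2, second replica not yet created.
print(pg_flags(pool_size=2, acting=1, complete=1))  # active+undersized+degraded
# Backfill finished, both replicas complete.
print(pg_flags(pool_size=2, acting=2, complete=2))  # active+clean
```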

The degraded state can also occur during the restart of an OSD. If writes are not blocked during the restart (e.g. there are enough active replicas), the other instances of a PG will have updated data. This data needs to be replicated to the restarted OSD once it is available again. A similar situation arises during balancing, or when moving PGs in general: if a PG has not been transferred completely yet, new writes may be sent either to the old set of OSDs (and need to be backfilled afterwards) or to the new set (and are considered degraded, since from a point-in-time view of the cluster they are not present on the acting set of OSDs). I'm not 100% sure which way is implemented in Ceph; my gut feeling points to the latter.


Degraded thus refers to a state where a PG does not fulfill its replication requirements, so it should be handled as an error or warning state. And you do not want the balancer to interfere with this state.
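
To make the kind of guard being discussed more concrete, here is a simplified sketch in Python. It is not copied from module.py; the function name, the parameter names and the 0.05 misplaced threshold are assumptions for illustration only.

```python
from typing import Tuple


def balancer_may_optimize(unknown: float, degraded: float, inactive: float,
                          misplaced: float,
                          max_misplaced: float = 0.05) -> Tuple[bool, str]:
    """Decide whether a balancing pass should be attempted.

    All inputs are ratios between 0.0 and 1.0. The default threshold
    is an assumed value, not taken from the Ceph source.
    """
    if unknown > 0.0:
        return False, "some PGs are unknown; try again later"
    if degraded > 0.0:
        # Replication requirements are not met; recovery comes first.
        return False, "some objects are degraded; try again later"
    if inactive > 0.0:
        return False, "some PGs are inactive; try again later"
    if misplaced >= max_misplaced:
        # Data is already moving; do not queue up more movement.
        return False, "too many objects are misplaced; try again later"
    return True, "ok"


# Healthy cluster: balancing is allowed.
print(balancer_may_optimize(0.0, 0.0, 0.0, 0.0))
# Cluster after a size change: degraded objects block the balancer.
print(balancer_may_optimize(0.0, 0.3, 0.0, 0.1))
```

With a guard of this shape, any non-zero degraded ratio stops the balancer, which matches the behaviour reported in the case example above.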


Regards,

Burkhard

