PG upmap corner cases that silently fail

Hi cephers,

I've been looking into balancing our clusters better with upmaps lately, and ran into upmap cases that behave in a less-than-ideal way.  If the upmap items contain any cycle, such as

ceph osd pg-upmap-items <pgid> a b b a
or
ceph osd pg-upmap-items <pgid> a b b c c a

the upmap validation passes and the upmap gets added to the osdmap, but it is then silently ignored.  This matters for EC pools; it is irrelevant for replicated pools, where the order of OSDs is not significant.
The relevant code, OSDMap::_apply_upmap, even has a comment about this:

  if (q != pg_upmap_items.end()) {
    // NOTE: this approach does not allow a bidirectional swap,
    // e.g., [[1,2],[2,1]] applied to [0,1,2] -> [0,2,1].
    for (auto& r : q->second) {
      // make sure the replacement value doesn't already appear
  ...
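
To make the effect concrete, here is a minimal standalone model of that per-pair loop. It is a sketch, not the real Ceph code: the function name apply_upmap_items_sequential is made up for illustration, and the weight and OSD-validity checks are left out. Each pair is applied in order and skipped whenever its target already appears in the set, which is exactly what happens to every pair of a cycle.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Simplified model (assumption: no weight/validity checks) of applying
// pg-upmap-items one pair at a time against the evolving set.
std::vector<int> apply_upmap_items_sequential(
    std::vector<int> raw,
    const std::vector<std::pair<int, int>>& items) {
  for (const auto& r : items) {
    bool exists = false;        // does the target already appear in the set?
    std::ptrdiff_t pos = -1;    // first slot holding the source OSD
    for (std::size_t i = 0; i < raw.size(); ++i) {
      if (raw[i] == r.second) {
        exists = true;
        break;
      }
      if (raw[i] == r.first && pos < 0) {
        pos = static_cast<std::ptrdiff_t>(i);
      }
    }
    // The pair only takes effect if the target is not already present.
    if (!exists && pos >= 0) {
      raw[pos] = r.second;
    }
  }
  return raw;
}
```

With the set [0,1,2], both the two-way swap [[1,2],[2,1]] and the three-way cycle [[0,1],[1,2],[2,0]] leave the set unchanged in this model, because every target is already a member when its pair is examined.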

I'm trying to understand the reasons for this limitation.  Is it just a matter of coding convenience (OSDMap::_apply_upmap could handle this correctly with a slightly more careful approach), or is there some inherent limitation elsewhere that prevents these cases from working?  I did notice that just updating crush weights (without using upmaps) produces similar changes to the UP set (it sometimes swaps OSDs in EC pools), so the OSDs seem perfectly capable of backfilling after osdmap changes that shuffle the order of OSDs in the UP set.  Some insight or history here would be appreciated.
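
As an illustration of what a "more careful approach" might look like, here is a hypothetical sketch that treats the pairs as one simultaneous substitution against the original set, so that cycles become permutations.  This is not Ceph code and not a proposed patch: the function name is invented, OSD ids in the set are assumed to be distinct (no CRUSH_ITEM_NONE holes), and the weight checks are omitted.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alternative: resolve every slot against the ORIGINAL set,
// instead of applying pairs one by one to the evolving set.
std::vector<int> apply_upmap_items_simultaneous(
    const std::vector<int>& raw,
    const std::vector<std::pair<int, int>>& items) {
  std::unordered_map<int, int> remap;
  for (const auto& r : items) {
    remap.emplace(r.first, r.second);  // first pair wins for a repeated source
  }

  std::vector<int> out = raw;
  for (auto& osd : out) {
    auto it = remap.find(osd);
    if (it != remap.end()) {
      osd = it->second;
    }
  }

  // Refuse any result that would put the same OSD into two slots;
  // in that case fall back to the unmodified set.
  std::unordered_set<int> seen;
  for (int osd : out) {
    if (!seen.insert(osd).second) {
      return raw;
    }
  }
  return out;
}
```

Under these assumptions, [[1,2],[2,1]] applied to [0,1,2] gives [0,2,1], and [[0,1],[1,2],[2,0]] gives [1,2,0], while a lone [[1,2]] is still rejected because it would duplicate OSD 2.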

Either way, having validation pass on an upmap that is then silently ignored is not ideal.  I do realize that all clients would have to agree on this code, since each client executes it independently to find the OSDs to access, which makes rolling out a change challenging.
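
For the validation side, one possible pre-check is to look for a cycle among the (from, to) pairs before accepting them.  The sketch below is only an illustration under that assumption: upmap_items_have_cycle is a made-up helper, not part of the existing validation code, and it only detects cycles, not every situation in which a pair might end up ignored.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: returns true if following the from->to pairs
// ever leads back to the source OSD of some pair.
bool upmap_items_have_cycle(const std::vector<std::pair<int, int>>& items) {
  std::unordered_map<int, int> next;
  for (const auto& r : items) {
    next.emplace(r.first, r.second);  // first pair wins for a repeated source
  }
  for (const auto& r : items) {
    int cur = r.second;
    // A cycle through r.first can be at most items.size() hops long,
    // so the walk is bounded and always terminates.
    for (std::size_t steps = 0; steps < items.size(); ++steps) {
      if (cur == r.first) {
        return true;
      }
      auto it = next.find(cur);
      if (it == next.end()) {
        break;
      }
      cur = it->second;
    }
  }
  return false;
}
```

A check like this would flag both "a b b a" and "a b b c c a", while a plain chain such as "a b b c" would pass.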

Andras
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



