lagging peering wq

Faidon/paravoid's cluster has a bunch of OSDs that are up, but the pg 
queries indicate they are tens of thousands of epochs behind:

      "history": { "epoch_created": 14,
          "last_epoch_started": 88174,
          "last_epoch_clean": 88174,
          "last_epoch_split": 0,
          "same_up_since": 88172,
          "same_interval_since": 88172,
          "same_primary_since": 88172,

(where the current map epoch is 102000 or thereabouts).
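To put a number on the gap, here is a minimal sketch that subtracts one of the
history fields quoted above from the current map epoch. Treating
last_epoch_started as a proxy for how far the PG has advanced is an assumption
made for illustration, and epoch_lag is a hypothetical helper, not part of any
Ceph tool.

```python
# Fields copied from the pg query fragment quoted above.
history = {
    "epoch_created": 14,
    "last_epoch_started": 88174,
    "last_epoch_clean": 88174,
    "last_epoch_split": 0,
    "same_up_since": 88172,
    "same_interval_since": 88172,
    "same_primary_since": 88172,
}


def epoch_lag(history, current_epoch, field="last_epoch_started"):
    """Return how many epochs the chosen history field trails the current map.

    Hypothetical helper: the choice of field is an assumption, not a rule
    taken from Ceph itself.
    """
    return max(0, current_epoch - history[field])


current_epoch = 102000  # approximate value given in the message
print(epoch_lag(history, current_epoch))  # prints 13826
```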

I think just restarting all OSDs at once will get him caught up (especially 
with a 'ceph osd set noup' block until they are done processing maps), but I 
wonder if we want an additional check: if any PG falls more than X epochs 
behind, the OSD marks itself down and catches up before coming back 
in...
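
The check being floated could look roughly like the sketch below. It is only
an illustration of the idea in Python, not OSD code: the function names, the
per-PG epoch map, the PG ids, and the threshold value are all hypothetical,
and the "X" from the proposal appears as max_lag.

```python
def lagging_pgs(pg_epochs, osd_epoch, max_lag):
    """Return the ids of PGs whose epoch trails the OSD's map epoch by more
    than max_lag. pg_epochs maps a PG id to the last epoch that PG processed
    (hypothetical input, for illustration only)."""
    return sorted(
        pgid for pgid, epoch in pg_epochs.items()
        if osd_epoch - epoch > max_lag
    )


def should_mark_self_down(pg_epochs, osd_epoch, max_lag):
    """True if at least one PG is more than max_lag epochs behind, in which
    case the OSD would mark itself down and catch up before rejoining."""
    return bool(lagging_pgs(pg_epochs, osd_epoch, max_lag))


# Example with made-up PG ids: one PG far behind, one nearly current.
pg_epochs = {"3.1a": 88174, "3.2f": 101990}
print(lagging_pgs(pg_epochs, osd_epoch=102000, max_lag=1000))
print(should_mark_self_down(pg_epochs, osd_epoch=102000, max_lag=1000))
```

With these numbers only the first PG exceeds the threshold, so the OSD in this
sketch would take itself down until that PG has processed the missing maps.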

What do you think?

sage