On Tue, Sep 23, 2014 at 6:20 AM, Florian Haas <florian@xxxxxxxxxxx> wrote:
> On Mon, Sep 22, 2014 at 7:06 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>> On Sun, Sep 21, 2014 at 9:52 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>> On Sun, 21 Sep 2014, Florian Haas wrote:
>>>> So yes, I think your patch absolutely still has merit, as would any
>>>> means of reducing the number of snapshots an OSD will trim in one go.
>>>> As it is, the situation looks really really bad, specifically
>>>> considering that RBD and RADOS are meant to be super rock solid, as
>>>> opposed to say CephFS which is in an experimental state. And contrary
>>>> to CephFS snapshots, I can't recall any documentation saying that RBD
>>>> snapshots will break your system.
>>>
>>> Yeah, it sounds like a separate issue, and no, the limit is not
>>> documented because it's definitely not the intended behavior. :)
>>>
>>> ...and I see you already have a log attached to #9503. Will take a look.
>>
>> I've already updated that issue in Redmine, but for the list archives
>> I should also add this here: Dan's patch for #9503, together with
>> Sage's for #9487, makes the problem go away in an instant. I've
>> already pointed out that I owe Dan dinner, and Sage, well I already
>> owe Sage pretty much lifelong full board. :)
>
> Looks like I was bit too eager: while the cluster is behaving nicely
> with these patches while nothing happens to any OSDs, it does flag PGs
> as incomplete when an OSD goes down. Once the mon osd down out
> interval expires things seem to recover/backfill normally, but it's
> still disturbing to see this in the interim.
>
> I've updated http://tracker.ceph.com/issues/9503 with a pg query from
> one of the affected PGs, within the mon osd down out interval, while
> it was marked incomplete.
>
> Dan or Sage, any ideas as to what might be causing this?

That *looks* like it's just because the pool has both size and min_size set to 2?
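
To spell out the reasoning behind that guess, here is a minimal Python sketch. It is a simplified model, not Ceph's actual peering logic, and the function names are hypothetical. The assumption is that a PG only serves I/O while at least min_size replicas are up, so with size=2 and min_size=2 a single down OSD is already enough to block it until the data is backfilled elsewhere.

```python
def replicas_available(size: int, osds_down: int) -> int:
    """Replicas still up for a PG, assuming each replica lives on its own OSD."""
    return max(size - osds_down, 0)


def pg_can_serve_io(size: int, min_size: int, osds_down: int) -> bool:
    """Simplified rule: the PG serves I/O only with at least min_size replicas up."""
    return replicas_available(size, osds_down) >= min_size


if __name__ == "__main__":
    # Compare the reported setting (min_size=2) with a looser one (min_size=1),
    # both for a size=2 pool that has just lost one OSD.
    for min_size in (2, 1):
        ok = pg_can_serve_io(size=2, min_size=min_size, osds_down=1)
        print(f"size=2 min_size={min_size}, one OSD down -> serves I/O: {ok}")
```

Under this model the pool would stay available through a single OSD failure only if min_size were lower than size, which may or may not be what is happening in the PG query attached to the ticket.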
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com