Hi Sage,

Losing a message would have been plausible given the network issue we
had today. I tried:

# ceph osd pg-temp 75.45 6689
set 75.45 pg_temp mapping to [6689]

then waited a bit. It's still incomplete -- the only difference is that
now I see two more past_intervals in the pg. Full query here:
http://pastebin.com/TU7vVLpj

I didn't have debug_osd above zero when I did that. Should I try again
with debug_osd 20?
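If so, I assume bumping it on the fly would go something like this on
the primary (just a sketch -- osd.6689 is the current primary from the
acting set in the query):

# ceph tell osd.6689 injectargs '--debug_osd 20'

and then I'd re-run the pg-temp command and capture the osd log.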
Thanks :)
Dan

On Fri, Mar 13, 2015 at 12:59 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> This looks a bit like the osds may have lost a message, actually. You can
> kick an individual pg to repeer with something like
>
> ceph osd pg-temp 75.45 6689
>
> See if that makes it go?
>
> sage
>
>
> On March 13, 2015 7:24:48 AM EDT, Dan van der Ster <dan@xxxxxxxxxxxxxx>
> wrote:
>>
>> On Mon, Mar 9, 2015 at 4:47 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>
>>> On Mon, Mar 9, 2015 at 8:42 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi Sage,
>>>>
>>>> On Tue, Feb 10, 2015 at 2:51 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Mon, 9 Feb 2015, David McBride wrote:
>>>>>>
>>>>>> On 09/02/15 15:31, Gregory Farnum wrote:
>>>>>>
>>>>>>> So, memory usage of an OSD is usually linear in the number of PGs
>>>>>>> it hosts. However, that memory can also grow based on at least one
>>>>>>> other thing: the number of OSD maps required to go through peering.
>>>>>>> It *looks* to me like this is what you're running into, not growth
>>>>>>> in the number of state machines. In particular, those
>>>>>>> past_intervals you mentioned. ;)
>>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> Right, that sounds entirely plausible, and is very helpful.
>>>>>>
>>>>>> In practice, that means I'll need to be careful to avoid this
>>>>>> situation occurring in production -- but given that's unlikely to
>>>>>> occur except in the case of non-trivial neglect, I don't think I
>>>>>> need be particularly concerned.
>>>>>>
>>>>>> (Happily, I'm in the situation that my existing cluster is purely
>>>>>> for testing purposes; the data is expendable.)
>>>>>>
>>>>>> That said, for my own peace of mind, it would be valuable to have a
>>>>>> procedure that can be used to recover from this state, even if it's
>>>>>> unlikely to occur in practice.
>>>>>
>>>>> The best luck I've had recovering from such situations is something
>>>>> like:
>>>>>
>>>>> - stop all osds
>>>>> - osd set nodown
>>>>> - osd set nobackfill
>>>>> - osd set noup
>>>>> - set the map cache size smaller to reduce the memory footprint:
>>>>>
>>>>>   osd map cache size = 50
>>>>>   osd map max advance = 25
>>>>>   osd map share max epochs = 25
>>>>>   osd pg epoch persisted max stale = 25
>>>
>>> It can cause extreme slowness if you get into a failure situation and
>>> your OSDs need to calculate past intervals across more maps than will
>>> fit in the cache. :(
>>
>> ... extreme slowness, or is it also possible to get into a situation
>> where the PGs are stuck incomplete forever?
>>
>> The reason I ask is that we actually had a network issue this morning
>> that left OSDs flapping and caused a lot of osdmap epoch churn. Our
>> network has now stabilized, but 10 PGs are incomplete even though all
>> the OSDs are up. One PG looks like this, for example:
>>
>> pg 75.45 is stuck inactive for 87351.077529, current state incomplete,
>> last acting [6689,1919,2329]
>> pg 75.45 is stuck unclean for 87351.096198, current state incomplete,
>> last acting [6689,1919,2329]
>> pg 75.45 is incomplete, acting [6689,1919,2329]
>>
>> 1919  3.62000  osd.1919  up  1.00000  1.00000
>> 2329  3.62000  osd.2329  up  1.00000  1.00000
>> 6689  3.62000  osd.6689  up  1.00000  1.00000
>>
>> The pg query output is here: http://pastebin.com/WyTAU69W
>>
>> Is that a result of these short map caches, or could it be something
>> else? (We're running 0.93-76-gc35f422.)
>> WWGD (what would Greg do?) to activate these PGs?
>>
>> Thanks! Dan
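For reference, Sage's bullet procedure above maps to commands along
these lines -- a sketch, assuming a standard admin node; the config
values are the ones quoted in the thread, and the stop command depends
on your init system:

# stop all osds on each host (exact command varies by init system)
service ceph stop osd

# set the cluster flags from the procedure
ceph osd set nodown
ceph osd set nobackfill
ceph osd set noup

# then shrink the map cache in ceph.conf on each OSD host before
# restarting the osds:
[osd]
    osd map cache size = 50
    osd map max advance = 25
    osd map share max epochs = 25
    osd pg epoch persisted max stale = 25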