Re: lost osd while migrating EC pool to device-class crush rules

On 09/14/2018 02:38 PM, Gregory Farnum wrote:
> On Thu, Sep 13, 2018 at 3:05 PM, Graham Allan <gta@xxxxxxx> wrote:

>> However I do see transfer errors fetching some files out of radosgw - the
>> transfer just hangs then aborts. I'd guess this is probably due to one pg
>> stuck down, due to a lost (failed HDD) osd. I think there is no alternative
>> but to declare the osd lost, but I wish I understood better the implications
>> of the "recovery_state" and "past_intervals" output by ceph pg query:
>> https://pastebin.com/8WrYLwVt

> What are you curious about here? The past_intervals output lists the
> OSDs which were involved in the PG since it was last clean, then each
> acting set and the intervals it was active for.

That's pretty much what I was looking for: confirmation that the pg can roll back to an earlier interval if there were no writes, once the current osd has been declared lost.
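
For the record, the sequence I have in mind is roughly the following (the pg
id below is a placeholder; osd.98 is the dead drive):

    # inspect the stuck pg's recovery_state and past_intervals first
    ceph pg <pgid> query

    # then declare the failed osd lost so peering can roll back
    ceph osd lost 98 --yes-i-really-mean-it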

>> I find it disturbing/odd that the acting set of osds lists only 3/6
>> available; this implies that without getting one of these back it would be
>> impossible to recover the data (from 4+2 EC). However the dead osd 98 only
>> appears in the most recent (?) interval - presumably during the flapping
>> period, during which time client writes were unlikely (radosgw disabled).
>>
>> So if 98 were marked lost, would it roll back to the prior interval? I am
>> not certain how to interpret this information!

> Yes, that’s what should happen if it’s all as you outline here.

> It *is* quite curious that the PG apparently went active with only 4
> members in a 4+2 system — it's supposed to require at least k+1 (here,
> 5) by default. Did you override the min_size or something?
> -Greg

Looking back through history, it seems that I *did* override the min_size for this pool; however, I didn't reduce it - it used to have min_size 2! That made no sense to me; I think it must be an artifact of a very early (hammer?) ec pool creation, but it pre-dates me.
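
For anyone else checking their own setup, this is roughly what I looked at
(pool and erasure-code-profile names elided):

    # current min_size and other per-pool parameters
    ceph osd pool get <poolname> min_size
    ceph osd pool ls detail

    # k/m for the profile backing the pool
    ceph osd erasure-code-profile get <profile>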

I found the documentation on what min_size should be a bit confusing, which is how I arrived at 4. I fully agree that k+1 = 5 makes far more sense.

I don't think I was the only one confused by this though, e.g.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html

I suppose the safest thing to do is to update min_size to 5 right away, to force any size-4 pgs down until they can perform recovery. I can set force-recovery on these as well...
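
Presumably something along these lines (pool name and pg ids elided):

    # raise min_size so pgs left with only k shards stop serving i/o
    ceph osd pool set <poolname> min_size 5

    # prioritize recovery of the degraded pgs
    ceph pg force-recovery <pgid> [<pgid>...]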

Is there any setting which can permit these pgs to fulfil reads while refusing writes when active size=k?


--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx



