Re: What is the meaning of size and min_size for erasure-coded pools?

On Tue, May 8, 2018 at 2:16 PM Maciej Puzio <mkp37215@xxxxxxxxx> wrote:
Thank you everyone for your replies. However, I feel that at least
part of the discussion deviated from the topic of my original post. As
I wrote before, I am dealing with a toy cluster, whose purpose is not
to provide resilient storage, but to evaluate Ceph and its behavior in
the event of a failure, with particular attention paid to worst-case
scenarios. This cluster is purposely minimal and is built on VMs
running on my workstation, with all OSDs storing data on a single SSD.
It is definitely not a production system.

I am not asking for advice on how to build resilient clusters, at
least not at this point. I asked some questions about specific things
that I noticed during my tests and that I was not able to find
explained in the Ceph documentation. Dan van der Ster wrote:
> See https://github.com/ceph/ceph/pull/8008 for the reason why min_size defaults to k+1 on ec pools.
That's a good point, but I am wondering why reads are also blocked
when the number of OSDs falls to k. If the total number of OSDs in a
pool (n) is larger than k+m, should min_size then be k(+1) or
n-m(+1)?
In any case, since min_size can be easily changed, I guess this is not
an implementation issue, but rather a documentation issue.
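
To make the arithmetic behind my question concrete, here is a small
Python sketch (purely illustrative, not Ceph code) of how I understand
the relationship between surviving shards, k, m and min_size for a
single PG:

# Illustrative arithmetic only: how many shard failures an EC pool
# (k data + m coding shards) tolerates, and when a PG serves I/O
# for a given min_size.

def pg_status(k, m, surviving_shards, min_size):
    """Describe a PG that has surviving_shards of its k+m shards left."""
    if surviving_shards < k:
        return "data unrecoverable (fewer than k shards left)"
    serves_io = surviving_shards >= min_size
    spare = surviving_shards - k  # further failures tolerated before data loss
    status = "I/O allowed" if serves_io else "I/O blocked"
    return f"{status}, can lose {spare} more shard(s) before data loss"

if __name__ == "__main__":
    k, m = 3, 2
    for lost in range(m + 1):
        left = k + m - lost
        print(f"lost {lost} OSD(s), {left} shard(s) left:")
        print("  min_size=k   ->", pg_status(k, m, left, k))
        print("  min_size=k+1 ->", pg_status(k, m, left, k + 1))

With min_size=k the pool keeps accepting I/O when there is no
redundancy left at all, which, as I understand it, is the concern
behind the k+1 default discussed in the pull request above.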

Which leaves these questions of mine still unanswered:
After killing m OSDs and setting min_size=k, most of the PGs were
active+undersized, often with ...+degraded and/or remapped, but a few
were active+clean or active+clean+remapped. Why? I would expect all
PGs to be in the same state (perhaps active+undersized+degraded?).
Is this mishmash of PG states normal? If not, would I have avoided it
if I had created the pool with min_size=k=3 from the start? In other
words, does min_size influence the assignment of PGs to OSDs, or is it
only used to force an I/O shutdown in the event of OSD failures?

active+clean does not make a lot of sense if every PG really was 3+2, but perhaps you also had a 3x replicated pool or something similar hanging around from your deployment tool?
The active+clean+remapped state means that a PG was somehow lucky enough to have an existing "stray" copy on one of the OSDs, which it has decided to use to bring itself back up to the right number of copies, even though that copy certainly won't match the proper failure domains.
min_size in relation to the k+m values won't have any direct impact here, although it might indirectly affect things by changing how quickly stray PG copies get deleted.
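
If you want to check whether those active+clean PGs actually belong to
a different pool, something along these lines should do it. It is only
a rough sketch: it shells out to "ceph pg dump pgs" and groups PG
states by pool ID, and the exact JSON layout varies a bit between
releases, so treat the field handling as an assumption to verify:

#!/usr/bin/env python3
# Rough sketch: count PG states per pool, to see whether the
# active+clean PGs belong to the EC pool or to some other pool.
import json
import subprocess
from collections import Counter, defaultdict

def ceph_json(*args):
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)

dump = ceph_json("pg", "dump", "pgs")
# Some releases return a bare list of PG stats, others wrap it in a dict.
pg_stats = dump["pg_stats"] if isinstance(dump, dict) else dump

states_by_pool = defaultdict(Counter)
for pg in pg_stats:
    pool_id = pg["pgid"].split(".")[0]  # pgid looks like "<pool>.<seq>"
    states_by_pool[pool_id][pg["state"]] += 1

for pool_id, states in sorted(states_by_pool.items()):
    print(f"pool {pool_id}:")
    for state, count in states.most_common():
        print(f"  {count:4d}  {state}")

You can then match the pool IDs against "ceph osd pool ls detail" to
see which one is the erasure-coded pool.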
-Greg
 

Thank you very much

Maciej Puzio
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
