Re: Degraded data redundancy: NUM pgs undersized

Hello Lothar,

Thanks for your reply.

On 04.09.2018 at 11:20, Lothar Gesslein wrote:
By pure chance 15 pgs are now actually replicated to all 3 osds, so they have enough copies (clean). But the placement is "wrong"; it would like to move the data to different osds (remapped) if possible.

That seems to be correct. I've added a third bucket of type datacenter and moved one host bucket so that each datacenter has one host with one osd. The PGs were rebalanced (if that is the correct term) and the status changed to HEALTH_OK with all PGs active+clean.
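(For reference, the commands for this are roughly of the form below; dc3 and host3 are placeholder names:)

    # create the new datacenter bucket and hang it under the root
    ceph osd crush add-bucket dc3 datacenter
    ceph osd crush move dc3 root=default
    # move one host bucket (with its osd) under the new datacenter;
    # moving a host elsewhere later and dropping an empty bucket works
    # the same way with "ceph osd crush move" / "ceph osd crush remove"
    ceph osd crush move host3 datacenter=dc3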

Then I moved the host in dc2 to another datacenter and removed dc2 from the CRUSH map. Now all PGs are active+clean+remapped, so your next statement applies:

It replicated to 2 osds in the initial placement but wasn't able to find a suitable third osd. Then, when pgp_num was increased, it recalculated the placement, again selected two osds and moved the data there. It won't remove the data from the "wrong" osd until it has a new place for it, so you end up with three copies, but remapped pgs.

Ok, I think I got this.


  3. What's wrong here and what do I have to do to get the cluster back
to active+clean, again?

I guess you want to have "two copies in dc1, one copy in dc2"?

If you stay with only 3 osds, that is the only way to distribute the 3 copies anyway, so you don't need any special crush rule.

What your crush rule is currently expressing is

"in the default root, select n buckets (where n is the pool size, 3 in
this case) of type datacenter, select one leaf (meaning osd) in each
datacenter". You only have 2 datacenter buckets, so that will only ever
select 2 osds.
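
(For illustration, a rule expressing exactly that looks roughly like this in the decompiled CRUSH map; the rule name and the min/max values are placeholders:)

    rule replicated_per_dc {
            id 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type datacenter
            step emit
    }

Here "step chooseleaf firstn 0 type datacenter" means: take as many datacenter buckets as the pool has replicas and pick one osd underneath each.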


If your cluster is going to grow to at least 2 osds in each dc, you can
go with

http://cephnotes.ksperis.com/blog/2017/01/23/crushmap-for-2-dc/

I would translate this crush rule as

"in the default root, select 2 buckets of type datacenter, select n-1
(where n is the pool size, so here 3-1 = 2) leafs in each datacenter"

You will need at least two osds in each dc for this, because it is random (with respect to the weights) in which dc the 2 copies will be placed and which dc gets the remaining copy.
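
(A rule matching that translation would look roughly like this; the name and size limits are placeholders, and I'm assuming host is the bucket type below each datacenter:)

    rule replicated_2dc {
            id 1
            type replicated
            min_size 2
            max_size 3
            step take default
            step choose firstn 2 type datacenter
            step chooseleaf firstn -1 type host
            step emit
    }

The negative number in "chooseleaf firstn -1" is what encodes "pool size minus one": with size 3 it picks 2 osds per datacenter, and the first 3 of the resulting candidates are used.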

I don't get why I need at least two osds in each dc, because I thought that with only three osds it is implicitly clear where to write the two copies.

In case I have two osds in each dc, I would never know on which side the two copies of my three replicas end up.

Let's try an example to check whether my understanding of the matter is correct:

I have two datacenters, dcA and dcB, with two osds in each. Due to the random placement, two copies of object A are written to dcA and one to dcB. Of the next object B, two copies are written to dcB and one to dcA.

In case I have two osds in dcA and only one in dcB, the two copies of an object are written to dcA every time and only one copy to dcB.

Did I get it right?

Best regards,
Joerg

