Re: Need some advice about Pools and Erasure Coding

Hi,

On 4/29/19 11:19 AM, Rainer Krienke wrote:
> I am planning to set up a ceph cluster and already implemented a test
> cluster where we are going to use RBD images for data storage (9
> hosts, each host has 16 OSDs, each OSD 4 TB).
> We would like to use erasure-coded (EC) pools here, and so all OSDs
> are bluestore. Since several projects are going to store data on this
> ceph cluster, I think it would make sense to use several EC pools for
> separation of the projects and access control.

> Now I have some questions I hope someone can help me with:
>
> - Do I still (nautilus) need two pools for EC-based RBD images, one
> EC data pool and a second replicated pool for metadata?
AFAIK EC pools cannot store the (omap) metadata at all, so you probably still need a separate replicated pool for the image metadata and attach the EC pool as the data pool; see the sketch below.
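A minimal sketch of such a two-pool setup (pool, profile and image
names are placeholders, the pg_num values are just examples):

  # EC profile; failure domain "host" keeps shards on distinct hosts
  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host

  # EC data pool; RBD needs partial overwrites (bluestore only)
  ceph osd pool create rbd_data 128 128 erasure ec42
  ceph osd pool set rbd_data allow_ec_overwrites true

  # small replicated pool that holds the image headers / metadata
  ceph osd pool create rbd_meta 32 32 replicated
  ceph osd pool application enable rbd_meta rbd
  ceph osd pool application enable rbd_data rbd

  # the image lives in the replicated pool, its data objects in the EC pool
  rbd create --size 10G --data-pool rbd_data rbd_meta/testimage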

> - If I do need two pools for RBD images and I want to separate the
> data of different projects by using different pools with EC coding,
> then how should I handle the metadata pool, which probably contains
> only a small amount of data compared to the data pool? Does it make
> sense to have *one* replicated metadata pool (e.g. the default rbd
> pool) for all projects and one EC pool per project, or would it be
> better to create one replicated and one EC pool for each project?

An alternative concept is RADOS namespaces: each project uses its own namespace in a single replicated pool, and access is restricted via cephx caps. Whether this works in your setup depends on the clients and whether they support namespaces (RBD itself gained namespace support in Nautilus).
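A rough sketch of the namespace variant (pool, namespace and client
names are placeholders):

  # one namespace per project inside a shared pool
  rbd namespace create --pool rbd --namespace project-a

  # cephx key that is confined to that namespace
  ceph auth get-or-create client.project-a \
      mon 'profile rbd' \
      osd 'profile rbd pool=rbd namespace=project-a'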

On the other hand, the PG autoscaler in Nautilus can keep the number of PGs low, so additional replicated pools won't hurt as much as they did pre-Nautilus.
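Enabling the autoscaler is basically a one-liner per pool in Nautilus
(pool name again just a placeholder):

  ceph mgr module enable pg_autoscaler
  ceph osd pool set rbd_meta pg_autoscale_mode on
  ceph osd pool autoscale-status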


> - I also thought about the different k+m settings for an EC pool, for
> example k=4, m=2 compared to k=8, m=2. Both settings allow for two
> OSDs to fail without any data loss, but I asked myself which of the
> two settings would be more performant? On the one hand, distributing
> data to more OSDs allows more parallel access to the data, which
> should result in faster access. On the other hand, each OSD has a
> latency until it can deliver its data shard. So is there a
> recommendation which of my two k+m examples should be preferred?

I cannot comment on speed (interesting question, since we are about to set up a new cluster, too)... but I wouldn't use k=8, m=2 in a setup with only 9 hosts. You should have at least k+m+m hosts to handle host failures gracefully, so with nine hosts even k=6, m=2 might (and will) be a problem; see the numbers below.
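Spelled out (assuming crush-failure-domain=host, i.e. every shard of a
PG lands on a different host):

  # rule of thumb: hosts >= k + m (to place shards) + m (spares for recovery)
  #   k=8, m=2 -> 12 hosts;  k=6, m=2 -> 10 hosts;  k=4, m=2 -> 8 hosts
  # so on 9 hosts only something like k=4, m=2 (the ec42 profile above) fits
  ceph osd erasure-code-profile get ec42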


Regards,

Burkhard




