Re: Something like RAID0 with Ceph

On Tue, 19 Nov 2024 at 03:15, Christoph Pleger
<Christoph.Pleger@xxxxxxxxxxxxxxxxx> wrote:
> Hello,
> Is it possible to have something like RAID0 with Ceph?
> That is, when the cluster configuration file contains
>
> osd pool default size = 4

This means all data is replicated 4 times; in your case that is one
copy per OSD, which, with one OSD per host, also means one copy per host.

You will at most be able to fit s worth of data into this pool, since it
will consume 4 times that amount of raw disk for its 4 copies.
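
For example, a rough sketch of how to check this (the pool name "mypool"
below is just a placeholder, not something from your setup):

  ceph osd pool get mypool size   # shows the replication factor (4 here)
  ceph df                         # raw capacity vs each pool's MAX AVAIL
  ceph osd df                     # per-OSD usage

With size=4, the pool's MAX AVAIL will be roughly one quarter of the
free raw space.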

> and I have four hosts with one osd drive (all the same size, let's call
> it size s) per host, is it somehow possible to add four other hosts
> with one osd drive (again with size s) per host, so that the resulting
> Ceph block device is of size 2 * s?

You seem to use the term "ceph block device" in an odd way. The common
use of the term means "an RBD image that a ceph client mounts", and the
size of that image will not change just because you add more hosts. You
will be allowed to grow it once more hosts appear, since the pool can
then hold larger images (or more of them), but the image size itself
stays fixed until you resize it.

If you mean "the pool on which my images reside" instead, then the answer
is "if you add 4 more hosts to your existing 4, then you can use twice
the amount of storage".
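
To illustrate with rough example commands (pool/image names are made
up): an RBD image keeps the size you gave it until you explicitly
resize it, no matter how many hosts you add:

  rbd create --size 40G mypool/myimage   # image is 40G regardless of pool growth
  rbd info mypool/myimage                # shows size, object count and object size
  rbd resize --size 80G mypool/myimage   # grow it yourself once the pool has room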

I'm not sure exactly where the confusion lies, but let me state a few
things about Ceph in the hope of making it clearer how it works:

1. the "size" if the pool only controls the number of copies each object
in it will have, it does not control a number of MB/GB/TB. All pools grow
as you put objects in them, until storage runs out (it will stop before 95%
but still..)

2. The pool has a number of PGs, set at creation but editable later on,
and these also do not control the size of the pool, only how its data
spreads over the various OSDs and hosts. One should aim for something
like 100-200 PGs per OSD, so in a 4-OSD/4-host case like your example,
and with size=4, the pool should have 128 PGs, which means 128 * 4 = 512
PG replicas end up on the 4 OSDs. If/when you add 4 more OSDs, bump the
pool to 256 PGs (see the example commands after this list).

3. RAID0 is basically about letting data stretch across several drives.
This is how Ceph (and many other storage clusters) works by default.
There are no settings you have to tune or figure out for it to start
using new disks. You may later want to prevent this, for instance if you
want to run one pool on spinning drives and other pools on ssd/nvme; in
that case you would actively configure it to not use whatever disks are
added (see the CRUSH rule example after this list).

4. If we are talking about RBD images, like the ones used for OpenStack
or Proxmox VMs with Ceph block storage, then those are internally split
up into lots and lots of pieces by librbd, so when you ask for, say, a
40G drive for your VM, you are actually getting lots and lots of 4M (or
2M?) pieces that in total sum up to 40G.
Each of these pieces ends up on a pseudo-randomly chosen OSD, meaning
that your thousands of pieces spread over the 128 PGs in a mostly very
even way. This acts a bit like raid0/jbod in some fashion, if you squint
your eyes a bit. The important part is that when your VM reads or writes
across the whole of its 40G block device, it will involve ALL the OSD
drives, which is how you want a storage cluster to work (see the rbd
info example after this list).
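
Regarding point 2, the PG handling could look roughly like this (pool
name again just an example):

  ceph osd pool create mypool 128          # 128 PGs for the 4-OSD case
  ceph osd pool set mypool size 4          # 4 copies, one per host here
  # after adding 4 more OSDs:
  ceph osd pool set mypool pg_num 256
  ceph osd pool set mypool pgp_num 256     # older releases want this bumped too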
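
Regarding point 3, the usual way to keep a pool off certain disks is a
CRUSH rule bound to a device class, something like (rule name made up):

  ceph osd crush rule create-replicated only-ssd default host ssd
  ceph osd pool set mypool crush_rule only-ssd   # pool now ignores hdd OSDs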
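
Regarding point 4, rbd info shows how an image is chopped up, and the
object size can be chosen at creation time (4M is the default):

  rbd info mypool/myimage        # look at "objects" and "object size"
  rbd create --size 40G --object-size 4M mypool/otherimage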


-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


