Oh wow, a lot to read piled up in one night :) First things first: I want to thank you all for your insights and for the really valuable knowledge I pulled from this mail thread.

Regarding flash only:
We use flash-only clusters for our RBD clusters. This is very nice, and most of the maintenance is "install updates and reboot". In the two years I have worked for this company I have seen one disk die (rotating) out of around 700 disks in total across eleven clusters (very different in size; two of them hold the vast majority of the disks, one is the main S3 cluster). I would love to use flash only for our S3 as well, but it is expensive. I think most of our "maintenance cost" is the actual RGW and figuring out how to use it correctly, rather than disk management.

Why we use 8TB disks:
I don't really know. We have some older 4TB disks in there (some got re-added when we very quickly ran into disk limitations; we just needed space right NOW and those were lying around), plus some 16TB disks. We found out that the 16TB disks make some of the hosts really heavy (we bought and added them in the "need much space ASAP" situation and didn't have time to place them optimally in the cluster) and they are expensive, so we are now going with 8TB. We are also redistributing disks in the cluster, as there are 2RU chassis with 7 disks (currently 4TB and 16TB) and 4RU chassis with >20 disks (with 4, 8 and 16TB disks). Also, CRUSH seems to have a hard time distributing PGs around the cluster when the disk sizes are too scattered: a lot of OSDs are lingering around 79.9%-80% used disk space, a lot sit at the other end at 65%-70%, and very few are actually in the 73% range (which is the overall cluster utilization).

DWPD anyone?
I don't really care about this, because our S3 cluster is more a data dump than an actual object storage. 90% of the data is exported RBD snapshots (we implemented our managed backup center and needed some place to store the data, and S3 was really nice because it makes things very easy on the client's end and can be used from everywhere). It looks like the cluster gets a lot of writes (at least 3/4 of the traffic is writes to the cluster), but it is only 200-500 MB/s. I can't imagine that I am hitting any DWPD threshold.

Why I came up with this question:
We just have a ton of 2TB SSDs lying around (I don't know the exact number, but I think it is around 30) because we replace small disks with larger disks instead of just buying a new chassis with disks, and I wanted to put these disks to good use in our S3 cluster. Reading about all the problems that might arise from DB/WAL SSDs makes me think I should not just use them. Instead: just add more 8TB disks to the 4RU chassis, and when those are out of slots, add another 8RU chassis with 8TB disks. Basically I am just cleaning up old technical debt and emergency decisions, and I want to do it now in the most optimal way with the resources we have. :)

Cheers
 Boris
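
P.S. To put a rough number on the DWPD point, a back-of-envelope sketch (the replication factor and the drive count here are just assumptions for illustration, not our real values):

  500 MB/s sustained writes            ~= 43 TB of client writes per day
  x3 (assuming 3x replication)         ~= 130 TB/day actually hitting the OSDs
  over, say, 100 x 8 TB drives (800 TB raw)  ~= 0.16 drive writes per day

Even under those pessimistic assumptions that stays well below the ~1 DWPD that read-intensive SSDs are typically rated for, which is why I don't see endurance as the limiting factor here.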