Re: Using RBD to pack billions of small files

Hi,

On 04/02/2021 at 08:41, Loïc Dachary wrote:
> Hi Federico,
>
> On 04/02/2021 05:51, Federico Lucifredi wrote:
>> Hi Loïc,
>>    I am intrigued, but am missing something: why not use RGW and store the source code files as objects? RGW has native compression and can take care of that behind the scenes.
> Excellent question!
>>    Is the desire to use RBD only due to minimum allocation sizes?
> I *assume* that since RGW does have

If I understand correctly, I assume you are missing a "not" here.

>  specific strategies to take advantage of the fact that objects are immutable and will never be removed:
>
> * It will be slower to add artifacts in RGW than in an RBD image + index
> * The metadata in RGW will be larger than an RBD image + index
>
> However I have not verified this and if you have an opinion I'd love to hear it :-)

Reading the exchanges, I believe you are focused on read speed and
space efficiency. Did you consider the write speed with such a scheme?

Depending on how you store the index, you could block on each write and
would have to account for Ceph latency (i.e. if your writer fails,
recovery can be tricky unless you waited for each write to complete
before updating your index). With your 100TB target and 3kB artifact
size, a 1ms latency and blocking writes translate to a whole year spent
writing. If you manage to get down to a 0.1ms latency (not sure this is
achievable with Ceph yet) you end up with a month. Depending on how you
plan to populate the store this could be a problem. You'll also have to
consider whether the artifact write rate could become a bottleneck
during normal use.
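
To make the orders of magnitude explicit, here is the back-of-the-envelope
calculation behind those numbers (the artifact size and latencies are the
assumptions from the thread, not measurements):

  # Rough estimate of the time needed to store ~100TB of ~3kB artifacts
  # with one blocking write per artifact. Figures are assumptions from
  # the thread, not benchmarks.
  TOTAL_BYTES = 100 * 10**12        # 100TB target
  ARTIFACT_BYTES = 3 * 1000         # ~3kB per artifact
  artifact_count = TOTAL_BYTES // ARTIFACT_BYTES   # ~33 billion artifacts

  for latency_s in (1e-3, 1e-4):    # 1ms vs 0.1ms per blocking write
      total_s = artifact_count * latency_s
      print(f"{latency_s * 1000:.1f}ms/write -> "
            f"{total_s / 86400:.0f} days")

  # 1.0ms/write -> 386 days (about a year)
  # 0.1ms/write -> 39 days (about a month)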

You could probably design a scheme that packs multiple artifacts into a
single write, but it adds complexity which might bring its own
performance problems and space overhead.
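
Purely for illustration, such a batching writer could look roughly like
the sketch below. It is hypothetical (PackWriter and write_object are
made-up names, not an existing API) and glosses over durability of the
index entries:

  # Hypothetical sketch: pack several small artifacts into one object
  # per write. flush() returns (pack_name, artifact_id, offset, length)
  # entries that still have to be persisted in the index *after* the
  # pack write completed.
  class PackWriter:
      def __init__(self, write_object, pack_size=4 * 1024 * 1024):
          self.write_object = write_object   # callable(name, data)
          self.pack_size = pack_size
          self.buffer = bytearray()
          self.entries = []                  # (artifact_id, offset, length)
          self.pack_seq = 0

      def add(self, artifact_id, data):
          self.entries.append((artifact_id, len(self.buffer), len(data)))
          self.buffer += data
          if len(self.buffer) >= self.pack_size:
              return self.flush()
          return []

      def flush(self):
          if not self.buffer:
              return []
          name = "pack-%012d" % self.pack_seq
          self.write_object(name, bytes(self.buffer))   # one blocking write
          index_entries = [(name,) + e for e in self.entries]
          self.buffer = bytearray()
          self.entries = []
          self.pack_seq += 1
          return index_entries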

I'm not familiar with space efficiency on modern Ceph versions (still
using filestore on Hammer...): do you have a ballpark estimation of the
cost of storing the artifacts as simple objects? Unless you have already
worked out the whole design, that would be my first concern, although
the inefficiency could turn out to be worth the trade-off for
simplicity.
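
To illustrate what I mean by ballpark, a naive allocation estimate could
be computed like this. The min_alloc_size values and 3x replication are
assumptions (64kB was the old BlueStore HDD default, 4kB the newer one;
check bluestore_min_alloc_size_hdd/ssd on the actual cluster), and
per-object metadata/omap overhead is ignored:

  import math

  # Naive raw space consumed by one small object, ignoring metadata.
  ARTIFACT_BYTES = 3 * 1000    # ~3kB artifact
  REPLICATION = 3              # assumed replicated pool, size=3

  for min_alloc in (64 * 1024, 4 * 1024):   # assumed min_alloc_size values
      raw = math.ceil(ARTIFACT_BYTES / min_alloc) * min_alloc * REPLICATION
      print(f"min_alloc_size={min_alloc // 1024}kB -> "
            f"{raw // 1024}kB raw per artifact "
            f"({raw / ARTIFACT_BYTES:.0f}x)")

  # min_alloc_size=64kB -> 192kB raw per artifact (~66x)
  # min_alloc_size=4kB  -> 12kB raw per artifact (~4x)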

I'm unfamiliar with the gateway and how well and how easily it scales,
so my first impulse was to bypass RGW and use the librados interface
directly. You can definitely begin with an RGW solution, as it is a bit
easier to implement, and switch to librados later if RGW ever becomes a
bottleneck. If you need speed for either writing or reading, both RGW
and librados would work: you can have as many clients managing objects
in parallel as you want, without any write locking to manage on your
end. This is a very simple storage design and simplicity can't be
overrated :-)
The only potential downside (in addition to space inefficiency) that I
can see would be walking the list of objects. This is doable, but with
billions of them it could be very slow. I'm not sure your use case would
ever need it, though.
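
For what it's worth, the librados path really is very little code. A
minimal sketch with the Python bindings (python3-rados); the pool name,
object naming scheme and conffile path are assumptions for the example:

  import rados

  # Each artifact becomes one immutable object; many writers can do
  # this in parallel without any locking on the client side.
  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  try:
      ioctx = cluster.open_ioctx('artifacts')   # hypothetical pool name
      try:
          # Use a unique id (e.g. the artifact hash) as the object name.
          ioctx.write_full('sha1:0123abcd', b'artifact content')

          # Read it back.
          data = ioctx.read('sha1:0123abcd', length=4 * 1024 * 1024)

          # Listing everything works but will be slow with billions
          # of objects:
          # for obj in ioctx.list_objects():
          #     print(obj.key)
      finally:
          ioctx.close()
  finally:
      cluster.shutdown()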

For reference, I just found the results of a test with a moderately
comparable data set:
https://www.redhat.com/en/blog/scaling-ceph-billion-objects-and-beyond.
I haven't finished reading it yet, but the volume seems comparable to
your use case, although with 64kB objects.

Note: I've seen questions about 100TB RBDs in the thread. We use such
beasts in two clusters: they work fine but are a pain to delete or
downsize. During one downsize on the slowest cluster we had to pause the
operation manually (SIGSTOP to the rbd process) during periods of high
load and let it continue afterwards. It took about a week (but the
cluster was admittedly underpowered for its use at the time).

Best regards,

--
Lionel Bouton
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



