Re: DB/WAL and RGW index on the same NVMe

Thanks for clarifying.

So is the Red Hat doc
<https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/ceph_object_gateway_for_production/index#adv-rgw-hardware-bucket-index>
outdated?

3.6. Selecting SSDs for Bucket Indexes

> When selecting OSD hardware for use with a Ceph Object Gateway—irrespective
> of the use case—Red Hat recommends considering an OSD node that has at
> least one SSD drive used exclusively for the bucket index pool. This is
> particularly important when buckets will contain a large number of objects.
>
> A bucket index entry is approximately 200 bytes of data, stored as an
> object map (omap) in leveldb. While this is a trivial amount of data, some
> uses of Ceph Object Gateway can result in tens or hundreds of millions of
> objects in a single bucket. By mapping the bucket index pool to a CRUSH
> hierarchy of SSD nodes, the reduced latency provides a dramatic performance
> improvement when buckets contain very large numbers of objects.
>
> Important
> In a production cluster, a typical OSD node will have at least one SSD for
> the bucket index, AND at least one SSD for the journal.
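
As a rough back-of-envelope check against that 200-byte figure: a bucket with
100 million objects works out to about 100M x 200 B ≈ 20 GB of index omap data
(spread over the bucket's index shards, then multiplied by the index pool's
replication factor), and that is what ends up living in the OSDs' RocksDB.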


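For completeness, if the cluster did end up with dedicated SSD OSDs for the
index, pinning the pool to them should just be a device-class CRUSH rule; a
minimal sketch, assuming the default root, a host failure domain, the stock
pool name and an arbitrary rule name:

# replicated rule restricted to ssd-class OSDs
ceph osd crush rule create-replicated rgw-index-ssd default host ssd
# point the bucket index pool at that rule
ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd
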
Is the current utilisation what the ceph osd df command shows in the OMAP field?

root@cephbackup:/# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    hdd  7.39870   1.00000  7.4 TiB  894 GiB  769 GiB  1.5 MiB  3.4 GiB  6.5 TiB  11.80  1.45   40      up
 1    hdd  7.39870   1.00000  7.4 TiB  703 GiB  578 GiB  6.0 MiB  2.9 GiB  6.7 TiB   9.27  1.14   37      up
 2    hdd  7.39870   1.00000  7.4 TiB  700 GiB  576 GiB  3.1 MiB  3.1 GiB  6.7 TiB   9.24  1.13   39      up
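
If per-pool rather than per-OSD numbers are more useful, I assume a recent
release would also break out the index pool's omap usage per PG in the
OMAP_BYTES*/OMAP_KEYS* columns (if I understand the asterisk right, those are
only refreshed by deep scrub):

ceph pg ls-by-pool default.rgw.buckets.index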




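For the record, the layout from my original mail below (12 HDDs plus one NVMe
per node, deployed with cephadm) would translate into an OSD service spec
along these lines; a sketch only, with a placeholder service_id and
host_pattern, and relying on ceph-volume's default of splitting the DB device
evenly across the data devices:

cat > osd-spec.yaml <<'EOF'
service_type: osd
service_id: hdd-with-nvme-db    # placeholder name
placement:
  host_pattern: '*'             # placeholder; restrict to the storage nodes
spec:
  data_devices:
    rotational: 1               # the 12 HDDs per node
  db_devices:
    rotational: 0               # the shared NVMe card for DB/WAL
EOF
ceph orch apply -i osd-spec.yaml

Taking the 4% starting point quoted below, each 7.4 TiB HDD would want roughly
0.04 x 7.4 TiB ≈ 300 GiB of DB/WAL, i.e. about 3.5 TiB of NVMe per node for
12 HDDs.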

On Mon, 8 Apr 2024 at 08:42, Daniel Parkes <dparkes@xxxxxxxxxx> wrote:

> Hi Lukasz,
>
> RGW uses omap objects for the index pool; omaps are stored in the RocksDB
> database of each OSD, not in the actual index pool, so by putting DB/WAL
> on an NVMe as you mentioned, you are already placing the index data on a
> non-rotational drive; you don't need to do anything else.
>
> You just need to size your DB/WAL partition accordingly. For RGW/object
> storage, a good starting point for the DB/WAL sizing is 4% of the data
> drive size.
>
> Example of omap entries in the index pool using 0 bytes, as they are
> stored in RocksDB:
>
> # rados -p default.rgw.buckets.index listomapkeys .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> file1
> file2
> file4
> file10
>
> # rados df -p default.rgw.buckets.index
> POOL_NAME                  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD  WR_OPS      WR  USED COMPR  UNDER COMPR
> default.rgw.buckets.index   0 B       11       0      33                   0        0         0     208  207 KiB      41  20 KiB         0 B          0 B
>
> # rados -p default.rgw.buckets.index stat .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2 mtime 2022-12-20T07:32:11.000000-0500, size 0
>
>
> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek <lukasz@xxxxxxxxxxxx> wrote:
>
>> Hi!
>>
>> I'm working on a POC cluster setup dedicated to a backup app writing
>> objects via S3 (large objects, up to 1 TB, transferred via multipart
>> upload).
>>
>> The initial setup is 18 storage nodes (12 HDDs + 1 NVMe card for DB/WAL)
>> + an EC pool. The plan is to use cephadm.
>>
>> I'd like to follow good practice and put the RGW index pool on a
>> non-rotational drive. The question is how to do it:
>>
>>    - replace a few HDDs (1 per node) with an SSD (how many? 4-6-8?)
>>    - reserve space on the NVMe drive on each node, create an LV-based OSD,
>>    and let the RGW index use the same NVMe drive as DB/WAL
>>
>> Thoughts?
>>
>> --
>> Lukasz
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>>

-- 
Łukasz Borek
lukasz@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



