Thanks for clarifying. So the Red Hat doc
<https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/ceph_object_gateway_for_production/index#adv-rgw-hardware-bucket-index>
is outdated?

> 3.6. Selecting SSDs for Bucket Indexes
>
> When selecting OSD hardware for use with a Ceph Object Gateway—irrespective
> of the use case—Red Hat recommends considering an OSD node that has at
> least one SSD drive used exclusively for the bucket index pool. This is
> particularly important when buckets will contain a large number of objects.
>
> A bucket index entry is approximately 200 bytes of data, stored as an
> object map (omap) in leveldb. While this is a trivial amount of data, some
> uses of Ceph Object Gateway can result in tens or hundreds of millions of
> objects in a single bucket. By mapping the bucket index pool to a CRUSH
> hierarchy of SSD nodes, the reduced latency provides a dramatic performance
> improvement when buckets contain very large numbers of objects.
>
> Important
> In a production cluster, a typical OSD node will have at least one SSD for
> the bucket index, AND at least one SSD for the journal.

Is the current utilisation what the osd df command shows in the OMAP field?

root@cephbackup:/# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0  hdd    7.39870   1.00000  7.4 TiB  894 GiB  769 GiB  1.5 MiB  3.4 GiB  6.5 TiB  11.80  1.45   40      up
 1  hdd    7.39870   1.00000  7.4 TiB  703 GiB  578 GiB  6.0 MiB  2.9 GiB  6.7 TiB   9.27  1.14   37      up
 2  hdd    7.39870   1.00000  7.4 TiB  700 GiB  576 GiB  3.1 MiB  3.1 GiB  6.7 TiB   9.24  1.13   39      up

On Mon, 8 Apr 2024 at 08:42, Daniel Parkes <dparkes@xxxxxxxxxx> wrote:

> Hi Lukasz,
>
> RGW uses omap objects for the index pool. The omap entries are stored in
> each OSD's RocksDB database, not in the index pool objects themselves, so
> by putting DB/WAL on an NVMe as you mentioned, you are already serving the
> index pool from a non-rotational drive; you don't need to do anything else.
>
> You just need to size your DB/WAL partition accordingly. For RGW/object
> storage, a good starting point for the DB/WAL sizing is 4%.
>
> Example of omap entries in the index pool using 0 bytes, as they are
> stored in RocksDB:
>
> # rados -p default.rgw.buckets.index listomapkeys .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> file1
> file2
> file4
> file10
>
> # rados df -p default.rgw.buckets.index
> POOL_NAME                  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD  WR_OPS      WR  USED COMPR  UNDER COMPR
> default.rgw.buckets.index   0 B       11       0      33                   0        0         0     208  207 KiB      41  20 KiB         0 B          0 B
>
> # rados -p default.rgw.buckets.index stat .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2 mtime 2022-12-20T07:32:11.000000-0500, size 0
>
> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek <lukasz@xxxxxxxxxxxx> wrote:
>
>> Hi!
>>
>> I'm working on a POC cluster setup dedicated to a backup app writing
>> objects via S3 (large objects, up to 1 TB, transferred via the multipart
>> upload process).
>>
>> The initial setup is 18 storage nodes (12 HDDs + 1 NVMe card for DB/WAL)
>> + an EC pool. The plan is to use cephadm.
>>
>> I'd like to follow good practice and put the RGW index pool on a
>> non-rotational drive. The question is how to do it:
>>
>> - replace a few HDDs (1 per node) with an SSD (how many? 4-6-8?)
>> - reserve space on the NVMe drive on each node, create an LV-based OSD,
>> and let the RGW index use the same NVMe drive as the DB/WAL
>>
>> Thoughts?
>>
>> --
>> Lukasz

--
Łukasz Borek
lukasz@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
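
To make Daniel's suggestion concrete for a cephadm deployment like the one
described above, a minimal sketch of an OSD service spec could look like the
commands below. It puts data on the HDDs and DB/WAL (and with it the RocksDB
omap data backing the bucket index) on the NVMe. The service_id, host_pattern
and the block_db_size value are illustrative assumptions, not taken from the
thread; the size follows Daniel's 4% rule of thumb against an assumed 8 TB
data drive and should be adjusted to the real drive capacity.

# Sketch only -- names and sizes are assumptions, not from the thread.
# List devices first to confirm which ones cephadm reports as rotational:
ceph orch device ls

cat > osd_hdd_nvme_db.yml <<'EOF'
service_type: osd
service_id: hdd-data-nvme-db        # arbitrary service name (assumption)
placement:
  host_pattern: '*'                 # or limit to the 18 storage nodes
spec:
  data_devices:
    rotational: 1                   # the 12 HDDs per node
  db_devices:
    rotational: 0                   # the shared NVMe card
  block_db_size: '320G'             # ~4% of an assumed 8 TB HDD; adjust
EOF

# Preview what cephadm would create, then apply the spec:
ceph orch apply -i osd_hdd_nvme_db.yml --dry-run
ceph orch apply -i osd_hdd_nvme_db.yml

If block_db_size is omitted, ceph-volume's batch logic should simply split the
NVMe evenly across the HDDs it serves; whether that lands near the 4% figure
depends on the actual drive sizes.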