Hi,

Yes, the documentation you are linking is from Red Hat Ceph Storage 2.x, which
still used FileStore. With BlueStore this is no longer the case. The latest
Red Hat doc version is here:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html-single/object_gateway_guide/index#index-pool_rgw

I see they have this block of text there:

"For Red Hat Ceph Storage running Bluestore, Red Hat recommends deploying an
NVMe drive as a block.db device, rather than as a separate pool. Ceph Object
Gateway index data is written only into an object map (OMAP). OMAP data for
BlueStore resides on the block.db device on an OSD. When an NVMe drive
functions as a block.db device for an HDD OSD and when the index pool is
backed by HDD OSDs, the index data will ONLY be written to the block.db
device. As long as the block.db partition/lvm is sized properly at 4% of
block, this configuration is all that is needed for BlueStore."

On Mon, Apr 8, 2024 at 12:02 PM Lukasz Borek <lukasz@xxxxxxxxxxxx> wrote:

> Thanks for clarifying.
>
> So the Red Hat doc
> <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/ceph_object_gateway_for_production/index#adv-rgw-hardware-bucket-index>
> is outdated?
>
> 3.6. Selecting SSDs for Bucket Indexes
>
>> When selecting OSD hardware for use with a Ceph Object
>> Gateway—irrespective of the use case—Red Hat recommends considering an
>> OSD node that has at least one SSD drive used exclusively for the bucket
>> index pool. This is particularly important when buckets will contain a
>> large number of objects.
>
>> A bucket index entry is approximately 200 bytes of data, stored as an
>> object map (omap) in leveldb. While this is a trivial amount of data,
>> some uses of Ceph Object Gateway can result in tens or hundreds of
>> millions of objects in a single bucket. By mapping the bucket index pool
>> to a CRUSH hierarchy of SSD nodes, the reduced latency provides a
>> dramatic performance improvement when buckets contain very large numbers
>> of objects.
>
>> Important
>> In a production cluster, a typical OSD node will have at least one SSD
>> for the bucket index, AND at least one SSD for the journal.
>
> Is current utilisation what the osd df command shows in the OMAP field?
>
>> root@cephbackup:/# ceph osd df
>> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>>  0    hdd  7.39870   1.00000  7.4 TiB  894 GiB  769 GiB  1.5 MiB  3.4 GiB  6.5 TiB  11.80  1.45   40      up
>>  1    hdd  7.39870   1.00000  7.4 TiB  703 GiB  578 GiB  6.0 MiB  2.9 GiB  6.7 TiB   9.27  1.14   37      up
>>  2    hdd  7.39870   1.00000  7.4 TiB  700 GiB  576 GiB  3.1 MiB  3.1 GiB  6.7 TiB   9.24  1.13   39      up
>
> On Mon, 8 Apr 2024 at 08:42, Daniel Parkes <dparkes@xxxxxxxxxx> wrote:
>
>> Hi Lukasz,
>>
>> RGW uses omap objects for the index pool; the omap data is stored in the
>> RocksDB database of each OSD, not in the actual index pool, so by
>> putting DB/WAL on an NVMe as you mentioned, you are already placing the
>> index pool on a non-rotational drive; you don't need to do anything
>> else.
>>
>> You just need to size your DB/WAL partition accordingly. For RGW/object
>> storage, a good starting point for the DB/WAL sizing is 4%.
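>>
>> For example, with cephadm an OSD service spec along these lines puts
>> block.db for every HDD OSD on the shared NVMe. This is just a sketch:
>> the service_id, host_pattern and the 300G block_db_size are illustrative
>> values (300G is roughly 4% of a 7.4 TiB data drive); if block_db_size is
>> omitted, ceph-volume simply splits the NVMe evenly between the OSDs on
>> the host:
>>
>> service_type: osd
>> service_id: hdd-with-nvme-db
>> placement:
>>   host_pattern: '*'
>> spec:
>>   data_devices:
>>     rotational: 1
>>   db_devices:
>>     rotational: 0
>>   block_db_size: '300G'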
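>>
>> After deployment you can double-check where the DB ended up and how much
>> the index omap actually uses (osd.0 below is just an example id): ceph
>> osd metadata shows which device backs bluefs_db, the OMAP and META
>> columns of ceph osd df give a per-OSD view of omap/RocksDB usage, and on
>> recent releases perf dump bluefs reports counters such as db_used_bytes:
>>
>> # ceph osd metadata 0 | grep bluefs_db
>> # ceph osd df
>> # ceph tell osd.0 perf dump bluefs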
>>
>> Example of omap entries in the index pool using 0 bytes, as they are
>> stored in RocksDB:
>>
>> # rados -p default.rgw.buckets.index listomapkeys .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> file1
>> file2
>> file4
>> file10
>>
>> # rados df -p default.rgw.buckets.index
>> POOL_NAME                  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD  WR_OPS      WR  USED COMPR  UNDER COMPR
>> default.rgw.buckets.index   0 B       11       0      33                   0        0         0     208  207 KiB      41  20 KiB         0 B          0 B
>>
>> # rados -p default.rgw.buckets.index stat .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2 mtime 2022-12-20T07:32:11.000000-0500, size 0
>>
>> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek <lukasz@xxxxxxxxxxxx> wrote:
>>
>>> Hi!
>>>
>>> I'm working on a POC cluster setup dedicated to a backup app writing
>>> objects via S3 (large objects, up to 1 TB, transferred via the
>>> multipart upload process).
>>>
>>> The initial setup is 18 storage nodes (12 HDDs + 1 NVMe card for
>>> DB/WAL) + an EC pool. The plan is to use cephadm.
>>>
>>> I'd like to follow good practice and put the RGW index pool on a
>>> non-rotational drive. The question is how to do it:
>>>
>>> - replace a few HDDs (1 per node) with SSDs (how many? 4-6-8?)
>>> - reserve space on the NVMe drive in each node, create an LV-based
>>>   OSD, and let the RGW index use the same NVMe drive as the DB/WAL
>>>
>>> Thoughts?
>>>
>>> --
>>> Lukasz
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>
> --
> Łukasz Borek
> lukasz@xxxxxxxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx