Hi,

we use enterprise SSDs like the SAMSUNG MZ7KM1T9. They work very well for
our block storage. NVMe would be a lot nicer, but we have had good
experience with these.

One SSD failure taking down 10 OSDs might sound harsh, but it would be an
acceptable risk. Most of the tunables are at their defaults in our setup,
and it looks like the PGs have a failure domain of host. I restart the
systems on a regular basis for kernel updates.

Also, disk I/O on the SSDs, as checked with dstat, seems rather low
(below 1k IOPS):

root@s3db18:~# dstat --disk --io -T -D sdd
--dsk/sdd-- ---io/sdd-- --epoch---
 read  writ| read  writ|  epoch
 214k 1656k|7.21   126 |1636536603
 144k 1176k|2.00   200 |1636536604
 128k 1400k|2.00   230 |1636536605

Normally I would now try this configuration:
one SSD per 10 OSDs, with 150GB per OSD for block.db and block.wal (both
on the same partition, as someone suggested earlier), plus 200GB extra to
move all pools except the .data pool to SSDs.

But the thought of 10 OSDs going down when one SSD fails makes me wonder
how to recover from that.
IIRC the per-OSD configuration is stored in the LVM tags:

root@s3db18:~# lvs -o lv_tags
  LV Tags
  ceph.block_device=...,ceph.db_device=/dev/sdd8,ceph.db_uuid=011275a3-4201-8840-a678-c2e23d38bfd6,...

When the SSD fails, can I just remove the tags and restart the OSD with
ceph-volume lvm activate --all? And after replacing the failed SSD, re-add
the tags with the correct IDs? Do I need to do anything else to prepare a
block.db partition?

Cheers
 Boris

On Tue, 9 Nov 2021 at 22:15, prosergey07 <prosergey07@xxxxxxxxx> wrote:

> Not sure how much it would help the performance with osd's backed with ssd
> db and wal devices. Even if you go this route with one ssd per 10 hdd, you
> might want to set the failure domain per host in crush rules in case ssd is
> out of service.
>
> But from the practice ssd will not help too much to boost the performance
> especially for sharing it between 10 hdds.
>
> We use nvme db+wal per osd and separate nvme specifically for metadata
> pools.
> There will be a lot of I/O on bucket.index pool and rgw pool which
> stores user, bucket metadata. So you might want to put them into separate
> fast storage.
>
> Also if there will not be too many objects, like huge objects but not
> tens-hundreds million of them then bucket index will have less pressure and
> ssd might be okay for metadata pools in that case.
>
>
>
> Sent from a Galaxy device
>
>
> -------- Original message --------
> From: Boris Behrens <bb@xxxxxxxxx>
> Date: 08.11.21 13:08 (GMT+02:00)
> To: ceph-users@xxxxxxx
> Subject: Question if WAL/block.db partition will benefit us
>
> Hi,
> we run a larger octopus s3 cluster with only rotating disks.
> 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.
>
> We have a ton of spare 2TB disks and we just wondered if we can put them
> to good use.
> For every 10 spinning disks we could add one 2TB SSD and we would create
> two partitions per OSD (130GB for block.db and 20GB for block.wal). This
> would leave some empty space on the SSD for wear leveling.
>
> The question now is: would we benefit from this? Most of the data that is
> written to the cluster is very large (50GB and above). This would take a
> lot of work to restructure the cluster and also two other clusters.
>
> And does it make a difference to have only a block.db partition or a
> block.db and a block.wal partition?
>
> Cheers
>  Boris
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
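
PS: For anyone who wants to see up front which OSDs would go down with a
given DB SSD, here is a rough Python sketch that groups OSD ids by the disk
named in ceph.db_device, working from lines like the lvs -o lv_tags output
quoted in this thread. It is only a sketch under a few assumptions: tags are
comma-separated key=value pairs, the ceph.osd_id and ceph.type tags are
present (the output in the thread is truncated, so check your own), and
device names look like /dev/sdX<N>. The function names and the sample lines
are made up for illustration, not taken from the cluster.

```python
import re
from collections import defaultdict


def parse_lv_tags(line):
    """Turn one comma-separated lv_tags string into a dict."""
    tags = {}
    for item in line.strip().split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments without '=' such as a truncated '...'
            tags[key] = value
    return tags


def osds_by_db_disk(lines):
    """Group OSD ids by the whole disk that holds their block.db."""
    groups = defaultdict(list)
    for line in lines:
        tags = parse_lv_tags(line)
        # Assumption: only count the data LV so each OSD appears once.
        if tags.get("ceph.type", "block") != "block":
            continue
        db = tags.get("ceph.db_device")
        if not db:
            continue  # OSD without a separate block.db
        # /dev/sdd8 -> /dev/sdd (naive: only handles sdX-style names)
        disk = re.sub(r"\d+$", "", db)
        groups[disk].append(tags.get("ceph.osd_id", "?"))
    return dict(groups)


# Hypothetical sample lines, not real cluster output.
sample = [
    "ceph.db_device=/dev/sdd8,ceph.osd_id=12,ceph.type=block",
    "ceph.db_device=/dev/sdd9,ceph.osd_id=13,ceph.type=block",
    "ceph.osd_id=14,ceph.type=block",
]
print(osds_by_db_disk(sample))
```

On a real host the input lines could come from lvs --noheadings -o lv_tags.
NVMe-style names (nvme0n1p1) would need a different rule for stripping the
partition suffix.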