Re: Question if WAL/block.db partition will benefit us

Hi,
we use enterprise SSDs like SAMSUNG MZ7KM1T9.
They work very well for our block storage. Some NVMe would be a lot nicer,
but we have had good experience with them.

Losing 10 OSDs to a single SSD failure might sound harsh, but it would be
an acceptable risk. Most of the tunables are at their defaults in our setup,
and it looks like the PGs have a failure domain of host. I restart the
systems on a regular basis for kernel updates.
Also, checking disk I/O with dstat shows rather low load on the SSDs (below
1k IOPS):
root@s3db18:~# dstat --disk --io  -T  -D sdd
--dsk/sdd-- ---io/sdd-- --epoch---
 read  writ| read  writ|  epoch
 214k 1656k|7.21   126 |1636536603
 144k 1176k|2.00   200 |1636536604
 128k 1400k|2.00   230 |1636536605

Normally I would now try this configuration:
1 SSD per 10 OSDs, with 150GB per OSD for block.db and block.wal, both on
the same partition as someone stated before, and 200GB extra to move all
pools except the .data pool to SSDs.
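
As a rough sanity check of that layout, the space budget can be sketched in
shell. The 1920GB figure is my assumption for the usable size of the 1.92TB
SSD model, and ssd_budget is just a hypothetical helper name:

```shell
#!/bin/sh
# Capacity budget for one shared SSD (all sizes in GB).
# Usage: ssd_budget OSDS GB_PER_OSD EXTRA_GB SSD_GB
# Prints the allocated space and what would remain unpartitioned.
ssd_budget() {
    osds=$1; per_osd=$2; extra=$3; ssd=$4
    used=$(( osds * per_osd + extra ))
    echo "allocated=${used}GB free=$(( ssd - used ))GB"
}

# 10 OSDs x 150GB for block.db/block.wal, plus 200GB for the small pools.
# 1920GB is an assumed usable SSD size, not a measured one.
ssd_budget 10 150 200 1920
```

Under those assumptions the SSD would be almost fully allocated, with only
about 220GB left unpartitioned.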

But thinking about 10 downed OSDs if one SSD fails makes me wonder how to
recover from that.
IIRC the per-OSD configuration is stored in the LVM tags:
root@s3db18:~# lvs -o lv_tags
  LV Tags

ceph.block_device=...,ceph.db_device=/dev/sdd8,ceph.db_uuid=011275a3-4201-8840-a678-c2e23d38bfd6,...
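
To see which DB device an OSD points at, a single tag can be pulled out of
such a string. This is only a sketch: get_tag is a hypothetical helper, and
the sample string is shortened, with a made-up ceph.osd_id value standing in
for the tags elided above.

```shell
#!/bin/sh
# get_tag TAGS KEY: print the value of KEY from a comma-separated
# "key=value,key=value" LVM tag string; prints nothing if KEY is absent.
get_tag() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F= -v k="$2" '$1 == k { print substr($0, length(k) + 2) }'
}

# Hypothetical, shortened tag string; real lvs output carries more tags.
tags='ceph.osd_id=0,ceph.db_device=/dev/sdd8,ceph.db_uuid=011275a3-4201-8840-a678-c2e23d38bfd6'

get_tag "$tags" ceph.db_device   # DB partition this OSD uses
get_tag "$tags" ceph.db_uuid     # UUID recorded for that partition
```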

When the SSD fails, can I just remove the tags and restart the OSD with
"ceph-volume lvm activate --all"? And after replacing the failed SSD, re-add
the tags with the correct IDs? Do I need to do anything else to prepare a
block.db partition?

Cheers
 Boris


On Tue, 9 Nov 2021 at 22:15, prosergey07 <prosergey07@xxxxxxxxx> wrote:

> I am not sure how much SSD-backed db and wal devices would help OSD
> performance. Even if you go this route with one SSD per 10 HDDs, you
> might want to set the failure domain to host in the CRUSH rules in case
> the SSD goes out of service.
>
>  In practice, though, an SSD will not boost performance much, especially
> when it is shared between 10 HDDs.
>
>  We use NVMe db+wal per OSD and a separate NVMe specifically for the
> metadata pools. There will be a lot of I/O on the bucket.index pool and
> the rgw pool that stores user and bucket metadata, so you might want to
> put them on separate fast storage.
>
>  Also, if there are not too many objects (huge objects, say, rather than
> tens or hundreds of millions of them), the bucket index will be under
> less pressure and an SSD might be okay for the metadata pools in that case.
>
>
>
> Sent from a Galaxy device
>
>
> -------- Original message --------
> From: Boris Behrens <bb@xxxxxxxxx>
> Date: 08.11.21 13:08 (GMT+02:00)
> To: ceph-users@xxxxxxx
> Subject:  Question if WAL/block.db partition will benefit us
>
> Hi,
> we run a larger octopus s3 cluster with only rotating disks.
> 1.3 PiB with 177 OSDs, some with a SSD block.db and some without.
>
> We have a ton of spare 2TB disks and we just wondered if we can put them
> to good use.
> For every 10 spinning disks we could add one 2TB SSD and we would create
> two partitions per OSD (130GB for block.db and 20GB for block.wal). This
> would leave some empty space on the SSD for wear leveling.
>
> The question now is: would we benefit from this? Most of the data that is
> written to the cluster is very large (50GB and above). It would take a
> lot of work to restructure this cluster and also two other clusters.
>
> And does it make a difference to have only a block.db partition or a
> block.db and a block.wal partition?
>
> Cheers
> Boris
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>