Re: How to use ceph-volume to create multiple OSDs per NVMe disk, and with fixed WAL/DB partition on another device?

I have a similar setup and have been running some large concurrent
benchmarks, and I'm seeing that running multiple OSDs per NVMe doesn't
really make much of a difference. In fact, it actually increases write
amplification under write-heavy workloads, so performance degrades
over time.

Also, if the system has enough (and powerful enough) CPU and memory,
ingest throttling and replication become the bottleneck rather than the
NVMe writes.

It might still make a difference for read-heavy workloads, though - I
haven't tested that enough.

Regards,
Shridhar


On Wed, 11 Nov 2020 at 00:27, Jan Fajerski <jfajerski@xxxxxxxx> wrote:

> On Fri, Nov 06, 2020 at 10:15:52AM -0000, victorhooi@xxxxxxxxx wrote:
> >I'm building a new 4-node Proxmox/Ceph cluster, to hold disk images for
> >our VMs. (Ceph version is 15.2.5).
> >
> >Each node has 6 x NVMe SSDs (4TB), and 1 x Optane drive (960GB).
> >
> >CPU is AMD Rome 7442, so there should be plenty of CPU capacity to spare.
> >
> >My aim is to create 4 x OSDs per NVMe SSD (to make more effective use of
> >the NVMe performance) and use the Optane drive to store the WAL/DB
> >partition for each OSD (i.e. a total of 24 x 35GB WAL/DB partitions).
> >
> >However, I am struggling to get the right ceph-volume command to achieve
> >this.
> >
> >Thanks to a very kind Redditor, I was able to get close:
> >
> >/dev/nvme0n1 is an Optane device (900GB).
> >
> >/dev/nvme2n1 is an Intel NVMe SSD (4TB).
> >
> >```
> ># ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 --db-devices /dev/nvme0n1
> >
> >Total OSDs: 4
> >
> >Solid State VG:
> >  Targets:   block.db                  Total size: 893.00 GB
> >  Total LVs: 16                        Size per LV: 223.25 GB
> >  Devices:   /dev/nvme0n1
> >
> >  Type            Path                      LV Size         % of device
> >----------------------------------------------------------------------------------------------------
> >  [data]          /dev/nvme2n1              931.25 GB       25.0%
> >  [block.db]      vg: vg/lv                 223.25 GB       25%
> >----------------------------------------------------------------------------------------------------
> >  [data]          /dev/nvme2n1              931.25 GB       25.0%
> >  [block.db]      vg: vg/lv                 223.25 GB       25%
> >----------------------------------------------------------------------------------------------------
> >  [data]          /dev/nvme2n1              931.25 GB       25.0%
> >  [block.db]      vg: vg/lv                 223.25 GB       25%
> >----------------------------------------------------------------------------------------------------
> >  [data]          /dev/nvme2n1              931.25 GB       25.0%
> >  [block.db]      vg: vg/lv                 223.25 GB       25%
> >--> The above OSDs would be created if the operation continues
> >--> do you want to proceed? (yes/no)
> >```
> >
> >This does split up the NVMe disk into 4 OSDs and creates the WAL/DB
> >partitions on the Optane drive - however, it creates 4 x 223 GB partitions
> >on the Optane (whereas I want 35GB partitions).
> >
> >Is there any way to specify the WAL/DB partition size in the above?
> >
> >And can it be done such that you can run successive ceph-volume
> >commands to add the OSDs and WAL/DB partitions for each NVMe disk?
> Is there a particular reason you want to run ceph-volume multiple times? The
> batch subcommand can handle that in one go, without the need to explicitly
> specify any sizes as another reply proposed (though that will work nicely).
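> For reference, the explicit-size route would be the --block-db-size flag of
> the batch subcommand; something along these lines (untested here, check
> ceph-volume lvm batch --help for the exact size syntax it accepts):
> ceph-volume lvm batch --osds-per-device 4 --block-db-size 35G /dev/nvme2n1 --db-devices /dev/nvme0n1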
>
> Something like this should get you there:
> ceph-volume lvm batch --osds-per-device 4 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 --db-devices /dev/nvme0n1
>
> This of course makes assumptions about device names; adjust accordingly.
>
> Another option to size the volumes on the Optane drive would be to rely on
> the *slots arguments of the batch subcommand. See either
> ceph-volume lvm batch --help or
> https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/#implicit-sizing
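> For example (untested here, with the same hypothetical device names as
> above), telling batch that the Optane should eventually carry 24 db volumes
> makes it size each db LV at roughly 1/24 of the device, even if you add the
> data devices one run at a time:
> ceph-volume lvm batch --osds-per-device 4 --block-db-slots 24 /dev/nvme2n1 --db-devices /dev/nvme0n1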
>
> >
> >(Or if there's an easier way to achieve the above layout, please let me
> >know).
> >
> >That being said - I also just saw this ceph-users thread:
> >
> >
> >https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/3Y6DEJCF7ZMXJL2NRLXUUEX76W7PPYXK/
> >
> >It talks there about "osd op num shards" and "osd op num threads per
> >shard" - is there some way to set those, to achieve similar performance to,
> >say, 4 x OSDs per NVMe drive, but with only 1 x NVMe? Has anybody done any
> >testing/benchmarking on this they can share?
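> For reference, those map to the osd_op_num_shards* and
> osd_op_num_threads_per_shard* config options (with separate _hdd/_ssd
> variants). Untested here, but something like the following should set the
> SSD values cluster-wide, and as far as I know the OSDs need a restart to
> pick them up:
> ceph config set osd osd_op_num_shards_ssd 16
> ceph config set osd osd_op_num_threads_per_shard_ssd 2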
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


