Re: OSD service specs in mixed environment

----- On 26 Jun 24, at 10:50, Torkil Svensgaard torkil@xxxxxxxx wrote:

> On 26/06/2024 08:48, Torkil Svensgaard wrote:
>> Hi
>> 
>> We have a bunch of HDD OSD hosts with DB/WAL on PCI NVMe, either 2 x
>> 3.2TB or 1 x 6.4TB. We used to have 4 SSDs per node for journals before
>> bluestore, and those have been repurposed for an SSD pool (wear level is
>> fine).
>> 
>> We've been using the following service specs to keep the PCI NVMe
>> devices meant for bluestore DB/WAL from being provisioned as OSDs:
>> 
>> ---
>> service_type: osd
>> service_id: fast
>> service_name: osd.fast
>> placement:
>>    host_pattern: '*'
>> spec:
>>    data_devices:
>>      rotational: 0
>>      size: :1000G  <-- only use devices smaller than 1TB = not PCI NVMe
>>    filter_logic: AND
>>    objectstore: bluestore
>> ---
>> service_type: osd
>> service_id: slow
>> service_name: osd.slow
>> placement:
>>    host_pattern: '*'
>> spec:
>>    block_db_size: 290966113186
>>    data_devices:
>>      rotational: 1
>>    db_devices:
>>      rotational: 0
>>      size: '1000G:' <-- only use devices larger than 1TB for DB/WAL
>>    filter_logic: AND
>>    objectstore: bluestore
>> ---
>> 
>> We just bought a few 7.68 TB SATA SSDs to add to the SSD pool. They
>> aren't being picked up by the osd.fast spec because they are too large,
>> and with the current specs they could also be picked up as DB/WAL
>> devices.
>> 
>> As far as I can determine there is no way to achieve what I want with
>> the existing specs: I can't filter on PCI vs SATA, only on rotational or
>> not; I can't use size, since it can only express a range between two
>> bounds, not a range outside them; and I can't use filter_logic OR for
>> the sizes because I need the rotational qualifier to be AND.
>> 
>> I could add an osd.fast2 spec with size: '7000G:' and change the
>> db_devices size for osd.slow to something like '1000G:7000G', but I'm
>> curious whether anyone has a different suggestion?
> 
> Regarding this last part, this is the new SSD as ceph orch device ls
> sees it:
> 
> ssd   ATA_SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0X504375  7153G
> 
> But this in a spec doesn't match it:
> 
> size: '7000G:'
> 
> This does:
> 
> size: '6950G:'
> 
> I can't get that to make sense. The value from ceph orch device ls looks
> like GiB. The documentation [1] states that the spec file uses GB, and
> 7000 GB is less than 7153 GiB (and so is 7000 GiB, for that matter), so
> the device should match? Some sort of internal rounding?
> 
> Best regards,
> 
> Torkil
> 
> [1] https://docs.ceph.com/en/latest/cephadm/services/osd/
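
On the workaround itself, the two-range approach you describe would look
something like the sketch below. The '7000G' boundary is simply the
placeholder from your mail; as the size-matching behaviour described
further down shows, the exact value needs some care:

---
service_type: osd
service_id: fast2
service_name: osd.fast2
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0
    size: '7000G:'  <-- only the new large SSDs
  filter_logic: AND
  objectstore: bluestore
---

with the db_devices section of osd.slow narrowed accordingly:

  db_devices:
    rotational: 0
    size: '1000G:7000G'  <-- only use devices in this range for DB/WAL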

I've examined the code [1], and here's what I found (I'd appreciate it if someone could confirm my understanding):

The orchestrator gets the disk size from the ceph-volume inventory command (human_readable_size) and compares it to whatever size value(s) you set in the OSD service spec.

$ cephadm shell ceph-volume inventory /dev/sdc --format json | jq .sys_api.human_readable_size
"3.64 TB"

The 'size:' value in the spec is in GB (only GB and MB are supported), whereas the ceph-volume inventory output can use other units (TB in this example), so the orchestrator first converts both values to bytes. Since ceph-volume inventory reports the size with only 2 decimals and the conversion uses powers of 10 (1e+9 for GB, 1e+12 for TB), the matching size here would be "size: 3640GB". This was confirmed by my testing a few months ago.
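
To illustrate, here is a minimal Python sketch of that conversion as I
read it in matchers.py [1]; to_bytes() and its unit table are my own
stand-ins, not the actual Ceph code:

# Sketch of the size comparison, assuming both sides are normalised
# to bytes with decimal (power-of-10) units.
def to_bytes(value: str) -> float:
    """Convert '3.64 TB' or '3640G' to bytes (1 GB = 1e9, 1 TB = 1e12)."""
    units = {'MB': 1e6, 'M': 1e6, 'GB': 1e9, 'G': 1e9, 'TB': 1e12, 'T': 1e12}
    s = value.strip().upper().replace(' ', '')
    for suffix in sorted(units, key=len, reverse=True):
        if s.endswith(suffix):
            return float(s[:-len(suffix)]) * units[suffix]
    return float(s)

# ceph-volume inventory reports '3.64 TB' (only two decimals survive),
# so the matcher effectively sees 3.64e12 bytes, i.e. 3640 GB:
disk = to_bytes('3.64 TB')          # 3.64e12
print(to_bytes('3640G') <= disk)    # True -> 'size: 3640G:' would match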

If my understanding is correct, it may be worth adding to the doc [2] that the device size used for matching is the human_readable_size reported by ceph-volume inventory, converted with decimal units (so a figure in TB is simply multiplied by 1000 to get GB).
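
For the record, applying that arithmetic to your new 7.68 TB SSD would
explain what you saw with '7000G:' vs '6950G:'. I don't have the
inventory output for that disk, so the '6.99 TB' figure below is an
assumption on my part (7.68e12 bytes is about 6.99 TiB, and about
7153 GiB, which lines up with your ceph orch device ls output):

# Assuming ceph-volume inventory reports the new SSD as '6.99 TB':
disk  = 6.99e12        # '6.99 TB' converted with decimal units
spec1 = 7000 * 1e9     # lower bound of size: '7000G:' -> 7.00e12
spec2 = 6950 * 1e9     # lower bound of size: '6950G:' -> 6.95e12
print(spec1 <= disk)   # False -> '7000G:' does not match
print(spec2 <= disk)   # True  -> '6950G:' matches, as you observed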

Regards,
Frédéric.

[1] https://github.com/ceph/ceph/blob/main/src/python-common/ceph/deployment/drive_selection/matchers.py
[2] https://docs.ceph.com/en/latest/cephadm/services/osd/

> 
> 
>> Best regards,
>> 
>> Torkil
>> 
> 
> --
> Torkil Svensgaard
> Sysadmin
> MR-Forskningssektionen, afs. 714
> DRCMR, Danish Research Centre for Magnetic Resonance
> Hvidovre Hospital
> Kettegård Allé 30
> DK-2650 Hvidovre
> Denmark
> Tel: +45 386 22828
> E-mail: torkil@xxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



