Re: OSD service specs in mixed environment

Hello Torkil,

I didn't want to suggest using multiple OSD services from the start, since you were trying to avoid adding more.

Here, we've been using per-host OSD specs (listing hosts explicitly rather than using a wildcard pattern), as buying new hardware over time has made our cluster increasingly heterogeneous.
We chose the per-host approach to be more deterministic and less prone to unexpected orchestrator behavior over time. That's what counts the most in the end, I think.
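
To give an idea, a per-host spec looks roughly like this (the hostname, service ID and device criteria below are just placeholders, not our actual values):

---
service_type: osd
service_id: slow_host01
service_name: osd.slow_host01
placement:
  hosts:
    - host01          # one spec per host, no wildcard pattern
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore
---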

By the way, we reached out to RHCS support a year ago to ask for a regex host pattern, which led to this [1] and that [2], thanks to Adam. It's been merged into Reef but not Quincy yet.
This will help reduce the number of OSD services, as regular expressions allow matching more hosts than Python's fnmatch did.
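
If I remember the merged change correctly, the regex host pattern is expressed with a pattern_type field, roughly as below (the hostname regex is only an example; please double-check the exact syntax against the Reef docs, I'm writing this from memory):

---
service_type: osd
service_id: fast
service_name: osd.fast
placement:
  host_pattern:
    value: 'ceph-osd-(0[1-9]|1[0-5])'   # any regex matching the target hosts
    pattern_type: regex                 # plain host_pattern strings keep fnmatch-style globbing
spec:
  data_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore
---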

Best regards,
Frédéric.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2219373
[2] https://github.com/ceph/ceph/pull/53803

----- On 28 Jun 24, at 10:34, Torkil Svensgaard torkil@xxxxxxxx wrote:

> On 27-06-2024 10:56, Frédéric Nass wrote:
>> Hi Torkil, Ruben,
> 
> Hi Frédéric
> 
>> I see two theoretical ways to do this without an additional OSD service. One that
>> probably doesn't work :-) and another that could work, depending on how the
>> orchestrator prioritizes its actions based on service criteria.
>> 
>> The one that probably doesn't work is specifying multiple exact sizes with an
>> OR filter in the osd.fast service (with no modifications to the osd.slow service):
>> 
>> ---
>> service_type: osd
>> service_id: fast
>> service_name: osd.fast
>> placement:
>>    host_pattern: '*'
>> spec:
>>    data_devices:
>>      size: 400G
>>      size: 7680G
>>    filter_logic: OR
>>    objectstore: bluestore
>> 
>> I doubt the orchestrator can handle multiple exact size criteria.
> 
> I am pretty sure this wouldn't work because it would also match HDDs.
> 
>> The one that may work is:
>> 
>> ---
>> service_type: osd
>> service_id: fast
>> service_name: osd.fast
>> placement:
>>     host_pattern: '*'
>> spec:
>>     data_devices:
>>       rotational: 0              <-- remove size criteria
>>     filter_logic: AND
>>     objectstore: bluestore
>> ---
>> service_type: osd
>> service_id: slow
>> service_name: osd.slow
>> placement:
>>     host_pattern: '*'
>> spec:
>>     block_db_size: 290966113186
>>     data_devices:
>>       rotational: 1
>>     db_devices:
>>       rotational: 0
>>       size: '1000G:6900G'
>>       model: NVME-QQQQ-987       <-- specify NVMe's model
>>     filter_logic: AND
>>     objectstore: bluestore
>> ---
>> 
>> This may work if the orchestrator gives priority to services with more specific
>> criteria. If not, then you may want to add an SSD vendor criterion (if the 3.2TB and
>> 6.4TB SSD drives are from the same vendor AND the NVMes are from another) to the
>> osd.fast service.
> 
> We considered testing whether some ordering or priority could sort
> this out, but we resigned ourselves to just doing multiple specs, which won't
> depend on undocumented behavior that might change.
> 
> service_type: osd
> service_id: fast
> service_name: osd.fast
> placement:
>   host_pattern: '*'
> spec:
>   data_devices:
>     rotational: 0
>     size: :1000G
>   filter_logic: AND
>   objectstore: bluestore
> ---
> service_type: osd
> service_id: fast2
> service_name: osd.fast2
> placement:
>   host_pattern: '*'
> spec:
>   data_devices:
>     rotational: 0
>     size: '6990G:'
>   filter_logic: AND
>   objectstore: bluestore
> 
> It might be better to go with even more specs and use models instead of
> sizes for everything that isn't an HDD, but we have a lot of different models,
> so as long as it's not broken this will do.
> 
> Thanks for the suggestions!
> 
> Mvh.
> 
> Torkil
> 
>> Regards,
>> Frédéric.
>> 
>> ----- On 26 Jun 24, at 8:48, Torkil Svensgaard torkil@xxxxxxxx wrote:
>> 
>>> Hi
>>>
>>> We have a bunch of HDD OSD hosts with DB/WAL on PCI NVMe, either 2 x
>>> 3.2TB or 1 x 6.4TB. We used to have 4 SSDs per node for journals before
>>> bluestore, and those have been repurposed for an SSD pool (wear level is
>>> fine).
>>>
>>> We've been using the following service specs to keep the PCI NVMe
>>> devices used for bluestore DB/WAL from being provisioned as OSDs:
>>>
>>> ---
>>> service_type: osd
>>> service_id: fast
>>> service_name: osd.fast
>>> placement:
>>>    host_pattern: '*'
>>> spec:
>>>    data_devices:
>>>      rotational: 0
>>>      size: :1000G  <-- only use devices smaller than 1TB = not PCI NVMe
>>>    filter_logic: AND
>>>    objectstore: bluestore
>>> ---
>>> service_type: osd
>>> service_id: slow
>>> service_name: osd.slow
>>> placement:
>>>    host_pattern: '*'
>>> spec:
>>>    block_db_size: 290966113186
>>>    data_devices:
>>>      rotational: 1
>>>    db_devices:
>>>      rotational: 0
>>>      size: '1000G:' <-- only use devices larger than 1TB for DB/WAL
>>>    filter_logic: AND
>>>    objectstore: bluestore
>>> ---
>>>
>>> We just bought a few 7.68 TB SATA SSDs to add to the SSD pool. They
>>> aren't being picked up by the osd.fast spec because they are too large,
>>> and with the current specs they could also be picked up as DB/WAL devices.
>>>
>>> As far as I can determine, there is no way to achieve what I want with
>>> the existing specs: I can't filter on PCI vs SATA, only on rotational or
>>> not; I can't use size, as it can only define an inside range, not an
>>> outside range; and I can't use filter_logic OR for the sizes because I
>>> need the rotational qualifier to be ANDed.
>>>
>>> I can do an osd.fast2 spec with size: 7000G: and change the db_devices
>>> size for osd.slow to something like 1000G:7000G, but I'm curious to see
>>> if anyone has a different suggestion?
>>>
>>> Mvh.
>>>
>>> Torkil
>>>
>>> --
>>> Torkil Svensgaard
>>> Sysadmin
>>> MR-Forskningssektionen, afs. 714
>>> DRCMR, Danish Research Centre for Magnetic Resonance
>>> Hvidovre Hospital
>>> Kettegård Allé 30
>>> DK-2650 Hvidovre
>>> Denmark
>>> Tel: +45 386 22828
>>> E-mail: torkil@xxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx