Re: ceph-volume and automatic OSD provisioning

On Thu, Jun 21, 2018 at 11:11 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Thu, Jun 21, 2018 at 3:42 PM Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>>
>> A couple thoughts wrt what I've seen so far:
>>
>> o Would this require that the metadata devices be empty?
>>
>> o If an OSD drive bites the dust, how does one identify the metadata device/partition it was using so that it can be wiped, re-used, etc?
>>
>> o How does this fit into the ongoing OSD lifecycle?  I.e., when an OSD dies, is removed completely, and is redeployed, does the code reuse the same metadata partition(s), or does it attempt to create a new one on an available device?  If the latter, it's going to run out sooner or later.
>>
>> o The above, but with a *destroyed* OSD? Or if an OSD is repaved for whatever reason -- differing parameters, Filestore <--> Bluestore <--> whatever?  What happens if one changes the size of the metadata partition required after initial deployment?
>
> My thought on the OSD lifecycle stuff is that it belongs at higher
> levels (ceph-volume is not the whole story).  At some higher level we
> would have a persistent record of which devices had previously been
> used as OSDs, in order to recreate them on failure.  The orchestrator
> (rook, ceph-ansible, deepsea) would contain an opinionated policy
> about how to treat the configuration through replacements: whether
> selecting a device means literally just that device, or that device
> and subsequently any empty replacement that shows up in the same slot.
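
As a purely hypothetical sketch of what that persistent record and
replacement policy could look like on the orchestrator side (none of the
names or fields below exist anywhere today; they are made up for
illustration):

from dataclasses import dataclass

@dataclass
class OsdDeviceSpec:
    host: str
    slot: str       # e.g. enclosure/bay, stable across reboots
    device: str     # e.g. /dev/sdb as observed at creation time
    # Opinionated replacement policy:
    #   "exact-device"    -> only ever this device path
    #   "replace-in-slot" -> adopt any empty drive that later shows up in slot
    replacement: str = "replace-in-slot"

def accepts(spec: OsdDeviceSpec, slot: str, device: str, empty: bool) -> bool:
    """Would this spec adopt a newly observed drive as a replacement OSD?"""
    if spec.replacement == "exact-device":
        return device == spec.device
    return empty and slot == spec.slot
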
>
> I think we need to have a tight scope around what ceph-volume's device
> selection does: it's there to pick a default (something reasonable but
> not necessarily optimal), to work okay on most systems (but not
> necessarily all), and to make a device selection for installation (not
> to handle the OSD lifecycle overall).

This, exactly. We are trying to ease the initial provisioning of OSDs
while, at the same time, opening up an API for higher-level tooling to
fine-tune what they need.
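
As a rough sketch of what that could look like from the tooling side
(not actual Ceph code; the exact flags are assumptions based on the
ceph-volume lvm batch interface and may differ between releases), an
orchestrator could ask for a report of the proposed layout, inspect it,
and only then apply:

import json
import subprocess

def preview_layout(devices):
    # Ask ceph-volume what it *would* do with these devices without
    # creating anything; the JSON report is what higher-level tooling
    # (rook, ceph-ansible, deepsea) can inspect or veto.
    out = subprocess.check_output(
        ["ceph-volume", "lvm", "batch", "--report", "--format=json"] + devices
    )
    return json.loads(out)

def apply_layout(devices):
    # Only called once the orchestrator's own policy accepts the plan.
    subprocess.check_call(["ceph-volume", "lvm", "batch", "--yes"] + devices)

if __name__ == "__main__":
    print(json.dumps(preview_layout(["/dev/sdb", "/dev/sdc", "/dev/nvme0n1"]),
                     indent=2))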

>
> John
>
>> o I've been curious how people have managed the OSD:journal/metadata partition mapping so far.  In the past we had a wrapper around ceph-deploy with a rigid mapping of OSD drive slot to partition number.  It required the single NVMe device to be pre-partitioned and was kind of ugly and error-prone.  The drive slot was used instead of the sdX name, given the Linux kernel's fondness for changing that mapping during various drive failure / replacement scenarios.
>>
>> o Some sites with multiple HBAs, NICs, metadata devices, etc. go to great lengths to pin resources to common CPU cores, PCIe slots, and especially NUMA nodes; chances are good that such a deployment couldn't use this.
>>
>> I totally understand and support the idea of auto-selecting a metadata device/partition (managing them can be a bear), but I humbly submit that attention needs to be paid to the needs of OSD lifecycle events and the various dynamics that can happen to a production cluster over the years.
>>
>> Notably it would be really really nice to have the ability to configure mapping rules, or even a simple hardcoded EID:SLOT -> device/partition # mapping.
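
For illustration only (nothing like this exists in ceph-volume today, and
the enclosure IDs and partition paths below are invented), the hardcoded
EID:SLOT -> partition case could be as simple as:

# Hypothetical mapping keyed on enclosure/slot rather than sdX names,
# which the kernel is happy to shuffle after failures and replacements.
SLOT_TO_DB = {
    ("enclosure-0", 0): "/dev/nvme0n1p1",
    ("enclosure-0", 1): "/dev/nvme0n1p2",
    ("enclosure-0", 2): "/dev/nvme0n1p3",
}

def db_partition_for(enclosure_id, slot):
    """Return the metadata (block.db) partition pinned to a drive bay."""
    try:
        return SLOT_TO_DB[(enclosure_id, slot)]
    except KeyError:
        raise LookupError("no metadata partition configured for %s slot %d"
                          % (enclosure_id, slot))
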
>>
>> Apologies if any of these were already covered or are out of scope.
>>
>> -- Anthony
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


