We have a specific set of drives for which we've had to enable
bdev_enable_discard and bdev_async_discard in order to maintain
acceptable performance on block clusters. I wrote the patch that Igor
mentioned to try to send more parallel discards to the devices, but
these drives in particular appear to process them serially (based on
the observed discard counts and the latency to the device), which is
unfortunate. We're also testing new firmware that is supposed to help
alleviate some of the initial concerns we had about discards not
keeping up, which is what prompted the patch in the first place.
Most of our drives do not need discards enabled (and definitely not
without async) to maintain performance, unless we're running a
full-disk fio test or something similar to find a drive's performance
cliff. We've used OSD device classes to target these options at
specific OSDs via the centralized config. This helps when we add new
hosts that may have different drives, because the options aren't
applied globally.
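A minimal sketch of that scoping, assuming "OSD classes" means CRUSH device classes and using a hypothetical class name `nvme-discard`. It only builds `ceph config set` command strings with the `osd/class:<name>` mask so they can be reviewed before being run; it does not talk to a cluster.

```python
import shlex

# Assumption: "OSD classes" means CRUSH device classes; the class name used
# below ("nvme-discard") is hypothetical.
DISCARD_OPTIONS = {
    "bdev_enable_discard": "true",  # issue discards to the device
    "bdev_async_discard": "true",   # issue them from a background thread
}


def discard_config_commands(device_class: str) -> list[str]:
    """Build `ceph config set` commands scoped to one device class.

    The `osd/class:<name>` mask applies each option only to OSDs whose
    CRUSH device class matches, so OSDs on new hosts with other drives
    are unaffected.
    """
    who = f"osd/class:{device_class}"
    return [
        shlex.join(["ceph", "config", "set", who, name, value])
        for name, value in DISCARD_OPTIONS.items()
    ]


if __name__ == "__main__":
    for cmd in discard_config_commands("nvme-discard"):
        print(cmd)
```

The class itself is assigned with `ceph osd crush set-device-class nvme-discard osd.<id>` (after `ceph osd crush rm-device-class osd.<id>` if the OSD already has one). Keep in mind that CRUSH rules which select by device class will see the new class too. Depending on the release, an OSD restart may be needed before the options take effect.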
Based on our experience, I wouldn't enable it unless you're seeing some
sort of cliff-like behaviour as your OSDs run low on free space or become
heavily fragmented. I would also consider bdev_async_discard = 1 a
requirement, so that discards don't block user IO. Keep an eye on the
discards being sent to your devices and on the discard latency as well
(via node_exporter, for example).
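For a quick check without a metrics stack, a small sketch that reads the same kernel counters node_exporter exports as node_disk_discards_completed_total and node_disk_discard_time_seconds_total. It assumes Linux 4.18 or later, where /proc/diskstats includes discard fields (field 12: discards completed, field 15: milliseconds spent discarding).

```python
from dataclasses import dataclass


@dataclass
class DiscardSample:
    discards_completed: int  # diskstats field 12
    discard_time_ms: int     # diskstats field 15: total ms spent on discards


def parse_diskstats(text: str, device: str) -> DiscardSample:
    """Extract the discard counters for `device` from /proc/diskstats text."""
    for line in text.splitlines():
        parts = line.split()
        # parts[0:3] are major, minor and device name, so field N is parts[N + 2].
        if len(parts) >= 18 and parts[2] == device:
            return DiscardSample(int(parts[14]), int(parts[17]))
    raise ValueError(f"no discard counters found for {device!r}")


def read_sample(device: str, path: str = "/proc/diskstats") -> DiscardSample:
    with open(path) as f:
        return parse_diskstats(f.read(), device)


def discard_stats(before: DiscardSample, after: DiscardSample,
                  interval_s: float) -> tuple[float, float]:
    """Return (discards per second, mean discard latency in ms)."""
    ops = after.discards_completed - before.discards_completed
    busy_ms = after.discard_time_ms - before.discard_time_ms
    mean_latency_ms = busy_ms / ops if ops else 0.0
    return ops / interval_s, mean_latency_ms
```

Usage: take two samples with read_sample() for the OSD's device a few seconds apart and pass both to discard_stats() along with the interval. This is the same averaging iostat uses for read/write await.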
Matt
On 2024-03-02 06:18, David C. wrote:
I came across an enterprise NVMe used for BlueFS DB whose performance
dropped sharply a few months after delivery (I won't mention the brand
here, but it was not among these three: Intel, Samsung, Micron).
It is clear that enabling bdev_enable_discard impacted performance, but
this option also saved the platform after a few days of discarding.
IMHO the most important thing is to validate the behavior once the
entire flash media has been written.
But this option has the merit of existing.
It seems to me that the ideal would be not to have several
bdev_*discard options, and that this task should be asynchronous, with
the (D)iscard instructions issued during a calmer period of activity (I
don't see any impact if the instructions are lost during an OSD reboot).
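A toy Python model of that deferral idea, not Ceph code: DeferredDiscardQueue, the idle_iops threshold and the issue_discard callback are all hypothetical. It assumes freed extents are held back from the allocator until their discard has been issued, since trimming a block that was already reused would destroy data. Under that assumption, losing the in-memory queue on restart should only leave some blocks un-trimmed.

```python
from typing import Callable


class DeferredDiscardQueue:
    """Hypothetical: batch freed extents and discard them only when IO is quiet.

    Assumes the caller keeps queued extents out of the allocator until
    they have been passed to issue_discard.
    """

    def __init__(self, issue_discard: Callable[[int, int], None], idle_iops: float):
        self.issue_discard = issue_discard  # stands in for the device discard call
        self.idle_iops = idle_iops          # user IOPS below which we consider it calm
        self.pending: list[tuple[int, int]] = []  # (offset, length), memory only

    def release(self, offset: int, length: int) -> None:
        """Record a freed extent instead of discarding it immediately."""
        self.pending.append((offset, length))

    def _merged(self) -> list[tuple[int, int]]:
        """Coalesce overlapping or adjacent extents to send fewer commands."""
        merged: list[tuple[int, int]] = []
        for off, length in sorted(self.pending):
            if merged and off <= merged[-1][0] + merged[-1][1]:
                last_off, last_len = merged[-1]
                end = max(last_off + last_len, off + length)
                merged[-1] = (last_off, end - last_off)
            else:
                merged.append((off, length))
        return merged

    def maybe_flush(self, current_iops: float) -> int:
        """Issue pending discards if user IO is quiet; return commands sent."""
        if current_iops >= self.idle_iops or not self.pending:
            return 0
        extents = self._merged()
        for off, length in extents:
            self.issue_discard(off, length)
        self.pending.clear()
        return len(extents)
```

Merging adjacent extents before a quiet-period flush also reduces the per-overwrite discard volume that Igor describes.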
On Fri, 1 Mar 2024 at 19:17, Igor Fedotov <igor.fedotov@xxxxxxxx>
wrote:
I played with this feature a while ago and recall it had a visible
negative impact on user operations due to the need to submit tons of
discard operations: effectively, each data overwrite operation triggers
one or more discard submissions to the disk.
And I doubt it has been widely used, if at all.
Nevertheless, we recently received a PR that reworks some aspects of
thread management for this feature, see https://github.com/ceph/ceph/pull/55469
The author said they needed this feature for their cluster, so you
might want to ask them about their experience.
W.r.t. documentation, there are actually just two options:
- bdev_enable_discard - enables issuing discards to the disk
- bdev_async_discard - controls whether discard requests are issued
  synchronously (along with disk extent release) or asynchronously
  (using a background thread).
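To make the difference between the two modes concrete, a toy Python model (not BlueStore's C++ implementation; DiscardIssuer and the discard callback are hypothetical). In sync mode release() issues the discard inline and the caller waits for the device; in async mode it only enqueues the extent and a background thread issues it.

```python
import queue
import threading
from typing import Callable, Optional


class DiscardIssuer:
    """Hypothetical model of synchronous vs asynchronous discard submission."""

    def __init__(self, discard: Callable[[int, int], None], async_discard: bool):
        self.discard = discard          # stands in for the device's discard call
        self.async_discard = async_discard
        self.pending = queue.Queue()    # (offset, length) items, None = stop
        self.worker: Optional[threading.Thread] = None
        if async_discard:
            self.worker = threading.Thread(target=self._run, daemon=True)
            self.worker.start()

    def release(self, offset: int, length: int) -> None:
        """Called when an extent is freed, e.g. after an overwrite."""
        if self.async_discard:
            self.pending.put((offset, length))  # returns without waiting for the device
        else:
            self.discard(offset, length)        # caller blocks until the discard completes

    def _run(self) -> None:
        while True:
            item = self.pending.get()
            if item is None:                    # shutdown sentinel
                return
            self.discard(*item)

    def shutdown(self) -> None:
        """Let queued discards drain (the queue is FIFO), then stop the worker."""
        if self.worker is not None:
            self.pending.put(None)
            self.worker.join()
```

A single worker thread issues discards one at a time; the PR linked above is about how such threads are managed.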
Thanks,
Igor
On 01/03/2024 13:06, jsterr@xxxxxxxxxxxx wrote:
> Is there any update on this? Has anyone tested the option and got
> performance numbers from before and after?
> Is there any good documentation regarding this option?
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx