Re: has anyone enabled bdev_enable_discard?

Could we not consider setting up a "bluefstrim" that could be orchestrated?

This would avoid a continuous stream of (D)iscard instructions hitting the
disks during activity.

A weekly (or even monthly) bluefstrim would probably be enough for the
platforms that really need it.
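
(A rough sketch of what I mean - there is no "bluefstrim" command today, so
this just toggles the existing options from the centralized config during a
quiet window; whether the OSDs pick the change up without a restart is not
something I have verified:)

    # enable discards for the quiet window
    ceph config set osd bdev_enable_discard true
    ceph config set osd bdev_async_discard true
    # ... let the OSDs issue their discards ...
    # disable again before the busy period
    ceph config set osd bdev_enable_discard false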


On Sat, Mar 2, 2024 at 12:58, Matt Vandermeulen <storage@xxxxxxxxxxxx> wrote:

> We've had a specific set of drives for which we've had to enable
> bdev_enable_discard and bdev_async_discard in order to maintain
> acceptable performance on block clusters. I wrote the patch that Igor
> mentioned in order to try to send more parallel discards to the
> devices, but these ones in particular seem to process them serially
> (based on the observed discard counts and latency going to the device),
> which is unfortunate. We're also testing new firmware that should help
> alleviate some of the initial concerns we had about discards not
> keeping up, which is what prompted the patch in the first place.
>
> Most of our drives do not need discards enabled (and definitely not
> without async) in order to maintain performance, unless we're doing a
> full-disk fio test or something similar where we're trying to find the
> drive's cliff profile. We've used OSD device classes to target the
> options at specific OSDs via the centralized conf, which helps when we
> add new hosts that may have different drives, so that the options
> aren't applied globally.
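>
> To illustrate (just a sketch - the "nvme" device class here is only an
> example, and these are the option names discussed in this thread), the
> centralized config supports device-class masks:
>
>     ceph config set osd/class:nvme bdev_enable_discard true
>     ceph config set osd/class:nvme bdev_async_discard true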
>
> Based on our experience, I wouldn't enable it unless you're seeing some
> sort of cliff-like behaviour as your OSDs run low on free space, or are
> heavily fragmented. I would also deem bdev_async_discard = 1 to be a
> requirement so that discards don't block user IO. Keep an eye on the
> discards being sent to the devices and on the discard latency as well
> (via node_exporter, for example).
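>
> For instance (assuming node_exporter's diskstats collector is enabled;
> metric names can vary between versions), something along these lines in
> Prometheus gives a feel for the discard rate and latency per device:
>
>     rate(node_disk_discards_completed_total[5m])
>     rate(node_disk_discard_time_seconds_total[5m])
>       / rate(node_disk_discards_completed_total[5m])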
>
> Matt
>
>
> On 2024-03-02 06:18, David C. wrote:
> > I came across an enterprise NVMe drive used for the BlueFS DB whose
> > performance dropped sharply a few months after delivery (I won't
> > mention the brand here, but it was not one of these three: Intel,
> > Samsung, Micron).
> > Enabling bdev_enable_discard clearly impacted performance, but the
> > option also saved the platform after a few days of discarding.
> >
> > IMHO the most important thing is to validate the behaviour once the
> > entire flash media has been written to.
> > But the option has the merit of existing.
> >
> > It seems to me that the ideal would be not to have several
> > bdev_*discard options, and for this task to be asynchronous, with the
> > (D)iscard instructions issued during a calmer period of activity (I
> > see no impact if the instructions are lost during an OSD reboot).
> >
> >
> > On Fri, Mar 1, 2024 at 19:17, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
> >
> >> I played with this feature a while ago and recall it had a visible
> >> negative impact on user operations due to the need to submit tons of
> >> discard operations - effectively, each data overwrite triggers the
> >> submission of one or more discard operations to disk.
> >>
> >> And I doubt this has been widely used, if at all.
> >>
> >> Nevertheless, we recently got a PR to rework some aspects of thread
> >> management for this stuff, see https://github.com/ceph/ceph/pull/55469
> >>
> >> The author said they needed this feature for their cluster, so you
> >> might want to ask them about their experience.
> >>
> >>
> >> W.r.t. documentation - there are actually just two options:
> >>
> >> - bdev_enable_discard - enables issuing discards to the disk
> >>
> >> - bdev_async_discard - controls whether discard requests are issued
> >> synchronously (along with the release of disk extents) or
> >> asynchronously (using a background thread).
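> >>
> >> In ceph.conf form that would look something like this (just an
> >> illustration - set the values that suit your hardware):
> >>
> >>     [osd]
> >>     bdev_enable_discard = true
> >>     bdev_async_discard = true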
> >>
> >> Thanks,
> >>
> >> Igor
> >>
> >> On 01/03/2024 13:06, jsterr@xxxxxxxxxxxx wrote:
> >> > Is there any update on this? Has anyone tested the option and
> >> > collected performance numbers from before and after?
> >> > Is there any good documentation on this option?
> >>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



