Could we not consider setting up an orchestrated "bluefstrim"? This would avoid a
continuous stream of discard instructions hitting the disks during activity. A weekly
(or even monthly) bluefstrim would probably be enough for the platforms that really
need it.

On Sat, Mar 2, 2024 at 12:58, Matt Vandermeulen <storage@xxxxxxxxxxxx> wrote:

> We've had a specific set of drives for which we've had to enable
> bdev_enable_discard and bdev_async_discard in order to maintain
> acceptable performance on block clusters. I wrote the patch that Igor
> mentioned to try and send more parallel discards to the devices, but
> these ones in particular seem to process them serially (based on the
> observed discard counts and latency going to the device), which is
> unfortunate. We're also testing new firmware that should help alleviate
> some of the initial concerns about discards not keeping up, which
> prompted the patch in the first place.
>
> Most of our drives do not need discards enabled (and definitely not
> without async) to maintain performance, unless we're doing a full-disk
> fio test or something similar where we're trying to find their cliff
> profile. We've used device classes to target the options at specific
> OSDs via the centralized conf, which helps when we add new hosts that
> may have different drives, so the options aren't applied globally.
>
> Based on our experience, I wouldn't enable it unless you're seeing some
> sort of cliff-like behaviour as your OSDs run low on free space or
> become heavily fragmented. I would also deem bdev_async_discard = 1 a
> requirement, so that discards don't block user IO. Keep an eye on the
> discards being sent to the devices and on the discard latency as well
> (via node_exporter, for example).
>
> Matt
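For reference, the per-device-class targeting Matt describes can be expressed with
config masks on the centralized config. A minimal sketch, assuming the affected drives
carry the device class "nvme" (the class name is only an example, and depending on the
Ceph release the OSDs may need a restart before bdev_enable_discard takes effect):

    # Apply the options only to OSDs whose device class is "nvme"
    # (example class; adjust to whatever class your drives actually use).
    ceph config set osd/class:nvme bdev_enable_discard true
    # Issue discards from a background thread so they do not block user IO.
    ceph config set osd/class:nvme bdev_async_discard true

    # Check what an individual OSD will actually pick up.
    ceph config get osd.0 bdev_enable_discard
    ceph config get osd.0 bdev_async_discard
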
> On 2024-03-02 06:18, David C. wrote:
> > I came across an enterprise NVMe drive used for the BlueFS DB whose
> > performance dropped sharply a few months after delivery (I won't
> > mention the brand here, but it was not one of these three: Intel,
> > Samsung, Micron).
> > Enabling bdev_enable_discard clearly impacted performance, but the
> > option also saved the platform after a few days of discards.
> >
> > IMHO the most important thing is to validate the behaviour once the
> > entire flash media has been written to.
> > But this option has the merit of existing.
> >
> > It seems to me that ideally there would not be several bdev_*discard
> > options, and that this task should be asynchronous, with the discard
> > instructions issued during a calmer period of activity (I see no
> > impact if the instructions are lost during an OSD reboot).
> >
> >
> > On Fri, Mar 1, 2024 at 19:17, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
> >
> >> I played with this feature a while ago and recall it had a visible
> >> negative impact on user operations due to the need to submit tons of
> >> discard operations - effectively, each data overwrite triggers the
> >> submission of one or more discard operations to disk.
> >>
> >> And I doubt this has been widely used, if at all.
> >>
> >> Nevertheless, we recently got a PR to rework some aspects of thread
> >> management for this, see https://github.com/ceph/ceph/pull/55469
> >>
> >> The author claimed they needed this feature for their cluster, so you
> >> might want to ask them about their experience.
> >>
> >>
> >> W.r.t. documentation - there are actually just two options:
> >>
> >> - bdev_enable_discard - enables issuing discards to disk
> >>
> >> - bdev_async_discard - controls whether discard requests are issued
> >> synchronously (along with the release of disk extents) or
> >> asynchronously (using a background thread).
> >>
> >> Thanks,
> >>
> >> Igor
> >>
> >> On 01/03/2024 13:06, jsterr@xxxxxxxxxxxx wrote:
> >> > Is there any update on this? Did someone test the option and have
> >> > performance numbers from before and after?
> >> > Is there any good documentation regarding this option?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
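As a follow-up to Matt's advice about watching discards and their latency: on a
reasonably recent kernel the per-device discard counters appear in /proc/diskstats,
and recent node_exporter versions republish them. A minimal sketch, assuming the data
device is nvme0n1 (the device name is just an example):

    # Fields 15-18 of /proc/diskstats (kernel >= 4.18) are: discards
    # completed, discards merged, sectors discarded, ms spent discarding.
    awk '$3 == "nvme0n1" { print "discards completed:", $15, "ms discarding:", $18 }' /proc/diskstats

    # To get a rate or rough latency, sample the counters twice and diff.
    # With node_exporter, the equivalent series are (names assumed from the
    # diskstats collector): node_disk_discards_completed_total and
    # node_disk_discard_time_seconds_total.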