On 4/30/18 3:31 PM, Dave Chinner wrote: > On Mon, Apr 30, 2018 at 09:32:52AM -0600, Jens Axboe wrote: >> XFS recently added support for async discards. While this can be >> a win for some workloads and devices, there are also cases where >> async bursty discard will severly harm the latencies of reads >> and writes. > > FWIW, convention is it document the performance regression in the > commit message, not leave the reader to guess at what it was.... Yeah I'll give you that, I can improve the commit message for sure. > Did anyone analyse the pattern of discards being issued to work out > what pattern was worse for async vs sync discard? is it lots of > little discards, large extents being discarded, perhaps a problem > with the request request queue starving other IOs because we queue > so many async discards in such a short time (which is the difference > in behaviour vs the old code), or something else? What was observed was a big discard which would previously have gone down as smaller discards now going down as either one or many discards. Looking at the blktrace data, it's the difference between discard 1 queue discard 1 complete discatd 2 queue discard 2 complete [...] discard n queue discard n complete which is now discard 1 queue discard 2 queue [...] discard n queue [...] discard 1 complete discard 2 complete [...] discard n complete Note that we set a max discard size of 64MB for most devices, since it's been shown to have less impact on latencies for the IO that jobs actually care about. >> Add a 'discard_sync' mount flag to revert to using sync discard, >> issuing them one at the time and waiting for each one. This fixes >> a big performance regression we had moving to kernels that include >> the XFS async discard support. > > I'm not a fan of adding a mount option to work around bad, > unpredictable performance due to a mount option we recommend you > don't use because it results in bad, unpredictable performance. Oh I hear you, as I wrote in other replies, I don't generally recommend discard except for cases where it's been proven to be useful in terms of write amplification improvements. If we can avoid using it, we do. It's a tradeoff, and for some situations, the right decision is to use discards. > Without any details of the discard pattern that results in problems > I don't think we should be changing anything - adding an opaque, > user-unfriendly mount option does nothing to address the underlying > problem - it's just a hack to work around the symptoms being seen... > > More details of the regression and the root cause analysis is > needed, please. It brings back the same behavior as we had before, which performs better for us. It's preventing users of XFS+discard from upgrading, which is sad. -- Jens Axboe