Re: [PATCH 2/2] xfs: add 'discard_sync' mount flag

Jens Axboe <axboe@xxxxxxxxx> · Mon, 30 Apr 2018 12:07:31 -0600

On 4/30/18 11:19 AM, Brian Foster wrote:
> On Mon, Apr 30, 2018 at 09:32:52AM -0600, Jens Axboe wrote:
>> XFS recently added support for async discards. While this can be
>> a win for some workloads and devices, there are also cases where
>> async bursty discard will severely harm the latencies of reads
>> and writes.
>>
>> Add a 'discard_sync' mount flag to revert to using sync discard,
>> issuing them one at a time and waiting for each one. This fixes
>> a big performance regression we had moving to kernels that include
>> the XFS async discard support.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
> 
> Hm, I figured the async discard stuff would have been a pretty clear win
> all around, but then again I'm not terribly familiar with what happens
> with discards beneath the fs. I do know that the previous behavior would
> cause fs level latencies due to holding up log I/O completion while
> discards completed one at a time. My understanding is that this led to
> online discard being pretty much universally "not recommended" in favor
> of fstrim.

It's not a secret that most devices suck at discard. While the async
discard is nifty and I bet works well for some cases, it can also cause
a flood of discards on the device side which does not work well for
other cases.

> Do you have any more data around the workload where the old sync discard
> behavior actually provides an overall win over the newer behavior? Is it
> purely a matter of workload, or is it a workload+device thing with how
> discards may be handled by certain devices?

The worse read latencies were observed on more than one device type,
and making it sync again was universally a win. We've had many issues
with discard. One trick that is often used is to chop up file deletion
into smaller chunks. Say you want to kill 10GB of data, you do it
incrementally, since 10G of discard usually doesn't behave very nicely.
If you make that async, then you're back to square one.
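
A minimal userspace sketch of that chunked-deletion trick, for
illustration only. The function name delete_in_chunks and the 1GB
default step are made up here, not taken from any real tool. It
assumes a filesystem mounted with online discard, where each
truncate frees a bounded amount of space and so bounds the size of
the resulting discards:

```python
import os


def delete_in_chunks(path, chunk_bytes=1 << 30):
    """Shrink a file step by step, then unlink it.

    Hypothetical helper: returns the number of truncate steps taken.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    fd = os.open(path, os.O_WRONLY)
    steps = 0
    try:
        size = os.fstat(fd).st_size
        while size > 0:
            # Free at most chunk_bytes from the tail of the file.
            size = max(0, size - chunk_bytes)
            os.ftruncate(fd, size)
            # Push the size change out before freeing the next chunk.
            # Whether this really paces the discards depends on the
            # filesystem and its discard mode (assumption).
            os.fsync(fd)
            steps += 1
    finally:
        os.close(fd)
    os.unlink(path)
    return steps
```

With async discard in the filesystem, the pacing this loop tries to
impose may be lost, which is the "back to square one" case.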

> I'm ultimately not against doing this if it's useful for somebody and is
> otherwise buried under a mount option, but it would be nice to see if
> there's opportunity to improve the async mechanism before resorting to
> that. Is the issue here too large bio chains, too many chains at once,
> or just simply too many discards (regardless of chaining) at the same
> time?

Well, ultimately you'd need better scheduling of the discards, but for
most devices what really works the best is a simple "just do one at
a time". The size constraint was added to further limit the impact.
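
As a toy model of "one at a time, with a size cap" (this is not the
patch's code; split_discard, issue_sync and submit_and_wait are
hypothetical names used only for this sketch), a large discard range
can be cut into capped pieces that are then issued strictly in
sequence:

```python
def split_discard(start, length, max_len):
    """Split the range [start, start + length) into pieces of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    pieces = []
    end = start + length
    while start < end:
        n = min(max_len, end - start)
        pieces.append((start, n))
        start += n
    return pieces


def issue_sync(pieces, submit_and_wait):
    """Issue each piece and wait for it before sending the next.

    submit_and_wait is a caller-supplied callable (hypothetical) that
    blocks until the device has completed the discard it was given.
    """
    for start, n in pieces:
        submit_and_wait(start, n)
```

The point of the model is only that the device never sees more than
one capped discard in flight, at the cost of holding up the submitter.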

Honestly, I think the only real improvement would be on the device
side. Most folks want discard as an advisory hint, and it should not
impact current workloads at all. In reality, many implementations
are much more strict and even include substantial flash writes. For
the cases where we can't just turn it off (I'd love to), we at least
need to make it less intrusive.

> I'm wondering if xfs could separate discard submission from log I/O
> completion entirely and then perhaps limit/throttle submission somehow
> or another (Christoph, thoughts?) via a background task. Perhaps doing
> something like that may even eliminate the need for some discards on
> busy filesystems with a lot of block free -> reallocation activity, but
> I'm just handwaving atm.

Just having the sync vs async option is the best split imho. The async
could potentially be scheduled. I don't think more involved logic
belongs in the fs.

-- 
Jens Axboe