Re: raid456's devices_handle_discard_safely is unusably slow

Michal Soltys <soltys@xxxxxxxx> · Wed, 6 Feb 2019 18:01:11 +0100

On 1/30/19 5:11 PM, Michal Soltys wrote:
On 19/01/28 19:44, Michal Soltys wrote:
On 1/28/19 5:57 PM, Song Liu wrote:

<cut>

I looked a bit deeper at raid10 and raid5 (4x32g) logs, and the behavior 
is just really weird:

1) r10, blkdiscard

blkdiscard itself submits a device-long discard via ioctl, which is then 
split into 8388607-sector parts. Further down, these are split into:

- 8191 x 1024s, 1023s
- 8192 x 1s every 1024s, then going backwards from 8g to 4g: 1022s, 8191 
x 1023s
- 8192 x 2s, then backwards: 1021s, 8191 x 1022s
....
- remainder of the device: 8065 x 15s, then backwards: 8064 x 1009s

Anything but the first 4g is completely unmergeable. Beyond that, why is 
it sending single-sector discards (then 2, then 3) every 1024s, and then 
filling up the rest of those 1024s while going backwards?
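
For what it's worth, the piece counts listed above can be reproduced with 
a simple two-stage model. This is only a sketch: the two stages (first 
cutting each 8388607-sector part into pieces of at most 1024s counted from 
the part's own start, then cutting each piece wherever it crosses an 
absolute 1024s boundary) are an assumption that happens to match the 
logged numbers, not something taken from the kernel source. All function 
names are made up, and the model says nothing about the submission order 
(the "backwards" part).

```python
from collections import Counter

CHUNK = 1024       # sectors; split size seen in the logs
PART = 8388607     # sectors; size of each part the ioctl range is cut into


def split_relative(start, length, step=CHUNK):
    """Stage A (assumed): pieces of at most `step` sectors, counted from `start`."""
    end = start + length
    while start < end:
        n = min(step, end - start)
        yield start, n
        start += n


def split_at_boundaries(start, length, step=CHUNK):
    """Stage B (assumed): cut wherever a piece crosses an absolute multiple of `step`."""
    end = start + length
    while start < end:
        boundary = (start // step + 1) * step
        n = min(boundary, end) - start
        yield start, n
        start += n


def piece_sizes(part_index):
    """Histogram {piece size in sectors: count} for one full-sized part."""
    sizes = Counter()
    for s, n in split_relative(part_index * PART, PART):
        for _, m in split_at_boundaries(s, n):
            sizes[m] += 1
    return sizes


for k in range(3):
    print(k, dict(piece_sizes(k)))
```

Part 0 starts on a boundary, so only its last piece is short. Every later 
part starts a few sectors before a boundary, which in this model is what 
produces the small head pieces followed by the 1023s/1022s fills.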

For the record, if I force blkdiscard to use a power-of-2 aligned step, 
it works w/o the weird small/backwards approach.
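
A quick arithmetic check of why the step matters. 8388607 is 2^23 - 1, so 
each successive part starts one sector further away from a 1024s boundary, 
while a 2^23 step keeps every part aligned. The helper name below is made 
up, and 1024s is simply the split size seen in the logs.

```python
CHUNK = 1024  # sectors; split size seen in the logs


def start_misalignment(step, parts, chunk=CHUNK):
    """Sectors from each part's start up to the next chunk boundary (0 = aligned)."""
    return [(-k * step) % chunk for k in range(parts)]


print(start_misalignment(8388607, 4))  # step of 2^23 - 1 sectors
print(start_misalignment(8388608, 4))  # power-of-2 step
```

With the odd step the offsets grow by one sector per part, which lines up 
with the 1s, 2s, ... 15s heads in the r10 log above.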

2) r10, fstrim

While it's notably faster on an empty fs (still very long - nearly 
1 minute), the splits are oddly sized: 648s + 376s, 952s + 72s 
(the smaller ones going backwards as well), so those are not mergeable 
either. There are lots of full 1024s ones, though.
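
Both odd pairs add up to 1024s, which is what one would expect if a 1024s 
piece that does not start on a 1024s boundary is cut at that boundary. A 
minimal sketch of that arithmetic follows. The start offsets are 
hypothetical values picked to match the pairs, not offsets taken from the 
logs, and the function name is made up.

```python
CHUNK = 1024  # sectors


def split_at_boundary(start, length=CHUNK, chunk=CHUNK):
    """Return (head, tail) sizes when a piece is cut at the next chunk boundary."""
    head = min(length, chunk - start % chunk)
    tail = length - head
    return head, tail


# Hypothetical start offsets, chosen so the result matches the logged pairs.
print(split_at_boundary(376))
print(split_at_boundary(72))
```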

In comparison, fstrim on a single partition of the same size takes ~ 1.6s, 
with large discards going pretty much 1:1.

3) r5, blkdiscard

Now the case of raid5 - while the behavior seems cleaner there (no 
unusual splits), the unusually precise 10ms delays between each discard 
completion are the main culprit as far as I can see. While the 4k splits 
(which then get merged back to chunk-sized pieces) take their toll, it's 
a small footprint in comparison.
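
To put the 10ms figure in perspective, here is a back-of-the-envelope 
sketch of the total wait if discards complete strictly one after another, 
10ms apart. The amount of data and the discard size below are hypothetical 
placeholders (the chunk size is not stated here), so treat the result as 
an order-of-magnitude illustration only.

```python
GIB = 1 << 30
KIB = 1 << 10


def serialized_discard_time_ms(total_bytes, discard_bytes, delay_ms=10):
    """Total wait in ms if discards complete one at a time, delay_ms apart."""
    n_discards = -(-total_bytes // discard_bytes)  # ceiling division
    return n_discards * delay_ms


# Hypothetical numbers: 96 GiB discarded in 512 KiB pieces.
total_ms = serialized_discard_time_ms(96 * GIB, 512 * KIB)
print(total_ms / 1000, "seconds")
```

The total scales linearly with the number of discards, so a fixed 
per-completion delay swamps whatever the splitting and merging cost.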

Anyway,

Song, do you have any suggestions or comments about these results (or 
do you need more specific tests run, while I can still do them)?