On 1/30/19 5:11 PM, Michal Soltys wrote:
On 19/01/28 19:44, Michal Soltys wrote:
On 1/28/19 5:57 PM, Song Liu wrote:
<cut>
I looked a bit deeper at raid10 and raid5 (4x32g) logs, and the behavior
is just really weird:
1) r10, blkdiscard
blkdiscard itself submits a device-long discard via ioctl, which is then
split into 8388607-sector-long parts. Further down these are split into:
- 8191 x 1024s, 1023s
- 8192 x 1s every 1024s, then going backwards from 8g to 4g: 1022s, 8191
x 1023s
- 8192 x 2s, then backwards: 1021s, 8191 x 1022s
....
- remainder of the device: 8065 x 15s, then backwards: 8064 x 1009s
Anything but the first 4g is completely unmergeable. Afterwards, why is it
sending single-sector discards (then 2, then 3) every 1024s, then filling up
the rest of those 1024s but going backwards ?
For the record, if I force blkdiscard to use power-of-2 aligned step, it
works w/o the weird small/backwards approach.
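The 1s / 2s / ... / 15s progression is at least consistent with simple
arithmetic: 8388607 is one sector short of a multiple of 1024, so each
successive part starts one sector further away from a chunk boundary. A
minimal sketch, assuming a 1024-sector chunk (inferred from the 1024s
pieces in the log, not confirmed) and using a hypothetical helper
head_and_tail(); it only reproduces the piece sizes, not the backwards
ordering:

```python
# Sector counts come from the log above; the 1024-sector chunk size is an
# assumption inferred from the "1024s" pieces, not a confirmed setting.
PART = 8388607   # sectors per part after the first split
CHUNK = 1024     # assumed md chunk size, in sectors


def head_and_tail(k, step=PART, chunk=CHUNK):
    """Return (head, tail) for the k-th part (k = 0, 1, 2, ...).

    head: sectors from the start of the part to the next chunk boundary.
    tail: what is left of that chunk after the head.
    (0, chunk) means the part starts chunk-aligned.
    """
    offset = (k * step) % chunk
    head = (chunk - offset) % chunk
    return head, chunk - head


for k in (0, 1, 2, 15):
    print(k, head_and_tail(k))
# 0 (0, 1024)
# 1 (1, 1023)
# 2 (2, 1022)
# 15 (15, 1009)

# With a power-of-2 step (8388608 sectors) every part starts aligned.
print(head_and_tail(15, step=8388608))
# (0, 1024)
```

The (15, 1009) pair matches the last line of the list above, and the
aligned result for a step of 8388608 matches what I see when forcing a
power-of-2 step.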
2) r10, fstrim
While it's working notably faster on an empty fs (still very long - nearly
1 minute), the splits are weirdly sized: 648s + 376s, 952s + 72s
(smaller ones going backwards as well), so those are not mergeable
either. Lots of full 1024s ones though.
In comparison, fstrim on a single partition of the same size takes ~ 1.6s
with large discards going 1:1 pretty much.
3) r5, blkdiscard
Now the case of raid5 - while the behavior seems cleaner there (no
unusual splits), the unusually precise 10ms delays between each discard
completion are the main culprit as far as I can see. While the 4k splits
(which then get merged back to chunk-sized pieces) take their toll, it's
a small footprint in comparison.
Anyway,
Song, do you have any suggestions or comments about those results (or
do you need me to run more specific tests, while I can still do them) ?