On Wed, Feb 6, 2019 at 9:01 AM Michal Soltys <soltys@xxxxxxxx> wrote:
>
> On 1/30/19 5:11 PM, Michal Soltys wrote:
> > On 19/01/28 19:44, Michal Soltys wrote:
> >> On 1/28/19 5:57 PM, Song Liu wrote:
> >>
> >> <cut>
> >>
> >
> > I looked a bit deeper at the raid10 and raid5 (4x32g) logs, and the
> > behavior is just really weird:
> >
> > 1) r10, blkdiscard
> >
> > blkdiscard itself submits a device-long discard via ioctl, which is then
> > split into 8388607-sector parts. Further down these are split into:
> >
> > - 8191 x 1024s, 1023s
> > - 8192 x 1s every 1024s, then going backwards from 8g to 4g: 1022s, 8191
> > x 1023s
> > - 8192 x 2s, then backwards: 1021s, 8191 x 1022s
> > ....
> > - the remainder of the device: 8065 x 15s, then backwards: 8064 x 1009s
> >
> > Anything but the first 4g is completely unmergeable. Afterwards, why is it
> > sending single-sector values (then 2, then 3) every 1024s, then filling up
> > the rest of those 1024s but going backwards?
> >
> > For the record, if I force blkdiscard to use a power-of-2 aligned step, it
> > works w/o the weird small/backwards approach.
> >
> > 2) r10, fstrim
> >
> > While it works notably faster on an empty fs (still very long - nearly
> > 1 minute), the splits are really weirdly sized: 648s + 376s, 952s + 72s
> > (the smaller ones going backwards as well), so those are not mergeable
> > either. Lots of full 1024s ones, though.
> >
> > In comparison, fstrim on a single partition of the same size takes ~1.6s,
> > with large discards going through 1:1 pretty much.
> >
> >
> > 3) r5, blkdiscard
> >
> > Now the case of raid5 - while the behavior seems cleaner there (no
> > unusual splits), the unusually precise 10ms delays between each discard
> > completion are the main culprit as far as I can see. While the 4k splits
> > (which then get merged back into chunk-sized pieces) take their toll, it's
> > a small footprint in comparison.
> >
>
> Anyway,
> Song, do you have some suggestions or comments about those results (or
> need any more specific tests, while I can still run them)?

Hi Michal,

I haven't had much time to look into this. It is probably not easy to fix
in the md layer. How about a workaround like:

1. trim each device;
2. create the RAID volume;
3. skip trim at mkfs time (mkfs.xfs -K or equivalent).

Thanks,
Song
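
A minimal sketch of the workaround Song outlines, assuming a hypothetical
4-member array built from /dev/sdb through /dev/sde and assembled as
/dev/md0 (device names, RAID level and filesystem choice are placeholders;
adjust them to the actual setup):

    # 1. Trim each member device while it is still a bare block device.
    for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
        blkdiscard "$dev"
    done

    # 2. Create the RAID volume on the pre-trimmed members
    #    (raid10 shown here; use --level=5 for the raid5 case).
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # 3. Skip the discard pass at mkfs time.
    #    mkfs.xfs -K suppresses it for XFS; mkfs.ext4 -E nodiscard is the
    #    rough equivalent for ext4.
    mkfs.xfs -K /dev/md0

This avoids routing the big initial discard through the md layer at all;
discards issued later during normal filesystem use are unaffected.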
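
As an aside, the "power-of-2 aligned step" Michal mentions presumably
refers to blkdiscard's -p/--step option, which caps how many bytes each
discard request covers per iteration. A hypothetical invocation against
the array device (name and step size assumed) would look like:

    # Issue the whole-device discard in fixed 4 MiB (power-of-2) steps.
    blkdiscard --step 4MiB /dev/md0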