On 1/25/19 9:32 PM, Song Liu wrote:
> On Fri, Jan 25, 2019 at 10:10 AM Michal Soltys <soltys@xxxxxxxx> wrote:
> I guess this is because raid5 issues trim at a smaller granularity (4kB).
> Could you please confirm this theory? (by running iostat, or blktrace
> during mkfs)
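A quick way to watch it live is extended iostat on the members while the
mkfs (or blkdiscard) runs - something along these lines (just a sketch;
the member names are examples and the exact -x columns differ between
sysstat versions):

  # 1-second extended device stats for the array and its members
  iostat -dx md10 sda sdb sdc sdd 1

Newer kernels/sysstat report discards as separate columns; on older ones
blktrace gives the more reliable view, which is what the rest of this
mail is based on.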
I also did a blktrace of just a simple blkdiscard on the small raid
(4x32 GB raid5, which also took ~11 minutes); the binary data is around
7 GB and the parsed output around 10 GB.
I attached a fragment of the output + summary at:
https://drive.google.com/open?id=12uku_bTrw2VLOVXgSDN4vGJYS2iNzO5y
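The excerpt below is from one of the member disks (8,0 - the A lines show
the remap from md10 (9,10) through the partition (8,1) onto the disk); a
capture along these lines should reproduce it (just a sketch - device
names and the output prefix are examples, not necessarily the exact
invocation):

  # record everything hitting one raid member while the array is discarded
  blktrace -d /dev/sda -o sda-discard &
  blkdiscard /dev/md10
  kill %1                                     # stop the trace once blkdiscard returns
  blkparse -i sda-discard > sda-discard.txt   # text form as quoted below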
But to my untrained eye it looks roughly as you say - lots of small 4 kB
discard requests that get merged into larger 512 kB chunks before being
submitted to the member devices, with each merged chunk taking up to
10 ms to complete on the same device (a quick way to pull those
completion gaps out of the trace is sketched below the excerpt). E.g.:
8,0 1 269796 53.012284482 7185 A D 5021488 + 8 <- (9,10) 0
8,0 1 269797 53.012284580 7185 A D 5023536 + 8 <- (8,1) 5021488
8,0 1 269798 53.012284685 7185 Q D 5023536 + 8 [md10_raid5]
8,0 1 269799 53.012284833 7185 M D 5023536 + 8 [md10_raid5]
8,0 1 269800 53.012286965 7185 UT N [md10_raid5] 1
8,0 1 269801 53.012287212 7185 I D 5022520 + 1024 [md10_raid5]
8,0 1 269802 53.012295378 405 D D 5022520 + 1024 [kworker/1:1H]
8,0 1 269803 53.021024791 0 C D 5019448 + 1024 [0]
8,0 1 269804 53.031027682 0 C D 5020472 + 1024 [0]
8,0 1 269805 53.040992885 0 C D 5021496 + 1024 [0]
8,0 1 269806 53.041155059 7185 A D 5021496 + 8 <- (9,10) 0
8,0 1 269807 53.041155208 7185 A D 5023544 + 8 <- (8,1) 5021496
8,0 1 269808 53.041155379 7185 Q D 5023544 + 8 [md10_raid5]
8,0 1 269809 53.041155614 7185 G D 5023544 + 8 [md10_raid5]
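The completion gaps are easy to pull out of the parsed trace; with the
default blkparse output format (as above), something like this prints the
time between successive discard completions (sketch - sda-discard.txt is
just the example file name from earlier):

  # delta in seconds between consecutive C (complete) events for discards (rwbs D)
  awk '$6 == "C" && $7 == "D" { if (n++) printf "%.6f\n", $4 - prev; prev = $4 }' sda-discard.txt

And the rough numbers line up: if each merged 512 kB (1024-sector) discard
really takes ~10 ms to complete on a member, then one 32 GB member needs
about 32 GiB / 512 KiB = 65536 of them, i.e. 65536 x 10 ms ~ 655 s - which
is pretty much the ~11 minutes the whole blkdiscard took, given the
members are discarded in parallel.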