Re: raid456's devices_handle_discard_safely is unusably slow

On 19/01/25 21:32, Song Liu wrote:
> On Fri, Jan 25, 2019 at 10:10 AM Michal Soltys <soltys@xxxxxxxx> wrote:
>>
>> I've been testing a simple raid5 consisting of 4 SSDs (all handling
>> post-trim reads as zeros safely, as reported by hdparm -I: "Deterministic
>> read ZEROs after TRIM") - with 4.19.13 and 4.20.0 as of now (so decently
>> fresh kernels).
>>
>> The drives themselves are 2T (1.9T after overprovisioning with HPA), so
>> initial tests were done with a 5.7T array, but that was completely
>> unusable in practice the moment discards were attempted.
>>
>> Then I created a tiny array from 32G partitions (pre-discarded manually):
>>
>> # echo Y >/sys/module/raid456/parameters/devices_handle_discard_safely
>> # mdadm -C /dev/md/test32 -e1.1 -l5 -n4 -z32G --assume-clean -N test32
>> /dev/sd{a,b,c,d}1
>>
>> Then I did a simple mkfs.ext4 /dev/md/test32
>>
>> It took over 11 minutes to finish (of which nearly everything was
>> accounted for by the "Discarding device blocks" phase). Extrapolating to
>> a full-size array, that would be around 7 hours.
>>
>> The drives alone (at their full 1.9T size) handle mkfs.ext4 with the full
>> discard phase in 1 minute or so. A quick test with raid10 works fine as
>> well (albeit a bit slower than I expected - a bit under 6 minutes; a full
>> fstrim took 37 min - but that's perfectly fine for daily/weekly fstrim).
>>
>> Is raid5's trim implementation expected to be this slow?
> 
> I guess this is because raid5 issues trim at a smaller granularity (4kB).
> Could you please confirm this theory? (by running iostat, or blktrace
> during mkfs)
> 
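
If the iostat numbers below aren't conclusive enough, the per-request
discard size could also be grabbed with blktrace during the mkfs, roughly
along these lines (sda being just one of the member drives; this assumes
the local blktrace supports the discard action filter):

# blktrace -a discard -d /dev/sda -o - | blkparse -i -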

A quick iostat -xm 1 shows:

- mkfs.ext4 on raid5 (the small 4x32G one that took 11 min):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.12   11.62    0.00   87.25

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
..........
sda              0.00    0.00      0.00      0.00     0.00 12604.00   0.00 100.00    0.00    0.00   4.40     0.00     0.00   0.00 100.00
sdc              0.00    0.00      0.00      0.00     0.00 12604.00   0.00 100.00    0.00    0.00   4.80     0.00     0.00   0.00  99.20
sdb              0.00    0.00      0.00      0.00     0.00 12604.00   0.00 100.00    0.00    0.00   4.65     0.00     0.00   0.00  99.20
sdd              0.00    0.00      0.00      0.00     0.00 12595.00   0.00 100.00    0.00    0.00   4.29     0.00     0.00   0.00  99.20

- compared to mkfs.ext4 on the raid10 array (4x1.9T, ~6 min):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    0.75   16.92    0.00   82.21

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0.00    6.00      0.00      0.02     0.00 11419.00   0.00  99.95    0.00    0.67   4.18     0.00     4.00 158.67  95.20
sdc              0.00    6.00      0.00      0.02     0.00 11421.00   0.00  99.95    0.00    0.67   4.05     0.00     4.00 156.67  94.00
sdb              0.00    6.00      0.00      0.02     0.00 11420.00   0.00  99.95    0.00    0.83   4.04     0.00     4.00 156.00  93.60
sdd              0.00    6.00      0.00      0.02     0.00 11423.00   0.00  99.95    0.00    0.67   3.72     0.00     4.00 150.67  90.40
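
A side-by-side look at the advertised discard limits of the two arrays
might be telling as well (md127 here is just an example name, substitute
whatever the arrays were assembled as):

# cat /sys/block/md127/queue/discard_granularity
# cat /sys/block/md127/queue/discard_max_bytes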



