Oh, get_random_*() is really expensive. Thanks for your tips. The boot log
on my aarch64 machine, shown below, indicates that it took about 0.6 seconds
to fill the disk data:

[ 0.172831] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.788664] raid6: int64x1 gen() 121 MB/s
[ 0.856613] raid6: int64x1 xor() 74 MB/s
[ 0.924665] raid6: int64x2 gen() 166 MB/s
[ 0.992846] raid6: int64x2 xor() 95 MB/s
[ 1.060681] raid6: int64x4 gen() 290 MB/s
[ 1.128774] raid6: int64x4 xor() 160 MB/s
[ 1.196933] raid6: int64x8 gen() 238 MB/s
[ 1.264937] raid6: int64x8 xor() 148 MB/s
[ 1.332878] raid6: neonx1 gen() 256 MB/s
[ 1.400975] raid6: neonx1 xor() 130 MB/s
[ 1.468951] raid6: neonx2 gen() 333 MB/s
[ 1.537085] raid6: neonx2 xor() 181 MB/s
[ 1.605042] raid6: neonx4 gen() 451 MB/s
[ 1.673121] raid6: neonx4 xor() 289 MB/s
[ 1.741143] raid6: neonx8 gen() 452 MB/s
[ 1.809151] raid6: neonx8 xor() 277 MB/s
[ 1.809154] raid6: using algorithm neonx8 gen() 452 MB/s
[ 1.809157] raid6: .... xor() 277 MB/s, rmw enabled
[ 1.809160] raid6: using intx1 recovery algorithm

I replaced get_random_*() with a local PRNG based on the well-known linear
congruential generator (LCG). The patch looks like this:

+/* use the linear congruential bit. */
+static int32_t get_random_number_by_lcb(void)
+{
+	static int32_t seed = 1;
+	int32_t ret = 0;
+	ret = ((seed * 1103515245) + 12345) & 0x7fffffff;
+	seed = ret;
+	return ret;
+}
 /* Try to pick the best algorithm */
 /* This code uses the gfmul table as convenient data set to abuse */
@@ -229,8 +238,8 @@ int __init raid6_select_algo(void)
 	for (i = 0; i < disks-2; i++) {
 		dptrs[i] = disk_ptr + PAGE_SIZE*i;
-		for (j = 0; j < PAGE_SIZE; j++)
-			get_random_bytes(dptrs[i]+j, 1);
+		for (j = 0; j < PAGE_SIZE; j = j + 4)
+			*(int32_t *)(dptrs[i]+j) = get_random_number_by_lcb();
 	}
 	dptrs[disks-2] = disk_ptr + PAGE_SIZE*(disks-2);

The boot log with this patch is shown below; the fill took about 0.08 seconds:

[ 0.172858] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.256673] raid6: int64x1 gen() 121 MB/s
[ 0.324484] raid6: int64x1 xor() 73 MB/s
[ 0.392606] raid6: int64x2 gen() 166 MB/s
[ 0.460309] raid6: int64x2 xor() 92 MB/s
[ 0.528368] raid6: int64x4 gen() 290 MB/s
[ 0.596401] raid6: int64x4 xor() 156 MB/s
[ 0.664601] raid6: int64x8 gen() 238 MB/s
[ 0.732609] raid6: int64x8 xor() 148 MB/s
[ 0.800523] raid6: neonx1 gen() 256 MB/s
[ 0.868730] raid6: neonx1 xor() 129 MB/s
[ 0.936741] raid6: neonx2 gen() 334 MB/s
[ 1.004717] raid6: neonx2 xor() 202 MB/s
[ 1.072692] raid6: neonx4 gen() 451 MB/s
[ 1.140763] raid6: neonx4 xor() 260 MB/s
[ 1.208842] raid6: neonx8 gen() 452 MB/s
[ 1.276887] raid6: neonx8 xor() 277 MB/s
[ 1.276890] raid6: using algorithm neonx8 gen() 452 MB/s
[ 1.276894] raid6: .... xor() 277 MB/s, rmw enabled
[ 1.276897] raid6: using intx1 recovery algorithm
[ 1.276941] ACPI: Interpreter disabled.

I'm not familiar with spurious D$ conflicts or CPU cache behavior. What do
you think of this PRNG, and is there anything else I need to do?

------------------ Original ------------------
From: "H. Peter Anvin"<hpa@xxxxxxxxx>;
Date: Tue, Aug 23, 2016 11:53 AM
To: "liuzhengyuan"<liuzhengyuan@xxxxxxxxxx>;
Cc: "shli"<shli@xxxxxxxxxx>; "linux-raid"<linux-raid@xxxxxxxxxxxxxxx>; "fenghua.yu"<fenghua.yu@xxxxxxxxx>; "linux-kernel"<linux-kernel@xxxxxxxxxxxxxxx>; "liuzhengyuang521"<liuzhengyuang521@xxxxxxxxx>;
Subject: Re: [PATCH] raid6: fix the input of raid6 algorithm

Do you have any idea how long this takes to run? People are already
complaining about the boot time penalty. get_random_*() is quite expensive
and is overkill...
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
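
For reference, the LCG fill from the patch can also be sketched in userspace C. This is only an illustration, not the patch itself: the names lcg_state, lcg_next and lcg_fill are hypothetical and do not exist in lib/raid6. It assumes the same constants and seed as the patch, but uses unsigned arithmetic so the multiply wraps without signed overflow, and memcpy so the buffer needs no particular alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type; not part of lib/raid6. */
struct lcg_state {
	uint32_t seed;
};

/*
 * One LCG step with the constants from the patch:
 *   x' = (x * 1103515245 + 12345) mod 2^31
 * Unsigned arithmetic wraps modulo 2^32, so there is no signed overflow.
 */
static uint32_t lcg_next(struct lcg_state *st)
{
	st->seed = (st->seed * 1103515245u + 12345u) & 0x7fffffffu;
	return st->seed;
}

/*
 * Fill len bytes of buf, four bytes per LCG step. memcpy avoids any
 * alignment or aliasing assumptions about buf; a trailing partial word
 * (len not a multiple of 4) consumes one extra step.
 */
static void lcg_fill(struct lcg_state *st, unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 4 <= len; i += 4) {
		uint32_t v = lcg_next(st);

		memcpy(buf + i, &v, sizeof(v));
	}
	if (i < len) {
		uint32_t v = lcg_next(st);

		memcpy(buf + i, &v, len - i);
	}
}
```

Starting from a state with seed = 1, as in the patch, calling lcg_fill(&st, page, PAGE_SIZE) once per data page would reproduce the same word sequence. Because of the 0x7fffffff mask, the top bit of every 32-bit word is always zero, so the data is not uniformly random; whether that matters for the benchmark is a separate question.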