Song Liu wrote on 2019-10-25 01:56:
On Thu, Oct 24, 2019 at 12:42 PM Anssi Hannula <anssi.hannula@xxxxxx> wrote:
Song Liu wrote on 2019-10-24 21:50:
> On Sat, Oct 19, 2019 at 2:10 AM Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>>
>> Hi all,
>>
>> I'm seeing a reshape issue where the array gets stuck with requests
>> seemingly getting blocked and md0_raid6 process taking 100% CPU
>> whenever I --continue the reshape.
>>
>> From what I can tell, the md0_raid6 process is stuck processing a set
>> of stripes over and over via handle_stripe() without progressing.
>>
>> Log excerpt of one handle_stripe() of an affected stripe with some
>> extra logging is below.
>> The 4600-5200 integers are line numbers for
>> http://onse.fi/files/reshape-infloop-issue/raid5.c .
>
> Maybe add sh->sector to DEBUGPRINT()?
Note that the XX debug printing was guarded by
bool debout = (sh->sector == 198248960) && __ratelimit(&_rsafasfas);
So everything was for sector 198248960 and rate limited every 20sec to
avoid a flood.
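
For readers unfamiliar with this kind of guard, here is a minimal userspace sketch of the same idea: print only for one watched sector, and at most once per interval. It is not the kernel code. The `dbg_*` names and the explicit `now` argument are hypothetical, and the kernel's `__ratelimit()` also allows a burst count, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a rate-limit state. */
struct dbg_ratelimit {
    int64_t interval;   /* minimum seconds between allowed prints */
    int64_t last;       /* time of the last allowed print; negative = never */
};

/* Returns true at most once per rl->interval seconds. */
bool dbg_ratelimit_ok(struct dbg_ratelimit *rl, int64_t now)
{
    assert(rl != 0);
    if (rl->last >= 0 && now - rl->last < rl->interval)
        return false;
    rl->last = now;
    return true;
}

/*
 * True only for the watched sector, and only when the rate limit allows it.
 * The sector test comes first, so other sectors never consume the limit,
 * mirroring the order of the && expression quoted above.
 */
bool dbg_should_print(struct dbg_ratelimit *rl, uint64_t sector,
                      uint64_t watched, int64_t now)
{
    return sector == watched && dbg_ratelimit_ok(rl, now);
}
```

A caller would initialise the state with `{ .interval = 20, .last = -1 }` and wrap each debug print in `if (dbg_should_print(&rl, sector, 198248960, now))`.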
> Also, please add more DEBUGPRINT() in the
>
> if (sh->reconstruct_state == reconstruct_state_result) {
>
> case.
OK, added prints there.
Though after logging I noticed that the execution never gets there,
sh->reconstruct_state is always reconstruct_state_idle at that point.
It gets cleared on the "XX too many failed" log message (line 4798).
I guess failed = 10 is the problem here.

> What does /proc/mdstat say?

After --assemble --backup-file=xx, before --grow:
md0 : active raid6 sdac[0] sdf[21] sdh[17] sdj[18] sde[26] sdr[20] sds[15] sdad[25] sdk[13] sdp[27] sdo[11] sdl[10] sdn[9] sdt[16] md8[28] sdi[22] sdae[23] sdaf[24] sdm[3] sdg[2] sdq[1]
      74232661248 blocks super 1.1 level 6, 64k chunk, algorithm 2 [20/20] [UUUUUUUUUUUUUUUUUUUU]
      [===================>.] reshape = 97.5% (4024886912/4124036736) finish=10844512.0min speed=0K/sec
      bitmap: 5/31 pages [20KB], 65536KB chunk
After --grow --continue --backup-file=xx (i.e. during the
handle_stripe() loop):
md0 : active raid6 sdac[0] sdf[21] sdh[17] sdj[18] sde[26] sdr[20] sds[15] sdad[25] sdk[13] sdp[27] sdo[11] sdl[10] sdn[9] sdt[16] md8[28] sdi[22] sdae[23] sdaf[24] sdm[3] sdg[2] sdq[1]
      74232661248 blocks super 1.1 level 6, 64k chunk, algorithm 2 [20/20] [UUUUUUUUUUUUUUUUUUUU]
      [===================>.] reshape = 97.5% (4024917256/4124036736) finish=7674.2min speed=215K/sec
      bitmap: 5/31 pages [20KB], 65536KB chunk
After a reboot (needed because of the stuck array), the reshape progress
is reset back to 4024886912.
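
To compare the two reshape positions above without reading them by eye, something like the following could be used. It is only a sketch: the function names are made up for illustration, and it assumes the progress line has the "(done/total)" form shown in the mdstat output above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract "done/total" from a progress line such as
 * "... reshape = 97.5% (4024886912/4124036736)".
 * Returns 0 on success, -1 if the line has no such pair.
 */
int parse_reshape_position(const char *line, uint64_t *done, uint64_t *total)
{
    assert(line && done && total);
    const char *p = strchr(line, '(');
    if (p == NULL)
        return -1;
    if (sscanf(p, "(%" SCNu64 "/%" SCNu64 ")", done, total) != 2)
        return -1;
    return 0;
}

/*
 * Progress in tenths of a percent, truncated (975 means 97.5%).
 * Assumes done * 1000 fits in 64 bits, which holds for the values here.
 */
unsigned reshape_per_mille(uint64_t done, uint64_t total)
{
    if (total == 0)
        return 0;
    return (unsigned)(done * 1000 / total);
}
```

With the figures above, the position after --grow --continue is 30344 ahead of the position after --assemble (4024917256 - 4024886912), in whatever units /proc/mdstat reports, and both positions truncate to 97.5%.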
--
Anssi Hannula