On Thu, Oct 24, 2019 at 12:42 PM Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>
> Song Liu wrote on 2019-10-24 21:50:
> > Sorry for delayed reply.
>
> No problem :)
>
> > On Sat, Oct 19, 2019 at 2:10 AM Anssi Hannula <anssi.hannula@xxxxxx>
> > wrote:
> >>
> >> Hi all,
> >>
> >> I'm seeing a reshape issue where the array gets stuck with requests
> >> seemingly getting blocked and md0_raid6 process taking 100% CPU
> >> whenever I --continue the reshape.
> >>
> >> From what I can tell, the md0_raid6 process is stuck processing a
> >> set of stripes over and over via handle_stripe() without
> >> progressing.
> >>
> >> Log excerpt of one handle_stripe() of an affected stripe with some
> >> extra logging is below.
> >> The 4600-5200 integers are line numbers for
> >> http://onse.fi/files/reshape-infloop-issue/raid5.c .
> >
> > Maybe add sh->sector to DEBUGPRINT()?
>
> Note that the XX debug printing was guarded by
>
>   bool debout = (sh->sector == 198248960) && __ratelimit(&_rsafasfas);
>
> So everything was for sector 198248960 and rate limited every 20sec to
> avoid a flood.
>
> > Also, please add more DEBUGPRINT() in the
> >
> > if (sh->reconstruct_state == reconstruct_state_result) {
> >
> > case.
>
> OK, added prints there.
>
> Though after logging I noticed that the execution never gets there,
> sh->reconstruct_state is always reconstruct_state_idle at that point.
> It gets cleared on the "XX too many failed" log message (line 4798).
> I guess the failed = 10 is the problem here..

What does /proc/mdstat say?

Thanks,
Song