> On Aug 28, 2019, at 12:29 AM, Guoqing Jiang <jgq516@xxxxxxxxx> wrote:
>
> The break_stripe_batch_list function is called by handle_stripe and
> handle_stripe_clean_event (it is also called by handle_stripe), so
> the original caller of break_stripe_batch_list is handle_stripe.
>
> Since handle_stripe set STRIPE_ACTIVE flag at the beginning, and it is
> cleared at the end of handle_stripe, which means break_stripe_batch_list
> always triggers the below warning if it is called.
>
> [7028915.431770] stripe state: 2001
> [7028915.431815] ------------[ cut here ]------------
> [7028915.431828] WARNING: CPU: 18 PID: 29089 at drivers/md/raid5.c:4614 break_stripe_batch_list+0x203/0x240 [raid456]
> [...]
> [7028915.431879] CPU: 18 PID: 29089 Comm: kworker/u82:5 Tainted: G O 4.14.86-1-storage #4.14.86-1.2~deb9
> [7028915.431881] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
> [7028915.431888] Workqueue: raid5wq raid5_do_work [raid456]
> [7028915.431890] task: ffff9ab0ef36d7c0 task.stack: ffffb72926f84000
> [7028915.431896] RIP: 0010:break_stripe_batch_list+0x203/0x240 [raid456]
> [7028915.431898] RSP: 0018:ffffb72926f87ba8 EFLAGS: 00010286
> [7028915.431900] RAX: 0000000000000012 RBX: ffff9aaa84a98000 RCX: 0000000000000000
> [7028915.431901] RDX: 0000000000000000 RSI: ffff9ab2bfa15458 RDI: ffff9ab2bfa15458
> [7028915.431902] RBP: ffff9aaa8fb4e900 R08: 0000000000000001 R09: 0000000000002eb4
> [7028915.431903] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff9ab1736f1b00
> [7028915.431904] R13: 0000000000000000 R14: ffff9aaa8fb4e900 R15: 0000000000000001
> [7028915.431906] FS: 0000000000000000(0000) GS:ffff9ab2bfa00000(0000) knlGS:0000000000000000
> [7028915.431907] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [7028915.431908] CR2: 00007ff953b9f5d8 CR3: 0000000bf4009002 CR4: 00000000003606e0
> [7028915.431909] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [7028915.431910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [7028915.431910] Call Trace:
> [7028915.431923] handle_stripe+0x8e7/0x2020 [raid456]
> [7028915.431930] ? __wake_up_common_lock+0x89/0xc0
> [7028915.431935] handle_active_stripes.isra.58+0x35f/0x560 [raid456]
> [7028915.431939] raid5_do_work+0xc6/0x1f0 [raid456]
>
> But break_stripe_batch_list is called under conditions: too many failed
> devices, write error happened or failure of pdisk/qdisk etc, which means
> the warning is happened rarely. Though I still found the same issue was
> reported in list [1].
>
> So let's remove the checking of STRIPE_ACTIVE inside WARN_ONCE.
>
> [1]. https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_raid_msg62552.html&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=ARYLe8Z4AXE4keb4zF4aJP5fxg0WGULzlDh6cblUN64&s=TLYHQPA7jhD1nhLxgsZkcx6DZT5NgRf7WtTSH4b3xpY&e=
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>
> ---
>  drivers/md/raid5.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 88e56ee98976..e3dced8ad1b5 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4612,8 +4612,7 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
>
>  		list_del_init(&sh->batch_list);
>
> -		WARN_ONCE(sh->state & ((1 << STRIPE_ACTIVE) |
> -					 (1 << STRIPE_SYNCING) |
> +		WARN_ONCE(sh->state & ((1 << STRIPE_SYNCING) |
>  					 (1 << STRIPE_REPLACED) |
>  					 (1 << STRIPE_DELAYED) |
>  					 (1 << STRIPE_BIT_DELAY) |

I read the code again, and now I am not sure whether we are fixing the issue.
This WARN_ONCE() does not run for head_sh, which should have STRIPE_ACTIVE.
It only runs on other stripes in the batch, which should not have STRIPE_ACTIVE.

Does this make sense?

Thanks,
Song
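
To make the point about head_sh concrete, here is a minimal user-space sketch of the check. It is not the kernel code: struct stripe, count_flagged_members() and demo() are hypothetical names, the singly linked circular list only stands in for the batch list, and STRIPE_ACTIVE being bit 0 is an assumption (it would match the reported "stripe state: 2001" if that value is printed in hex).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: STRIPE_ACTIVE is bit 0 of the state word. */
enum { STRIPE_ACTIVE = 0 };

/* Hypothetical stand-in for struct stripe_head: only the fields
 * needed to model the batch traversal. */
struct stripe {
	unsigned long state;
	struct stripe *next;	/* circular list, head included */
};

/* Walk every batch member except the head, as the loop in
 * break_stripe_batch_list is described in this thread, and count
 * the members that have any bit of mask set. */
static int count_flagged_members(const struct stripe *head,
				 unsigned long mask)
{
	int hits = 0;

	assert(head != NULL);
	for (const struct stripe *sh = head->next; sh != head; sh = sh->next)
		if (sh->state & mask)
			hits++;
	return hits;
}

/* Build a three-stripe batch (head plus two members) and report how
 * many members would trip a STRIPE_ACTIVE check. */
static int demo(unsigned long head_state, unsigned long member_state)
{
	struct stripe head = { .state = head_state };
	struct stripe a = { .state = member_state };
	struct stripe b = { .state = 0 };

	head.next = &a;
	a.next = &b;
	b.next = &head;
	return count_flagged_members(&head, 1UL << STRIPE_ACTIVE);
}
```

In this model, demo(1UL << STRIPE_ACTIVE, 0) returns 0: an active head alone never trips the check, because the head is skipped. demo(1UL << STRIPE_ACTIVE, 0x2001UL) returns 1, so a state of 2001 in the warning would have to belong to a non-head member of the batch.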