Re: Unable to handle kernel NULL pointer dereference in super_written

Shaohua Li <shlikernel@xxxxxxxxx> · Wed, 30 Mar 2016 10:27:19 -0700

On 03/30/2016 12:44 AM, Xiao Ni wrote:

----- Original Message -----
From: "Shaohua Li" <shli@xxxxxxxxxx>
To: "Xiao Ni" <xni@xxxxxxxxxx>
Cc: "linux-raid" <linux-raid@xxxxxxxxxxxxxxx>, "Jes Sorensen" <Jes.Sorensen@xxxxxxxxxx>, "Neil Brown" <neilb@xxxxxxx>
Sent: Wednesday, March 30, 2016 5:37:31 AM
Subject: Re: Unable to handle kernel NULL pointer dereference in super_written

On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
Hi all

I encountered one NULL pointer dereference problem.

The environment:
latest linux-stable and mdadm code
aarch64 platform
the md device is created with loop devices

It's a test case to check data integrity. I added the test script as an
attachment.
Could you please try this patch:
Thanks for the patch. I'm running the test and will report the result. It needs
to run more than 300 iterations to reproduce this.

 From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
Message-Id:
<b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@xxxxxx>
From: Shaohua Li <shli@xxxxxx>
Date: Tue, 29 Mar 2016 14:00:19 -0700
Subject: [PATCH] MD: add rdev reference for super write

md_super_write() and the corresponding md_super_wait() are generally called
with reconfig_mutex held, which prevents the disk from disappearing. There is
one case where this rule is broken: write_sb_page() in bitmap.c doesn't hold
the mutex. next_active_rdev() does increase the rdev reference, but it
decreases the reference too early (e.g., before the IO finishes), so the disk
can disappear in that window. We unconditionally increase the rdev reference
in md_super_write() to avoid the race.
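
The pairing described in the commit message can be pictured with a small
user-space model in C. This is only a sketch, not the patch itself: struct
model_rdev and the model_* functions are hypothetical stand-ins, and the
counter is simply named after the kernel's nr_pending field. It assumes a
reference is taken at submission and dropped in the completion callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct md_rdev; only the counter is modelled. */
struct model_rdev {
    atomic_int nr_pending;  /* references held by in-flight IO */
    bool removed;           /* true once the device has been unbound */
};

/* Submission: take a reference before the superblock/bitmap IO is issued. */
void model_super_write(struct model_rdev *rdev)
{
    atomic_fetch_add(&rdev->nr_pending, 1);
}

/* Completion callback: the reference is dropped only after the IO is done. */
void model_super_written(struct model_rdev *rdev)
{
    assert(!rdev->removed);  /* the device must still exist here */
    atomic_fetch_sub(&rdev->nr_pending, 1);
}

/* Removal succeeds only when no IO holds a reference. */
bool model_try_remove(struct model_rdev *rdev)
{
    if (atomic_load(&rdev->nr_pending) != 0)
        return false;
    rdev->removed = true;
    return true;
}
```

In this model, a removal attempted between model_super_write() and
model_super_written() is refused, which is the window the commit message
describes.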
In the hot_remove_disk path, write_sb_page is protected by reconfig_mutex.
It shouldn't submit a bio to a leg that is already marked FAULTY. Could you give
an example showing how the bug happens?

I'm not sure I understand your question correctly, but I'll try to answer.
When a disk is reported faulty via md_error(), we don't remove it
immediately, because some IO may still be in flight on the rdev. We
increase the rdev reference for every IO and decrease it after the IO
finishes; you can find this in raid5.c, for example. We only delete the
rdev once the reference count reaches 0; please see remove_and_add_spares().
So it's possible to find a disk with FAULTY set that is still in the rdev
list.
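
As a rough illustration of that sequence (mark faulty first, remove later),
here is a single-threaded sketch in C. It is not the md code: struct
model_dev and all model_* names are made up for this example, and a plain
int stands in for the atomic reference count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct model_dev {
    bool faulty;     /* set when an error is reported on the device */
    int  nr_pending; /* IOs still in flight on this device */
    bool present;    /* still in the array's device list */
};

/* Reporting an error only marks the device; it stays in the list. */
void model_error(struct model_dev *d)
{
    d->faulty = true;
}

/* Each IO brackets its lifetime with a get/put pair. */
void model_io_start(struct model_dev *d) { d->nr_pending++; }
void model_io_end(struct model_dev *d)   { d->nr_pending--; }

/* Sweep the list and drop faulty devices that have no IO in flight.
 * Returns the number of devices removed in this pass. */
size_t model_remove_faulty(struct model_dev *devs, size_t n)
{
    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].present && devs[i].faulty && devs[i].nr_pending == 0) {
            devs[i].present = false;
            removed++;
        }
    }
    return removed;
}
```

A device that fails while an IO is outstanding survives the first sweep with
faulty set and is only dropped by a later sweep, after model_io_end() has
run.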

Thanks,
Shaohua