On Thu, May 17, 2018 at 12:56:48PM +0800, Xiao Ni wrote:
>
>
> On 05/15/2018 12:49 AM, Shaohua Li wrote:
> > On Fri, May 11, 2018 at 10:46:59AM +0800, Xiao Ni wrote:
> > > Hi Shaohua
> > >
> > > On 05/11/2018 06:23 AM, Shaohua Li wrote:
> > > > From: Shaohua Li <shli@xxxxxx>
> > > >
> > > > The recent flush request handling seems unnecessarily complicated. The
> > > > main issue is that in rdev_end_flush we can get either the rdev of the
> > > > bio or the flush_info, not both, or we need extra memory for the other.
> > > > With the extra memory, we need to reallocate the memory on disk
> > > > hotadd/remove. The original patch actually forgets the add_new_disk
> > > > case for memory allocation, and we get a kernel crash.
> > > add_new_disk just adds the disk to md as a spare. update_raid_disks only
> > > reallocates the memory after the raid disks are reshaped. Why is there a
> > > kernel crash? Could you explain more?
> > Not always reshape. It's very easy to reproduce. Just create a linear
> > array, grow it by one disk, and run some file IO that triggers a flush.
> Hi Shaohua
>
> How do I grow by one disk after creating a raid0 array? Something like
> these steps?
> mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdc1 --size=50M
> mdadm -CR /dev/md1 -l0 /dev/md0 /dev/sdd2
> mdadm /dev/md0 --grow --size=max
>
> I read the code again and I still can't catch the point. md_flush_request
> only sends a flush bio to an md_rdev that is not faulty and whose raid_disk
> is not -1. After add_new_disk, md_rdev->raid_disk is -1 and only changes to
> the role number after reshaping. So this situation can't panic.

Not raid0, it's a linear array.

>
> > > > The idea is to always increase all rdev references in md_flush_request
> > > > and decrease the references after the bios finish. In this way,
> > > > rdev_end_flush doesn't need to know the rdev, so we don't need to
> > > > allocate extra memory.
> > > Is there a situation like this? One disk is plugged into the array
> > > after the flush bios are submitted to the underlying disks. After those
> > > flush bios come back there is one more rdev in the list mddev->disks.
> > > If we decrement all rdev references at once, we can decrement the
> > > reference of an rdev that never submitted a flush bio. That's the
> > > reason I tried to allocate memory for bios[0] in flush_info.
> > I think we wait for all IO to finish before hot add/remove of disks.
>
> I read the related code and did some tests. In fact it doesn't wait for IO
> to finish. I called raid1's get_unqueued_pending in add_new_disk before
> bind_rdev_to_array (list_add_rcu(&rdev->same_set, &mddev->disks);) and it
> shows there is I/O in flight. Should add_new_disk call mddev_suspend? If it
> can wait for all IO to finish, there is no need to allocate memory for
> bios[].

You are right, we don't wait for IO on hot add for all personalities. We need
to find a better way to solve this issue.

Thanks,
Shaohua
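
To make the hot-add race Xiao Ni describes concrete, here is a small
user-space C model of the "take a reference on every rdev at submit time,
drop a reference on every rdev at completion time" idea. This is not md
code: the struct and helper names (fake_rdev, take_refs_on_all,
drop_refs_on_all, hot_add) are made up for illustration, and the kernel's
RCU-protected list and atomic counters are replaced with a plain singly
linked list.

/*
 * Toy model of the refcount scheme discussed above (NOT kernel code).
 * take_refs_on_all() stands in for md_flush_request() grabbing a
 * reference on every rdev before submitting flush bios;
 * drop_refs_on_all() stands in for dropping them once all flush bios
 * have completed.  If a "disk" is hot-added in between, the drop pass
 * walks one more entry than the take pass did and a refcount goes to -1.
 */
#include <stdio.h>
#include <stdlib.h>

struct fake_rdev {
	const char *name;
	int nr_pending;            /* models rdev->nr_pending */
	struct fake_rdev *next;
};

static struct fake_rdev *disks;    /* models mddev->disks */

static void hot_add(const char *name)
{
	struct fake_rdev *rdev = calloc(1, sizeof(*rdev));

	rdev->name = name;
	rdev->next = disks;
	disks = rdev;
}

static void take_refs_on_all(void)
{
	for (struct fake_rdev *rdev = disks; rdev; rdev = rdev->next)
		rdev->nr_pending++;
}

static void drop_refs_on_all(void)
{
	for (struct fake_rdev *rdev = disks; rdev; rdev = rdev->next)
		rdev->nr_pending--;
}

int main(void)
{
	hot_add("sdb");
	hot_add("sdc");

	take_refs_on_all();        /* flush bios submitted for sdb and sdc */
	hot_add("sdd");            /* disk hot-added while flush is in flight */
	drop_refs_on_all();        /* completion drops refs on ALL disks */

	for (struct fake_rdev *rdev = disks; rdev; rdev = rdev->next)
		printf("%s: nr_pending = %d\n", rdev->name, rdev->nr_pending);
	/* sdd ends at -1: a reference was dropped that was never taken. */
	return 0;
}

In this toy model the newly added "sdd" ends up with a negative count,
which is the underflow Xiao Ni is worried about. That is why, as the
thread concludes, the scheme needs either per-rdev bookkeeping (the
bios[] array in flush_info) or a guarantee that hot add waits for
in-flight IO, for example via mddev_suspend as Xiao Ni suggests.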