Re: [PATCH] md: simplify flush request handling

On Thu, May 17, 2018 at 09:19:03AM -0700, Shaohua Li wrote:
> On Thu, May 17, 2018 at 12:56:48PM +0800, Xiao Ni wrote:
> > 
> > 
> > On 05/15/2018 12:49 AM, Shaohua Li wrote:
> > > On Fri, May 11, 2018 at 10:46:59AM +0800, Xiao Ni wrote:
> > > > Hi Shaohua
> > > > 
> > > > 
> > > > On 05/11/2018 06:23 AM, Shaohua Li wrote:
> > > > > From: Shaohua Li <shli@xxxxxx>
> > > > > 
> > > > > The recent flush request handling seems unnecessarily complicated.
> > > > > The main issue is that in rdev_end_flush we can get either the rdev
> > > > > of the bio or the flush_info, but not both, unless we allocate extra
> > > > > memory for the other. With that extra memory, we then need to
> > > > > reallocate it on disk hot add/remove. The original patch actually
> > > > > forgot the add_new_disk case for that allocation, and we get a
> > > > > kernel crash.
> > > > add_new_disk just adds the disk to the md array as a spare. After the
> > > > raid disks are reshaped, update_raid_disks reallocates the memory.
> > > > Why is there a kernel crash? Could you explain more?
> > > Not always a reshape. It's very easy to reproduce: just create a linear
> > > array, grow it by one disk, and run some file IO that triggers a flush.
> > Hi Shaohua
> > 
> > How do I grow one disk after creating a raid0 array? Something like these
> > steps?
> > mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdc1 --size=50M
> > mdadm -CR /dev/md1 -l0 /dev/md0 /dev/sdd2
> > mdadm /dev/md0 --grow --size=max
> > 
> > I read the code again and I still can't see the problem. In
> > md_flush_request it just sends flush bios to md_rdevs that are not faulty
> > and whose raid_disk is not -1. After add_new_disk the md_rdev->raid_disk
> > is -1, and it only changes to a real role number after reshaping. So in
> > this situation it can't panic.
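
(For reference, the selection you are describing is roughly the loop below on
the submission side. This is only a simplified sketch, not the exact code in
the patch; the identifiers are the usual md ones.)

    rcu_read_lock();
    rdev_for_each_rcu(rdev, mddev)
            if (rdev->raid_disk >= 0 &&
                !test_bit(Faulty, &rdev->flags)) {
                    /* only active, non-faulty members get a flush bio; a
                     * spare added by add_new_disk still has raid_disk == -1
                     * and is skipped here */
                    atomic_inc(&rdev->nr_pending);
                    /* ... allocate and submit a REQ_PREFLUSH bio to
                     * rdev->bdev ... */
            }
    rcu_read_unlock();
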
> 
> Not raid0, it's a linear array.
>  
> > > 
> > > > > The idea is to always take a reference on every rdev in
> > > > > md_flush_request and drop the references after the bios finish.
> > > > > This way rdev_end_flush doesn't need to know the rdev, so we don't
> > > > > need to allocate extra memory.
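
(Roughly, the submission side then becomes something like the sketch below;
simplified, with the bio setup left out. The field names follow current md
code.)

    /* md_flush_request(): take a reference on every rdev up front */
    rcu_read_lock();
    rdev_for_each_rcu(rdev, mddev)
            atomic_inc(&rdev->nr_pending);
    rcu_read_unlock();
    /* ... submit the per-device flush bios; bi_private only needs to point
     * at the mddev, so rdev_end_flush() never has to look up the rdev ... */
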
> > > > Is there a situation like this? A disk is hot-plugged into the array
> > > > after the flush bios have been submitted to the underlying disks.
> > > > When those flush bios come back, there is one more rdev in the list
> > > > mddev->disks. If we decrement all rdev references at once, we can
> > > > decrement the reference of an rdev that never got a flush bio. That's
> > > > the reason I tried to allocate memory for bios[0] in flush_info.
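
(In that scheme the completion side would look something like the sketch
below, which is exactly where the scenario above bites.)

    /* after the last flush bio completes: drop one reference per rdev.
     * An rdev linked into mddev->disks by add_new_disk after the increments
     * were taken shows up in this walk too, and would be decremented without
     * ever having been incremented. */
    rcu_read_lock();
    rdev_for_each_rcu(rdev, mddev)
            rdev_dec_pending(rdev, mddev);
    rcu_read_unlock();
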
> > > I think we wait for all IO to finish before we hot add/remove disks.
> > 
> > I read the related code and did some tests. In fact it doesn't wait for
> > IO to finish. I called raid1's get_unqueued_pending in add_new_disk, just
> > before bind_rdev_to_array (which does list_add_rcu(&rdev->same_set,
> > &mddev->disks);), and it shows there is I/O in flight. Should add_new_disk
> > call mddev_suspend? If it could wait for all IO to finish, there would be
> > no need to allocate memory for bios[].
> 
> You are right, we don't wait for IO on hot add in any personality. We need
> to find a better way to solve this issue.

Hmm, it seems there is no better solution. Now I'm in favor of your old version
(with the mempool). Could you please repost that one? I'll merge it.
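
For reference, the shape I mean is roughly the sketch below: the flush_info
records the flush bios it actually submitted (your bios[0] idea), so the
completion path only drops references on rdevs that really got one, and the
structure itself comes from a mempool instead of being allocated in the IO
path. The struct layout and field names here are only illustrative; the real
ones should come from your repost. The bios[] sizing is the part that still
has to be handled on hot add/remove, including the add_new_disk case above.

    struct flush_info {
            struct mddev    *mddev;
            struct bio      *bio;           /* the original flush bio */
            atomic_t        flush_pending;
            struct bio      *bios[0];       /* one slot per flush bio actually
                                             * submitted; completion only drops
                                             * references for these rdevs */
    };

    /* one flush_info per flush request, from a pool created with the array */
    struct flush_info *fi = mempool_alloc(mddev->flush_pool, GFP_NOIO);
    /* ... submit the per-rdev flush bios, record them in fi->bios[], and
     * drop the matching references as they complete ... */
    mempool_free(fi, mddev->flush_pool);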

Thanks,
Shaohua


