Re: [PATCH mdadm] mdadm: Don't open md device for CREATE and ASSEMBLE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 23 Aug 2022 09:49:11 -0400
Jes Sorensen <jes@xxxxxxxxxxxxxxxxxx> wrote:

> On 7/20/22 04:20, Mariusz Tkaczyk wrote:
> > On Tue, 19 Jul 2022 10:43:06 -0600
> > Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
> >   
> >> On 2022-07-19 05:27, Mariusz Tkaczyk wrote:  
> >>> On Fri, 15 Jul 2022 10:20:26 +0800
> >>> Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:    
> >>>> On 7/15/22 6:37 AM, Logan Gunthorpe wrote:    
> >>>>> To fix this, don't bother trying to open the md device for CREATE and
> >>>>> ASSEMBLE commands, as the file descriptor will never be used anyway
> >>>>> even if it is successfully openned.    
> >>> Hi All,
> >>>
> >>> This is not a fix, it just reduces race probability because file
> >>> descriptor will be opened later.    
> >>
> >> That's not correct. The later "open" call actually will use the new_array
> >> parameter which will wait for the workqueue before creating a new array
> >> device, so the old one is properly cleaned up and there is no longer
> >> a race condition with this patch. If new_array doesn't exist and it falls
> >> back to a regular open, then it will still do the right thing and open the
> >> device with create_on_open.  
> > 
> > Array is created by /sys/module/md/parameters/new_array if chosen_name has
> > certain form. For IMSM, when we are creating arrays using "/dev/md/name" or
> > "name" only create_on_open is used (if no "names=yes" in config). I
> > understand that it works with tests but I don't feel that it is complete
> > yet. Could you how it behaves when we use "whatever"? 
> > 
> > #mdadm -CR whatever -l0 -n2 /dev/nvme[01]n1
> > 
> > Please do not use --name= parameter.
> > 
> > I want to disable create_on_open and always use new_array in the future,
> > without fallback to create_on_open possibility. So I would like to have
> > solution which is not relying on it.  
> >>  
> >>> I tried to resolve it in the past by adding completion to md driver and
> >>> force mdadm --stop command to wait for sysfs clean up but I have never
> >>> finished it. IMO it is a better way, wait for device to be fully removed
> >>> by MD during stop. Thoughts?    
> >>
> >> I don't think that would work very well. Userspace would end up blocking
> >> on --stop indefinitely if there are any references delaying cleanup to
> >> the device. That's not very user friendly. Given that opening the md
> >> device has side-effects, I think we should avoid opening when we
> >> should know that we are about to try to create a new device.  
> > 
> > Got it, thanks!
> > 
> > Hmm, so maybe the existing MD device should be marked as "in the middle of
> > removal" somehow to gives mdadm possibility to detect that. If we are using
> > node as name "/dev/mdX" then we will need to throw error, but when node
> > needs to be determined by find_free_devnm() then it will simply skip this
> > one and gives next one. But it will require changes in kernel probably.
> >   
> >>  
> >>> One objection I have here is that error handling is changed, so it could
> >>> be harmful change for users.     
> >>
> >> Hmm, yes seems like I was a bit sloppy here. However, it still seems
> >> possible to fix this up by not pre-opening the device. The only use for
> >> the mdfd in CREATE, ASSEMBLE and BUILD is to get the minor number if
> >> ident.super_minor == -2 and check if an existing specified device is an md 
> >> device (which may be redundant). We could replace this with a stat() call
> >> to avoid opening the device. What about the patch at the end of this
> >> email?  
> > 
> > LGTM, I put small comment. But as I said before, probably it don't fix all
> > creation cases.  
> 
> Hi Mariusz,
> 
> Just to recap on this, do you support applying this patch as is, or
> should we wait for the longer term fix you were mentioning?
> 
Hi Jes,
This patch looks good. Please apply next one version:
https://lore.kernel.org/linux-raid/20220727215246.121365-3-logang@xxxxxxxxxxxx/

Thanks,
Mariusz



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux