RE: [md PATCH 2/5] md: Enable reshape for external metadata

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Thursday, June 17, 2010 8:12 AM
> To: Kwolek, Adam
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: [md PATCH 2/5] md: Enable reshape for external metadata
> 
> On Wed, 16 Jun 2010 15:52:06 +0100
> "Kwolek, Adam" <adam.kwolek@xxxxxxxxx> wrote:
> 
> > > -----Original Message-----
> > > From: Neil Brown [mailto:neilb@xxxxxxx]
> > > Sent: Wednesday, June 16, 2010 6:54 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed
> > > Subject: Re: [md PATCH 2/5] md: Enable reshape for external
> metadata
> > >
> > > On Wed, 9 Jun 2010 15:21:05 +0100
> > > "Kwolek, Adam" <adam.kwolek@xxxxxxxxx> wrote:
> > >
> > > > (md: Online Capacity Expansion for IMSM)
> > > > Reshape can't go forward for external metadatas due to fact, that
> > > internal md flags are updated during native meta writing. For
> external
> > > metadatas
> > > > md_update_sb() is not called (reshape process is blocked).
> > > > To take carry about md flags in external metadata array case and
> > > allow reshape to roll over, md_update_sb() is called in similar way
> to
> > > native metadata. The difference is that metadata is not stored to
> disks
> > > by md, but externally by mdmon.
> > >
> > > I agree there is a problem here but I think you are approaching it
> the
> > > wrong
> > > way.  We need to make sure the problem flag doesn't get set when
> > > external
> > > metadata is used.
> > >
> > > I found something similar (maybe the same thing) when writing the
> dm-
> > > raid456
> > > module.  Does that patch:
> > >
> > >
> http://neil.brown.name/git?p=md;a=commitdiff;h=3b930b37e50702f97f8fad6d
> > > 7f5ee6d3f268394e
> > >
> > > solve the problem for you?
> > >
> > > Thanks,
> > > NeilBrown
> >
> > It almost solves problem.
> > To make it solved the following change I have to add:
> >
> > md/md.c |    5 +++--
> >  1 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/md/md.c b/md/md.c
> > index e9b5d67..01e88cd 100644
> > --- a/md/md.c
> > +++ b/md/md.c
> > @@ -6437,7 +6437,8 @@ int md_allow_write(mddev_t *mddev)
> >  	spin_lock_irq(&mddev->write_lock);
> >  	if (mddev->in_sync) {
> >  		mddev->in_sync = 0;
> > -		set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> > +		if (mddev->persistent)
> > +			set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> >  		if (mddev->safemode_delay &&
> >  		    mddev->safemode == 0)
> >  			mddev->safemode = 1;
> >
> > This closes MD_CHANGE_CLEAN flag management for me.
> > What do you think about this?
> 
> 
> >
> > Another thing is waiting during reshape for metadata update on
> MD_CHANGE_DEVS flag.
> > To roll reshape I've added the following code (instead calling
> md_ubdate_sb()):
> 
> Yes, there is a real issue there...
> 
> I don't think we ever need the kernel to wait for an external metadata
> handler
> to respond to device changes (apart from failure which is handled
> separately).
> So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with
> if (mddev->persistent)
> 
> I think that would be best, but I've make a note to review that later.
> 
> Thanks,
> NeilBrown
> 
> 
> >
> >
> > diff --git a/md/md.c b/md/md.c
> > index 01e88cd..539b323 100644
> > --- a/md/md.c
> > +++ b/md/md.c
> > @@ -6896,8 +6896,17 @@ void md_check_recovery(mddev_t *mddev)
> >  		(mddev->external == 0 && mddev->safemode == 1) ||
> >  		(mddev->safemode == 2 && ! atomic_read(&mddev-
> >writes_pending)
> >  		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
> > -		))
> > +		)) {
> > +			if ((mddev->external) && (mddev->flags)) {
> > +				/* FIXME:
> > +				 * for external metadata checkpointing purposes
> > +				 * put communication with userspace here
> > +				 */
> > +				clear_bit(MD_CHANGE_DEVS, &mddev->flags);
> > +				wake_up(&mddev->sb_wait);
> > +			}
> >  			return;
> > +		}
> >
> >  	if (mddev_trylock(mddev)) {
> >  		int spares = 0;
> >
> >
> > This change we can reuse for checkpointing purposes.
> > External metadata updates can be triggered in the same moments as
> native one (reuse internal md communication mechanisms).
> >
> > What is your opinion about this?
> >
> > BR
> > Adam
> >


> No, that would be wrong.
> 
> If the array is 'clean' and a write happens, the array_state changes to
> "write_pending" and the write blocks until mdmon updates the metadata
> and
> then writes "active" to "array_state".
> 
> Setting MD_CHANGE_CLEAN is important for this handshake to work
> properly.


The problem is during setting new raid_disks value for external meta when md doesn't update metadata.
When in resize_stripe() md_allow_write() is called it returns an error (due to this handshake mechanism), so user space
has no opportunity to answer in such case (before function call result is checked in md) even for notification that is present there. 
In such case, I can clear mddev->in_sync, before md_allow_write(). This leaves md_allow_write()
unattached (for use in other places). I think the best place for this is chechk_reshape(). 
This allows to set new raid_disks value.

diff --git a/md/raid5.c b/md/raid5.c
index cbd5c14..f5c42c9 100644
--- a/md/raid5.c
+++ b/md/raid5.c
@@ -5404,6 +5403,10 @@ static int check_reshape(mddev_t *mddev)
 	if (!check_stripe_cache(mddev))
 		return -ENOSPC;
 
+	if (mddev->external && mddev->delta_disks > 0)
+		mddev->in_sync = 0;
+
+
 	return resize_stripes(conf, conf->raid_disks + mddev->delta_disks);
 }


In fact in_sync == 0 at this moment shows real situation, not just trick.

This is my proposition as I see no possibility to interact with user space during this call 
to manage MD_CHANGE_CLEAN flag.

Can you have a look at this problem?

Thanks 
Adam


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux