RE: [PATCH 08/12] FIX: use md position to reshape restart

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Kwolek, Adam
> Sent: Thursday, February 09, 2012 11:24
> To: NeilBrown
> Cc: linux-raid@xxxxxxxxxxxxxxx; Ciechanowski, Ed; Labun, Marcin; Williams, Dan
> J; Dorau, Lukasz
> Subject: RE: [PATCH 08/12] FIX: use md position to reshape restart
> 
> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@xxxxxxx]
> > Sent: Thursday, February 09, 2012 11:20
> > To: Kwolek, Adam
> > Cc: linux-raid@xxxxxxxxxxxxxxx; Ciechanowski, Ed; Labun, Marcin;
> > Williams, Dan J; Dorau, Lukasz
> > Subject: Re: [PATCH 08/12] FIX: use md position to reshape restart
> >
> > On Thu, 9 Feb 2012 09:49:01 +0000 "Kwolek, Adam"
> > <adam.kwolek@xxxxxxxxx>
> > wrote:
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@xxxxxxx]
> > > > Sent: Thursday, February 09, 2012 10:19
> > > > To: Kwolek, Adam
> > > > Cc: linux-raid@xxxxxxxxxxxxxxx; Ciechanowski, Ed; Labun, Marcin;
> > > > Williams, Dan J; Dorau, Lukasz
> > > > Subject: Re: [PATCH 08/12] FIX: use md position to reshape restart
> > > >
> > > > On Thu, 9 Feb 2012 08:16:16 +0000 "Kwolek, Adam"
> > > > <adam.kwolek@xxxxxxxxx>
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: NeilBrown [mailto:neilb@xxxxxxx]
> > > > > > Sent: Thursday, February 09, 2012 02:36
> > > > > > To: Kwolek, Adam
> > > > > > Cc: linux-raid@xxxxxxxxxxxxxxx; Ciechanowski, Ed; Labun,
> > > > > > Marcin; Williams, Dan J; Dorau, Lukasz
> > > > > > Subject: Re: [PATCH 08/12] FIX: use md position to reshape
> > > > > > restart
> > > > > >
> > > > > > On Tue, 07 Feb 2012 15:03:59 +0100 Adam Kwolek
> > > > > > <adam.kwolek@xxxxxxxxx>
> > > > > > wrote:
> > > > > >
> > > > > > > When reshape is broken, it can occur that metadata is not
> > > > > > > saved
> > properly.
> > > > > > > This can cause that reshape process is farther in md than
> > > > > > > metadata
> > states.
> > > > > > >
> > > > > > > On reshape restart use md position as start position, if it
> > > > > > > is farther than position specified in metadata. Opposite
> > > > > > > situation treat as
> > > > error.
> > > > > > >
> > > > > > > Signed-off-by: Adam Kwolek <adam.kwolek@xxxxxxxxx>
> > > > > > > ---
> > > > > > >
> > > > > > >  Grow.c |   86
> +++++++++++++++++++++++++++++++++++++++++++++--
> > ----
> > > > -----
> > > > > > --------
> > > > > > >  1 files changed, 60 insertions(+), 26 deletions(-)
> > > > > > >
> > > > > > > diff --git a/Grow.c b/Grow.c index 70bdee1..0668a16 100644
> > > > > > > --- a/Grow.c
> > > > > > > +++ b/Grow.c
> > > > > > > @@ -1862,6 +1862,55 @@ release:
> > > > > > >  	return rv;
> > > > > > >  }
> > > > > > >
> > > > > > > +/* verify_reshape_position()
> > > > > > > + *	Function checks if reshape position in metadata is not farther
> > > > > > > + *	than position in md.
> > > > > > > + * Return value:
> > > > > > > + *	 0 : not valid sysfs entry
> > > > > > > + *		it can be caused by not started reshape, it should be
> > > > started
> > > > > > > + *		by reshape array or raid0 array is before takeover
> > > > > > > + *	-1 :	error, reshape position is obviously wrong
> > > > > > > + *	 1 :	success, reshape progress correct or updated
> > > > > > > +*/
> > > > > > > +static int verify_reshape_position(struct mdinfo *info, int level) {
> > > > > > > +	int ret_val = 0;
> > > > > > > +	char buf[40];
> > > > > > > +
> > > > > > > +	/* read sync_max, failure can mean raid0 array */
> > > > > > > +	if (sysfs_get_str(info, NULL, "sync_max", buf, 40) > 0) {
> > > > > > > +		char *ep;
> > > > > > > +		unsigned long long position = strtoull(buf, &ep, 0);
> > > > > > > +
> > > > > > > +		dprintf(Name": Read sync_max sysfs entry is: %s\n",
> > buf);
> > > > > > > +		if (!(ep == buf || (*ep != 0 && *ep != '\n' && *ep != '
> > > > > > > +')))
> > {
> > > > > > > +			position *= get_data_disks(level,
> > > > > > > +						   info->new_layout,
> > > > > > > +						   info-
> > >array.raid_disks);
> > > > > > > +			if (info->reshape_progress < position) {
> > > > > > > +				dprintf("Corrected reshape progress
> > (%llu) to "
> > > > > > > +					"md position (%llu)\n",
> > > > > > > +					info->reshape_progress,
> > position);
> > > > > > > +				info->reshape_progress = position;
> > > > > > > +				ret_val = 1;
> > > > > > > +			} else if (info->reshape_progress > position) {
> > > > > > > +				fprintf(stderr, Name ": Fatal error:
> > array "
> > > > > > > +					"reshape was not properly
> > frozen "
> > > > > > > +					"(expected reshape position is
> > %llu, "
> > > > > > > +					"but reshape progress is
> > %llu.\n",
> > > > > > > +					position, info-
> > >reshape_progress);
> > > > > > > +				ret_val = -1;
> > > > > > > +			} else {
> > > > > > > +				dprintf("Reshape position in md and
> > metadata "
> > > > > > > +					"are the same;");
> > > > > > > +				ret_val = 1;
> > > > > > > +			}
> > > > > > > +		}
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	return ret_val;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static int reshape_array(char *container, int fd, char *devname,
> > > > > > >  			 struct supertype *st, struct mdinfo *info,
> > > > > > >  			 int force, struct mddev_dev *devlist, @@ -
> > 2251,9
> > > > > > +2300,16 @@
> > > > > > > started:
> > > > > > >
> > > > > > >  	sra->new_chunk = info->new_chunk;
> > > > > > >
> > > > > > > -	if (restart)
> > > > > > > +	if (restart) {
> > > > > > > +		/* for external metadata checkpoint saved by mdmon
> > can be
> > > > > > lost
> > > > > > > +		 * or missed /due to e.g. crash/. Check if md is not
> > during
> > > > > > > +		 * restart farther than metadata points to.
> > > > > > > +		 * If so, this means metadata information is obsolete.
> > > > > > > +		 */
> > > > > > > +		if (st->ss->external)
> > > > > > > +			verify_reshape_position(info, reshape.level);
> > > > > > >  		sra->reshape_progress = info->reshape_progress;
> > > > > > > -	else {
> > > > > > > +	} else {
> > > > > > >  		sra->reshape_progress = 0;
> > > > > > >  		if (reshape.after.data_disks <
> > reshape.before.data_disks)
> > > > > > >  			/* start from the end of the new array */ @@ -
> > 3765,8
> > > > > > +3821,6 @@
> > > > > > > int Grow_continue_command(char *devname, int fd,
> > > > > > >  	char buf[40];
> > > > > > >  	int cfd = -1;
> > > > > > >  	int fd2 = -1;
> > > > > > > -	char *ep;
> > > > > > > -	unsigned long long position;
> > > > > > >
> > > > > > >  	dprintf("Grow continue from command line called for %s\n",
> > > > > > >  		devname);
> > > > > > > @@ -3894,28 +3948,8 @@ int Grow_continue_command(char
> > > > > > > *devname, int
> > > > > > fd,
> > > > > > >  	/* verify that array under reshape is started from
> > > > > > >  	 * correct position
> > > > > > >  	 */
> > > > > > > -	ret_val = sysfs_get_str(content, NULL, "sync_max", buf, 40);
> > > > > > > -	if (ret_val <= 0) {
> > > > > >                     ^^^
> > > > > >
> > > > > > This test is '<=' however ...
> > > > > >
> > > > > >
> > > > > > > -		fprintf(stderr, Name
> > > > > > > -			": cannot open verify reshape progress for %s
> > (%i)\n",
> > > > > > > -			content->sys_name, ret_val);
> > > > > > > -		ret_val = 1;
> > > > > > > -		goto Grow_continue_command_exit;
> > > > > > > -	}
> > > > > > > -	dprintf(Name ": Read sync_max sysfs entry is: %s\n", buf);
> > > > > > > -	position = strtoull(buf, &ep, 0);
> > > > > > > -	if (ep == buf || (*ep != 0 && *ep != '\n' && *ep != ' ')) {
> > > > > > > -		fprintf(stderr, Name ": Fatal error: array reshape was"
> > > > > > > -			" not properly frozen\n");
> > > > > > > -		ret_val = 1;
> > > > > > > -		goto Grow_continue_command_exit;
> > > > > > > -	}
> > > > > > > -	position *= get_data_disks(map_name(pers, mdstat->level),
> > > > > > > -				   content->new_layout,
> > > > > > > -				   content->array.raid_disks);
> > > > > > > -	if (position != content->reshape_progress) {
> > > > > > > -		fprintf(stderr, Name ": Fatal error: array reshape was"
> > > > > > > -			" not properly frozen.\n");
> > > > > > > +	if (verify_reshape_position(content,
> > > > > > > +				    map_name(pers, mdstat->level)) < 0)
> > {
> > > > > >
> > > > > > ^^
> > > > > >
> > > > > > this one is just '<'.  However it looks like it should be the same.
> > > > > >
> > > > > > Was there a reason for the change, or was it an oversight?
> > > > > >
> > > > > > Thanks,
> > > > > > NeilBrown
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >  		ret_val = 1;
> > > > > > >  		goto Grow_continue_command_exit;
> > > > > > >  	}
> > > > >
> > > > >
> > > > > Removed code is put in to function verify_reshape_position() for
> > > > > use in current location and reshape_array(), as calculated new
> > > > > position in
> > > > Grow_continue_command() cannot be passed to reshape_array(), it
> > > > has to be calculated again.
> > > > >
> > > > > This is what you are mention above?
> > > >
> > > > Not exactly.
> > > >
> > > > The original code would print an error if
> > > >       sysfs_get_str(content, NULL, "sync_max", buf, 40); returns 0 or
> negative.
> > > > The new code will not print an error if that returns zero.
> > > >
> > > > i.e. you changed a '<=' into a '<'.  I wondered if there was some
> > > > reason for that - something that I was missing.
> > > >
> > > > NeilBrown
> > >
> > > Yes, you are right. I've missed '0' case during code reorganization.
> > > You would fix this or you want me to post additional patch?
> >
> > Thanks.
> > I already fixed it - I've just pushed my current tree with all your patches out.
> >
> > Thanks,
> > NeilBrown
> 
> Maybe this is better.
> During this quick update, I've removed by mistake 'ret_val' variable initialization
> in this patch also, this would be not good ;).
> 
> BR
> Adam


I've looked at changes you pushed and this is not exactly the same I've intended to be /look my quick patch with ret_val variable initialization caution/.
Reading 0 length reshape position is an error, absence of sysfs entry is not always an error /raid0 case/.

BR
Adam

> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of
> a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux