Re: Subject: [PATCH 006/009]: raid1: chunk size check in run

raz ben yehuda <raziebe@xxxxxxxxx> · Thu, 21 May 2009 16:32:12 +0300



On Thu, 2009-05-21 at 13:11 +1000, Neil Brown wrote:
> On Wednesday May 20, raziebe@xxxxxxxxx wrote:
> > Neil
> > First I thank you for your effort. Now I can work in full steam on the
> > reshape on top of the new raid0 code. Currently this is what I have in
> > mind.If you have any design suggestions I would be happy to hear before
> > the coding.
> > 
> >    I added : raid0_add_hot that:
> > 	1. checks if the new disk size if smaller than the raid chunk size. if
> > so , reject.
> > 	2. check if new the disk size max_hw_sectors is smaller than the
> > raid's. if so generate a warning but do not reject.   
> >  	3. adds a disk to raid0 disk list. and turns off its in_sync bit.
> 
> I don't think the 'in_sync' bit is used in raid0 currently, so that
> bit seems irrelevant, but shouldn't hurt.
> > 
> > I will add raid0_check_reshape 
> >       This procedure prepares the raid for the reshape process.
> > 	1. Creates a temporary mddev with the same disks as the raid's and with
> > the new disks. This raid acts as a mere mappings so i will be able to
> > map sectors to the new target raid in the reshape process. This means i
> > have to work in create_strip_zones raid0_run ( separate patch ).
> >         2. Sets the target raid transfer size.
> > 	3. Create an allocation scheme for reshape bio allocation. i reshape in
> > chunk size. 
> > 	4. create raid0_reshape thread for writes.
> > 	5. wake up raid0_sync thread. 
> 
> Do you really need a temporary mddev, or just a temporary 'conf'??
I need mddev because i want to use map_sector and create_strip. I
certainly can fix map_sector and create_strip to work with conf and not
mddev, though it will make create_strip quite cumbersome. 
I will split create_strip to several independent functions. 
Do you agree ? 
> Having to create the raid0_reshape thread just for writes is a bit
> unfortunate, but it probably is the easiest approach.  You might be
> able to get the raid0_sync thread to do them, but that would be messy
> I expect.
I will start with the easy approach, meaning , a different thread for the writes.
Once i am done , i will see how can merge the reads and writes to work
in md_sync.
> > 
> > I will add raid0_sync: raid0_sync acts as the reshape read size process.
> > 
> >     1. Allocates a read bio.	
> >     2. Map_bio target with find_zone and map_sector, both map_sector and
> > find_zone are using the old raid mappings.
> >     3. Deactivate the raid.
> >     3. Lock and wait for the raid to be emptied from any previous IOs.
> >     4. Generate a read request.
> >     5. Release the lock. 
> 
> I think that sounds correct.
> 
> > 
> > I will add reshape_read_endio: 
> > 	if IO is successful then:
> > 		add the bio to reshape_list
> > 	else
> > 		add the bio to a retry list ( how many retries .. ?)
> 
> zero retries.  The underlying block device has done all the retries
> that are appropriate.  If you get a read error, then that block is
> gone.  Probably the best you can do is write garbage to the
> destination and report the error.
> 
> > 
> > I will add raid0_reshape: 
> > 	raid0_reshape is a md_thread that polls on the reshape_list and
> > commences writes based on the reads.
> > 	1. Grub a bio from reshape list.
> > 	2. map sector and find zone on the new raid mappings. 
> > 	3. set bio direction to write.
> > 	4. generate a write.
> > 	
> > 	if bio is in retry_list retry the bio.
> > 	if bio is in active_io list do the bio.
> > 	
> > I will add a reshape_write_endio that just frees the bio and his pages.
> 
> OK (except for the retry).
> 
> > 
> > raid0_make_request
> > 	I will add a check and see if the raid is in reshape. 
> > 	if so then
> > 		if IO is in the new mappings area we generate the IO
> > 				from the new mappings.
> > 		if IO is in the old mappings then we generate the IO
> > 				from the old mappings ( race here .. no ?)
> > 		if IO is in the current reshape active area, we push the io to a
> > active_io list that will processed by raid0_reshape.
> 
> This doesn't seem to match what you say above.
> If you don't submit a read for 'reshape' until all IO has drained,  
> then presumably you would just block any incoming IO until the current
> reshape requests have all finished.  i.e. you only ever have IO or
> reshape, but not both.
Where did i said that ? guess i wasn't clear.
> Alternately  you could have a sliding window covering there area
> that is currently being reshaped.
> If an IO comes in for that area, you need to either
>   - close the window and perform the IO, or
>   - wait for the window to slide past.
> I would favour the latter.  But queueing the IO for raid0_reshape doesn't
> really gain you anything I think.
I wasn't clear enough.A "current reshape active area" is my sliding window. 
I wait for the window to slide past. this is exactly what i had in mind.so yes
this is what am writing, a reshape_window.
> Issues that you haven't mentioned:
>   - metadata update: you need to record progress in the metadata
>     as the window slides along, in case of an unclean restart
I thought md does that for me. So it doesn't. Am i to call
md_allow_write ( that calls md_update_sbs ) ? how frequent ?
>   - Unless you only schedule one chunk at a time (which would be slow
>     things down I expect), you need to ensure that you don't schedule
>     a write to block for which the read hasn't completed yet.
ah... yes. i thought of it but forgot. my question is how ? should i
simply use an interruptable sleep ?  what do you do in raid5 ?
>     This is particularly an issues if you support changing the 
>     chunk size.
>   - I assume you are (currently) only supporting a reshape that
>     increases the size of the array and the number of devices?
neither changing the chunk size nor shrink is in my spec. so no. 
Maybe when i finish my studies ( you can google for "offsched + raz" and follow the link ... )
then i will have some raid quality time. 
> NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html