On Tue, 31 Jul 2012 16:12:04 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote: > 2012/7/31 NeilBrown <neilb@xxxxxxx>: > > On Thu, 26 Jul 2012 16:01:50 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote: > > > >> Write has some impacts to SSD: > >> 1. wear out flash. Frequent write can speed out the progress. > >> 2. increase the burden of garbage collection of SSD firmware. If no space > >> left for write, SSD firmware garbage collection will try to free some space. > >> 3. slow down subsequent write. After write SSD to some extents (for example, > >> write the whole disk), subsequent write will slow down significantly (because > >> almost every write invokes garbage collection in such case). > >> > >> We want to avoid unnecessary write as more as possible. raid sync generally > >> involves a lot of unnecessary write. For example, even two disks don't have > >> any data, we write the second disk for the whole disk size. > >> > >> To reduce write, we always compare raid disk data and only write mismatch part. > >> This means sync will have extra IO read and memory compare. So this scheme is > >> very bad for hard disk raid and sometimes SSD raid too if mismatch part is > >> majority. But sometimes this can be very helpful to reduce write, in that case, > >> since sync is rare operation, the extra IO/CPU usage is worthy paying. People > >> who want to use the feature should understand the risk first. So this ability > >> is off by default, a sysfs entry can be used to enable it. > >> > >> Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx> > >> --- > >> drivers/md/md.c | 41 +++++++++++++++++++++++++++++++ > >> drivers/md/md.h | 3 ++ > >> drivers/md/raid1.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++---- > >> 3 files changed, 110 insertions(+), 4 deletions(-) > >> > >> Index: linux/drivers/md/md.h > >> =================================================================== > >> --- linux.orig/drivers/md/md.h 2012-07-25 13:51:00.353775521 +0800 > >> +++ linux/drivers/md/md.h 2012-07-26 10:36:38.500740552 +0800 > >> @@ -325,6 +325,9 @@ struct mddev { > >> #define MD_RECOVERY_FROZEN 9 > >> > >> unsigned long recovery; > >> +#define MD_RECOVERY_MODE_REPAIR 0 > >> +#define MD_RECOVERY_MODE_DISCARD 1 > >> + unsigned long recovery_mode; > > > > You have not documented the meaning of these two flags at all, and I don't > > feel up to guessing. > > > > The patch looks more complex that it should be. The behaviour you are > > suggesting is exactly the behaviour you get when MD_RECOVERY_REQUESTED is > > set, so at most I expect to see a few places where that flag is tested > > changed to test something else as well. > > > > How would this be used? It affects resync and resync happens as soon as the > > array is assembled. So when and how would you set the flag which says > > "prefer reads to writes"? It seems like it needs to be set in the metadata. > > > > BTW RAID10 already does this - it reads and compares for a normal sync. So > > maybe just tell people to use raid10 if they want this behaviour? > > It's true, the behavior likes MD_RECOVERY_REQUESTED, ie, read all disks, > only do write if two disk data not match. But I hope it works as soon as the > array is assembled. Surely we can set in the metadata, but I didn't want to > enable it by default, since even with SSD, the read-compare-write isn't optimal > some times, because it involves extra IO read and memory compare. > > It appears MD_RECOVERY_REQUESTED assumes disks are insync. > It doesn't read disk !insync, so it doesn't work for assemble. > > In my mind, user frozen the sync first (with sync_action setting), and then > enable read-compare-write, and finally continue the sync. I can't stop > a recovery and set MD_RECOVERY_REQUESTED, so I added a flag. > Anyway, please suggest what the preferred way for this. I guess I just don't find this functionality at all convincing. It isn't clear when anyone would use it, or how they would use it. It seems best not to provide it. > > The second patch is for a special case. two disks data don't match, but > disk1 data are all 0. I'd like to do trim for disk2, instead of write > 0 to disk2, > this can be helpful for SSD. If this is a useful optimisation, then I expect it should be implemented somewhat lower in the stack, so that every attempt to write a zero block becomes a 'trim'. Ideally the SSD itself should handle this optimisation. I really don't think it is appropriate for md to be detecting zeros and writing them as 'trim'. NeilBrown > > Good to know raid10 already does read-compare-write. From the comments, > looks it only supports resync not recovery. > > Thanks, > Shaohua
Attachment:
signature.asc
Description: PGP signature