On 2016/11/24 下午3:34, Guoqing Jiang wrote: > Hi Coly, > > Please see below comments, just FYI. > > On 11/22/2016 05:54 AM, Coly Li wrote: >> - In raid1_make_request(), wait_barrier() is replaced by, >> a) wait_read_barrier(): wait barrier in regular read I/O code path >> b) wait_barrier(): wait barrier in regular write I/O code path >> The differnece is wait_read_barrier() only waits if array is >> frozen, I >> am not able to combile them into one function, because they must >> receive >> differnet data types in their arguments list. > > Maybe it is possible to add a parameter to distinguish read and write, then > the two functions can be unified. Hi Guoqing, read_balance() will make sure a regular read won't access a non-synced disk, therefore regular read only needs to wait when the whole raid1 is frozen. That's why wait_read_barrier() only waits for !conf->array_frozen, but _wait_barrier() will waits for both !conf->array_frozen and !conf->barrier[idx]. Combine them is possible, but the code won't be simpler, because, - inside the function I still need to wait two wait_event_lock_irq() for different wait condition, and if I use sector_t as the function parameter, in wait_all_barrier() I have to send bogus sector number (calculated by bucket index) into _wait_barrier(). that's why I choose current implementation. > >> - align_to_barrier_unit_end() is called to make sure both regular and >> resync I/O won't go across the border of a barrier unit size. >> Open question: >> - Need review from md clustring developer, I don't touch related >> code now. > > I don't find problems with some tests so far. In raid1_sync_request(), I see, conf->cluster_sync_low = mddev->curr_resync_completed; conf->cluster_sync_high = conf->cluster_sync_low + CLUSTER_RESYNC_WINDOW_SECTORS; Is it possible that LBA range [conf->cluster_sync_low, conf->cluster_sync_high] goes across the border of a barrier unit size ? > >> static void reschedule_retry(struct r1bio *r1_bio) >> @@ -215,10 +214,15 @@ static void reschedule_retry(struct r1bi >> unsigned long flags; >> struct mddev *mddev = r1_bio->mddev; >> struct r1conf *conf = mddev->private; >> + sector_t sector_nr; >> + long idx; >> + >> + sector_nr = r1_bio->sector; >> + idx = get_barrier_bucket_idx(sector_nr); > > Isn't "idx = get_barrier_bucket_idx(r1_bio->sector)" enough here? Nice catch! I will modify this in next version. > >> @@ -255,19 +257,14 @@ static void call_bio_endio(struct r1bio >> if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) >> bio->bi_error = -EIO; >> - if (done) { >> + if (done) >> bio_endio(bio); >> - /* >> - * Wake up any possible resync thread that waits for the device >> - * to go idle. >> - */ >> - allow_barrier(conf, start_next_window, bi_sector); >> - } >> } >> static void raid_end_bio_io(struct r1bio *r1_bio) >> { >> struct bio *bio = r1_bio->master_bio; >> + struct r1conf *conf = r1_bio->mddev->private; >> /* if nobody has done the final endio yet, do it now */ >> if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) { >> @@ -278,6 +275,12 @@ static void raid_end_bio_io(struct r1bio >> call_bio_endio(r1_bio); >> } >> + >> + /* >> + * Wake up any possible resync thread that waits for the device >> + * to go idle. >> + */ >> + allow_barrier(conf, r1_bio->sector); >> free_r1bio(r1_bio); >> } > > I am not sure it is safe for above changes since call_bio_endio is not only > called within raid_end_bio_io. > This is safe. I move it here because location of wait_barrier() is changed and replaced by wait_read_barrier() for READ I/O, and wait_barrier() for WRITE I/O. In this patch, conf->nr_pending[idx] is raised for each ri_bio, so allow_barrier() should be called for each r1_bio destroy to decrease corresponding conf->nr_pending[idx]. >> @@ -311,6 +314,7 @@ static int find_bio_disk(struct r1bio *r >> return mirror; >> } >> +/* bi_end_io callback for regular READ bio */ > > Not related to the patch itself, it would be better to make the similar > changes in other patches. > OK, I will move it into another separated patch. >> -static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio) >> +/* A regular I/O should wait when, >> + * - The whole array is frozen (both READ and WRITE) >> + * - bio is WRITE and in same barrier bucket conf->barrier[idx] raised >> + */ >> +static void _wait_barrier(struct r1conf *conf, long idx) >> { >> - bool wait = false; >> - >> - if (conf->array_frozen || !bio) >> - wait = true; >> - else if (conf->barrier && bio_data_dir(bio) == WRITE) { >> - if ((conf->mddev->curr_resync_completed >> - >= bio_end_sector(bio)) || >> - (conf->next_resync + NEXT_NORMALIO_DISTANCE >> - <= bio->bi_iter.bi_sector)) >> - wait = false; >> - else >> - wait = true; >> + spin_lock_irq(&conf->resync_lock); >> + if (conf->array_frozen || conf->barrier[idx]) { >> + conf->nr_waiting[idx]++; >> + /* Wait for the barrier to drop. */ >> + wait_event_lock_irq( >> + conf->wait_barrier, >> + !conf->array_frozen && !conf->barrier[idx], >> + conf->resync_lock); >> + conf->nr_waiting[idx]--; >> } >> - return wait; >> + conf->nr_pending[idx]++; >> + spin_unlock_irq(&conf->resync_lock); >> } >> -static sector_t wait_barrier(struct r1conf *conf, struct bio *bio) >> +static void wait_read_barrier(struct r1conf *conf, sector_t sector_nr) >> { >> - sector_t sector = 0; >> + long idx = get_barrier_bucket_idx(sector_nr); >> spin_lock_irq(&conf->resync_lock); >> - if (need_to_wait_for_sync(conf, bio)) { >> - conf->nr_waiting++; >> - /* Wait for the barrier to drop. >> - * However if there are already pending >> - * requests (preventing the barrier from >> - * rising completely), and the >> - * per-process bio queue isn't empty, >> - * then don't wait, as we need to empty >> - * that queue to allow conf->start_next_window >> - * to increase. >> - */ >> - wait_event_lock_irq(conf->wait_barrier, >> - !conf->array_frozen && >> - (!conf->barrier || >> - ((conf->start_next_window < >> - conf->next_resync + RESYNC_SECTORS) && >> - current->bio_list && >> - !bio_list_empty(current->bio_list))), >> - conf->resync_lock); >> - conf->nr_waiting--; >> - } >> - >> - if (bio && bio_data_dir(bio) == WRITE) { >> - if (bio->bi_iter.bi_sector >= conf->next_resync) { >> - if (conf->start_next_window == MaxSector) >> - conf->start_next_window = >> - conf->next_resync + >> - NEXT_NORMALIO_DISTANCE; >> - >> - if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE) >> - <= bio->bi_iter.bi_sector) >> - conf->next_window_requests++; >> - else >> - conf->current_window_requests++; >> - sector = conf->start_next_window; >> - } >> + if (conf->array_frozen) { >> + conf->nr_waiting[idx]++; >> + /* Wait for array to unfreeze */ >> + wait_event_lock_irq( >> + conf->wait_barrier, >> + !conf->array_frozen, >> + conf->resync_lock); >> + conf->nr_waiting[idx]--; >> } >> - >> - conf->nr_pending++; >> + conf->nr_pending[idx]++; >> spin_unlock_irq(&conf->resync_lock); >> - return sector; >> } >> -static void allow_barrier(struct r1conf *conf, sector_t >> start_next_window, >> - sector_t bi_sector) >> +static void wait_barrier(struct r1conf *conf, sector_t sector_nr) >> +{ >> + long idx = get_barrier_bucket_idx(sector_nr); >> + >> + _wait_barrier(conf, idx); >> +} >> + > > I personally prefer to use one wait_barrier to cover both read and > write, something like: > > wait_barrier(struct r1conf *conf, long idx, int direction) > As I explain previous, combine them together won't make the code simpler, maybe more complexed. Let me try to see, if there is any better solution to make current patch more simpler. > BTW: there are some unnecessary changes inside the patch, maybe you can > remove it > or introduce other patches for them. Yes, great suggestion, I will do this in next version. Thanks for your review ! Coly -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html