On Fri, Dec 15, 2023 at 6:24 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: [...] > >> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors) > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > >> index 3f8a21cd9233..d28b98adf457 100644 > >> --- a/include/linux/blkdev.h > >> +++ b/include/linux/blkdev.h > >> @@ -170,7 +170,7 @@ struct gendisk { > >> struct list_head slave_bdevs; > >> #endif > >> struct timer_rand_state *random; > >> - atomic_t sync_io; /* RAID */ > >> + atomic64_t sync_io; /* RAID */ > >> struct disk_events *ev; > > > > As we are on this, I wonder whether we really need this. > > AFAICT, is_mddev_idle() is the only consumer of sync_io. > > We can probably do the same check in is_mddev_idle() > > without sync_io. > > After reviewing some code, what I'm understand is following: > > I think 'sync_io' is used to distinguish 'sync io' from raid(this can > from different raid, for example, different partition is used for > different array) and other io(any io, even it's not from raid). And > if there are any other IO, sync speed is limited to min, otherwise it's > limited to max. > > If we want to keep this behaviour, I can't think of any other way for > now... Thanks for looking into this. To keep current behavior, we will need it in gendisk. [...] > >> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init) > >> { > >> struct md_rdev *rdev; > >> int idle; > >> - int curr_events; > >> + long long curr_events; > >> > >> idle = 1; > >> rcu_read_lock(); > >> rdev_for_each_rcu(rdev, mddev) { > >> struct gendisk *disk = rdev->bdev->bd_disk; > >> - curr_events = (int)part_stat_read_accum(disk->part0, sectors) - > >> - atomic_read(&disk->sync_io); > >> + curr_events = > >> + (long long)part_stat_read_accum(disk->part0, sectors) - > >> + atomic64_read(&disk->sync_io); > > By the way, I don't think this overflow is problematic, assume that > sectors accumulate by 'A' and sync_io accumulate by 'B': > > (setros + A) - (sync_io + B) -(sectors - sync_io) = (A - B) > > Nomatter overflow or truncation happened of not in the abouve addition > and subtraction, the result is correct. > > And even if io accounting is disabled, which means sectors and A is > always 0, the result will always be -B that is <= 0, hence > is_mddev_idle() will always return true, and sync_speed will be limited > to max in this case. We only use this for idle or not check, the behavior is OK (I think). However, this logic is error prone. On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can just use it for atomic64_t so that we don't have to worry about overflow. Thanks, Song