Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Mon, 18 Dec 2023 09:39:15 +0800

Hi,

在 2023/12/17 8:28, Song Liu 写道:
On Fri, Dec 15, 2023 at 6:24 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
[...]
   static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3f8a21cd9233..d28b98adf457 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -170,7 +170,7 @@ struct gendisk {
          struct list_head slave_bdevs;
   #endif
          struct timer_rand_state *random;
-       atomic_t sync_io;               /* RAID */
+       atomic64_t sync_io;             /* RAID */
          struct disk_events *ev;

As we are on this, I wonder whether we really need this.
AFAICT, is_mddev_idle() is the only consumer of sync_io.
We can probably do the same check in is_mddev_idle()
without sync_io.

After reviewing some code, what I'm understand is following:

I think 'sync_io' is used to distinguish 'sync io' from raid(this can
from different raid, for example, different partition is used for
different array) and other io(any io, even it's not from raid). And
if there are any other IO, sync speed is limited to min, otherwise it's
limited to max.

If we want to keep this behaviour, I can't think of any other way for
now...

Thanks for looking into this. To keep current behavior, we will need it
in gendisk.

[...]

@@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
   {
          struct md_rdev *rdev;
          int idle;
-       int curr_events;
+       long long curr_events;

          idle = 1;
          rcu_read_lock();
          rdev_for_each_rcu(rdev, mddev) {
                  struct gendisk *disk = rdev->bdev->bd_disk;
-               curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
-                             atomic_read(&disk->sync_io);
+               curr_events =
+                       (long long)part_stat_read_accum(disk->part0, sectors) -
+                             atomic64_read(&disk->sync_io);

By the way, I don't think this overflow is problematic, assume that
sectors accumulate by 'A' and sync_io accumulate by 'B':

(setros + A) - (sync_io + B) -(sectors - sync_io) = (A - B)

Nomatter overflow or truncation happened of not in the abouve addition
and subtraction, the result is correct.

And even if io accounting is disabled, which means sectors and A is
always 0, the result will always be -B that is <= 0, hence
is_mddev_idle() will always return true, and sync_speed will be limited
to max in this case.

We only use  this for idle or not check, the behavior is OK (I think).
However, this logic is error prone.

On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
just use it for atomic64_t so that we don't have to worry about overflow.

I'm not sure about this, because other than this ubsan warning, this
overflow doesn't have any impact on functionality to me.

If we care about this 'hole', there are lots of holes in gendisk, and
can be avoiled, for example, moving 'sync_io' near  to 'node_id'.

Thanks,
Kuai


Thanks,
Song
.