Re: [question] solution for raid10 configuration concurrent with io

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 14, 2023 at 10:30 AM Yu Kuai <yukuai3@xxxxxxxxxx> wrote:
>
> Hello,
>
> rai10 support that add or remove conf->mirrors[].rdev/replacement can
> concurrent with io, and this is handled by something like following
> code:
>
> raid10_write_request
>   if (rrdev)
>    // decide to write replacement
>    r10_bio->devs[i].repl_bio = bio
>
>   raid10_write_one_disk
>          if (replacement) {
>                  rdev = conf->mirrors[devnum].replacement;
>                  if (rdev == NULL) {
>                          /* Replacement just got moved to main 'rdev' */
>                          smp_mb();
>                          rdev = conf->mirrors[devnum].rdev;
>                  }
>          } else
>                  rdev = conf->mirrors[devnum].rdev;
>
> And this scheme is not complete and can cause kernel panic or data loss
> in multiple places.
>
> for example, null-ptr-dereference:
>
> t1:                             t2:
> raid10_write_request:
>
>   // read rdev
>   rdev = conf->mirros[].rdev;
>                                 raid10_remove_disk
>                                  p = conf->mirros + number;
>                                  rdevp = &p->rdev;
>                                  // reset rdev
>                                  *rdevp = NULL
>   raid10_write_one_disk
>    // reread rdev got NULL
>    rdev = conf->mirrors[devnum].rdev
>      // null-ptr-dereference
>     mbio = bio_alloc_clone(rdev->bdev...)
>                                  synchronize_rcu()

Hi Yu kuai

raid10_write_request adds the rdev->nr_pending with rcu lock
protection. Can this case happen? After adding ->nr_pending, the rdev
can't be removed.

>
> for example, data loss:
>
> t1:
> // assum that rdev is NULL, and replacement is not NULL

How can trigger this? Could you give the detailed commands?

Best Regards
Xiao Ni

> raid10_write_request
>   // read replacement
>   rrdev = conf->mirros[].replacement
>
>                                 t2: replacement moved to rdev
>                                 raid10_remove_disk
>                                  p->rdev = p->replacement
>                                  p->replacement = NULL
>
>                                 t3: add a new replacement
>                                 raid10_add_disk
>                                  p->replacement = rdev
>   raid10_write_one_disk
>    // read again, and got wrong replacement
>    // should write to rdev in this case
>    rrdev = conf->mirros[].replacement
>
> In order to fix these problems, I come up with two different solutions:
>
> 1) based on current solution, make sure rdev is only accessd once while
> handling io, this can be done by changing r10bio:
> diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
> index 63e48b11b552..c132cba1953c 100644
> --- a/drivers/md/raid10.h
> +++ b/drivers/md/raid10.h
> @@ -145,12 +145,12 @@ struct r10bio {
>           */
>          struct r10dev {
>                  struct bio      *bio;
> -               union {
> -                       struct bio      *repl_bio; /* used for resync and
> -                                                   * writes */
> -                       struct md_rdev  *rdev;     /* used for reads
> -                                                   * (read_slot >= 0) */
> -               };
> +               struct md_rdev  *rdev;
> +
> +               /* used for resync and writes */
> +               struct bio      *repl_bio;
> +               struct md_rdev  *replacement;
> +
>                  sector_t        addr;
>                  int             devnum;
>          } devs[];
>
> And only read conf once, then record rdev/replacement. This requires to
> change all the context to issue io and complete io.
>
> 2) add a new synchronization that configuration can't concurrent with
> any io, however, I think this can work but I'm not 100% percent sure.
> Code changes should be less, and this will allow some cleanups and
> simplify logic.
>
> Any suggestions?
>
> Thanks,
> Kuai
>





[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux