[question] solution for raid10 configuration concurrent with io

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

rai10 support that add or remove conf->mirrors[].rdev/replacement can
concurrent with io, and this is handled by something like following
code:

raid10_write_request
 if (rrdev)
  // decide to write replacement
  r10_bio->devs[i].repl_bio = bio

 raid10_write_one_disk
        if (replacement) {
                rdev = conf->mirrors[devnum].replacement;
                if (rdev == NULL) {
                        /* Replacement just got moved to main 'rdev' */
                        smp_mb();
                        rdev = conf->mirrors[devnum].rdev;
                }
        } else
                rdev = conf->mirrors[devnum].rdev;

And this scheme is not complete and can cause kernel panic or data loss
in multiple places.

for example, null-ptr-dereference:

t1:				t2:
raid10_write_request:

 // read rdev
 rdev = conf->mirros[].rdev;
				raid10_remove_disk
				 p = conf->mirros + number;
				 rdevp = &p->rdev;
				 // reset rdev
				 *rdevp = NULL
 raid10_write_one_disk
  // reread rdev got NULL
  rdev = conf->mirrors[devnum].rdev
    // null-ptr-dereference
   mbio = bio_alloc_clone(rdev->bdev...)
				 synchronize_rcu()

for example, data loss:

t1:
// assum that rdev is NULL, and replacement is not NULL
raid10_write_request
 // read replacement
 rrdev = conf->mirros[].replacement

				t2: replacement moved to rdev
				raid10_remove_disk
				 p->rdev = p->replacement
				 p->replacement = NULL

				t3: add a new replacement
				raid10_add_disk
				 p->replacement = rdev
 raid10_write_one_disk
  // read again, and got wrong replacement
  // should write to rdev in this case
  rrdev = conf->mirros[].replacement

In order to fix these problems, I come up with two different solutions:

1) based on current solution, make sure rdev is only accessd once while
handling io, this can be done by changing r10bio:
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index 63e48b11b552..c132cba1953c 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -145,12 +145,12 @@ struct r10bio {
         */
        struct r10dev {
                struct bio      *bio;
-               union {
-                       struct bio      *repl_bio; /* used for resync and
-                                                   * writes */
-                       struct md_rdev  *rdev;     /* used for reads
-                                                   * (read_slot >= 0) */
-               };
+               struct md_rdev  *rdev;
+
+               /* used for resync and writes */
+               struct bio      *repl_bio;
+               struct md_rdev  *replacement;
+
                sector_t        addr;
                int             devnum;
        } devs[];

And only read conf once, then record rdev/replacement. This requires to
change all the context to issue io and complete io.

2) add a new synchronization that configuration can't concurrent with
any io, however, I think this can work but I'm not 100% percent sure.
Code changes should be less, and this will allow some cleanups and
simplify logic.

Any suggestions?

Thanks,
Kuai



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux