Hi all,
I just had the occasion to test the new patch as it landed in Arch Linux
5.14.7. Unfortunately it does not work for me. Attached you can find a
modification that works for me, though I am not really sure why
write_behind seems not to be set to true in my configuration. If there
is any more data I can provide to help you investigate, please let me
know.
Thanks for any clues,
Jens
My configuration:
[root@vdr jens]# mdadm --detail -v /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Dec 26 09:50:53 2014
Raid Level : raid1
Array Size : 1953381440 (1862.89 GiB 2000.26 GB)
Used Dev Size : 1953381440 (1862.89 GiB 2000.26 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Sep 24 17:30:51 2021
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : vdr:0 (local to host vdr)
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32805
Number Major Minor RaidDevice State
2 8 33 0 active sync /dev/sdc1
3 8 17 1 active sync /dev/sdb1
[root@vdr jens]# mdadm -X /dev/sdb1
Filename : /dev/sdb1
Magic : 6d746962
Version : 4
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32804
Events Cleared : 32804
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 4096
Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
Bitmap : 29807 bits (chunks), 3 dirty (0.0%)
[root@vdr jens]# mdadm -X /dev/sdc1
Filename : /dev/sdc1
Magic : 6d746962
Version : 4
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32804
Events Cleared : 32804
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 4096
Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
Bitmap : 29807 bits (chunks), 3 dirty (0.0%)
On 17.08.21 at 14:32, Ming Lei wrote:
> On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
>>> 0 ~ 254: each bvec's length is 512
>>> 255: bvec's length is 8192
>>> The total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
>>> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>> Yes, we still need the rounding magic that alloc_behind_master_bio uses
>> here.
>>> But it is wrong to use max sectors to limit the number of bvecs (segments), isn't it?
>> The raid1 write behind code cares about the size of a bio it can reach by
>> adding order 0 pages to it. The bvecs are part of that and I think the
>> calculation in the patch documents that as well.
> Thinking about it further, your and Guoqing's patches are correct & enough, since
> bio_copy_data() just copies the byte (sector) stream from the fs bio to the
> write behind bio.
> Thanks,
> Ming
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-diff/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c 2021-09-24 14:37:15.347771866 +0200
+++ archlinux-linux-diff/drivers/md/raid1.c 2021-09-24 14:40:02.443978319 +0200
@@ -1501,7 +1501,7 @@
* Not if there are too many, or cannot
* allocate memory, or a reader on WriteMostly
* is waiting for behind writes to flush */
- if (bitmap &&
+ if (bitmap && write_behind &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
!waitqueue_active(&bitmap->behind_wait)) {
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-changed/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c 2021-09-24 15:43:22.842680119 +0200
+++ archlinux-linux-changed/drivers/md/raid1.c 2021-09-24 15:43:59.426142955 +0200
@@ -1329,7 +1329,6 @@
struct raid1_plug_cb *plug = NULL;
int first_clone;
int max_sectors;
- bool write_behind = false;
if (mddev_is_clustered(mddev) &&
md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1383,14 +1382,6 @@
for (i = 0; i < disks; i++) {
struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
- /*
- * The write-behind io is only attempted on drives marked as
- * write-mostly, which means we could allocate write behind
- * bio later.
- */
- if (rdev && test_bit(WriteMostly, &rdev->flags))
- write_behind = true;
-
if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
atomic_inc(&rdev->nr_pending);
blocked_rdev = rdev;
@@ -1470,7 +1461,7 @@
* at a time and thus needs a new bio that can fit the whole payload
* this bio in page sized chunks.
*/
- if (write_behind && bitmap)
+ if (bitmap)
max_sectors = min_t(int, max_sectors,
BIO_MAX_VECS * (PAGE_SIZE >> 9));
if (max_sectors < bio_sectors(bio)) {
@@ -1501,7 +1492,7 @@
* Not if there are too many, or cannot
* allocate memory, or a reader on WriteMostly
* is waiting for behind writes to flush */
- if (bitmap &&
+ if (bitmap &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
!waitqueue_active(&bitmap->behind_wait)) {