Hi all,
I just had the occasion to test the new patch as it landed in Arch Linux
5.14.7. Unfortunately it does not work for me. Attached you can find a
modification that works for me, though I am not really sure why
write_behind seems not to be set to true in my configuration. If there
is any more data I can provide to help you investigate, please let me
know.
Thanks for any clues,
Jens
My configuration:
[root@vdr jens]# mdadm --detail -v /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Dec 26 09:50:53 2014
Raid Level : raid1
Array Size : 1953381440 (1862.89 GiB 2000.26 GB)
Used Dev Size : 1953381440 (1862.89 GiB 2000.26 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Sep 24 17:30:51 2021
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : vdr:0 (local to host vdr)
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32805
Number Major Minor RaidDevice State
2 8 33 0 active sync /dev/sdc1
3 8 17 1 active sync /dev/sdb1
[root@vdr jens]# mdadm -X /dev/sdb1
Filename : /dev/sdb1
Magic : 6d746962
Version : 4
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32804
Events Cleared : 32804
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 4096
Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
Bitmap : 29807 bits (chunks), 3 dirty (0.0%)
[root@vdr jens]# mdadm -X /dev/sdc1
Filename : /dev/sdc1
Magic : 6d746962
Version : 4
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32804
Events Cleared : 32804
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 4096
Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
Bitmap : 29807 bits (chunks), 3 dirty (0.0%)
On 17.08.21 at 14:32, Ming Lei wrote:
> On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
>>> 0 ~ 254: each bvec's length is 512
>>> 255: bvec's length is 8192
>>> The total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
>>> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>> Yes, we still need the rounding magic that alloc_behind_master_bio uses
>> here.
>>> But it is wrong to use max sectors to limit the number of bvecs (segments), isn't it?
>> The raid1 write behind code cares about the size of a bio it can reach by
>> adding order 0 pages to it. The bvecs are part of that and I think the
>> calculation in the patch documents that as well.
> Thinking about it further, your and Guoqing's patches are correct & enough, since
> bio_copy_data() just copies the byte (sector) stream from the fs bio to the
> write behind bio.
> Thanks,
> Ming
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-diff/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c 2021-09-24 14:37:15.347771866 +0200
+++ archlinux-linux-diff/drivers/md/raid1.c 2021-09-24 14:40:02.443978319 +0200
@@ -1501,7 +1501,7 @@
* Not if there are too many, or cannot
* allocate memory, or a reader on WriteMostly
* is waiting for behind writes to flush */
- if (bitmap &&
+ if (bitmap && write_behind &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
!waitqueue_active(&bitmap->behind_wait)) {
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-changed/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c 2021-09-24 15:43:22.842680119 +0200
+++ archlinux-linux-changed/drivers/md/raid1.c 2021-09-24 15:43:59.426142955 +0200
@@ -1329,7 +1329,6 @@
struct raid1_plug_cb *plug = NULL;
int first_clone;
int max_sectors;
- bool write_behind = false;
if (mddev_is_clustered(mddev) &&
md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1383,14 +1382,6 @@
for (i = 0; i < disks; i++) {
struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
- /*
- * The write-behind io is only attempted on drives marked as
- * write-mostly, which means we could allocate write behind
- * bio later.
- */
- if (rdev && test_bit(WriteMostly, &rdev->flags))
- write_behind = true;
-
if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
atomic_inc(&rdev->nr_pending);
blocked_rdev = rdev;
@@ -1470,7 +1461,7 @@
* at a time and thus needs a new bio that can fit the whole payload
* this bio in page sized chunks.
*/
- if (write_behind && bitmap)
+ if (bitmap)
max_sectors = min_t(int, max_sectors,
BIO_MAX_VECS * (PAGE_SIZE >> 9));
if (max_sectors < bio_sectors(bio)) {
@@ -1501,7 +1492,7 @@
* Not if there are too many, or cannot
* allocate memory, or a reader on WriteMostly
* is waiting for behind writes to flush */
- if (bitmap &&
+ if (bitmap &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
!waitqueue_active(&bitmap->behind_wait)) {