Re: [PATCH v5 3/4] md: raid10 add nowait support

On 12/16/21 9:42 AM, Jens Axboe wrote:
On 12/15/21 5:30 PM, Vishal Verma wrote:
On 12/15/21 3:20 PM, Vishal Verma wrote:
On 12/15/21 1:42 PM, Song Liu wrote:
On Tue, Dec 14, 2021 at 10:09 PM Vishal Verma
<vverma@xxxxxxxxxxxxxxxx> wrote:
This adds nowait support to the RAID10 driver, very similar to the
raid1 driver changes. It makes the RAID10 driver return EAGAIN in
situations where it would otherwise wait, e.g.:

- Waiting for the barrier,
- Too many pending I/Os to be queued,
- Reshape operation,
- Discard operation.

wait_barrier() is modified to return bool so that callers can detect
a failed wait. It returns true if the wait completed (or no wait was
required), and false if a wait was required but was skipped to honor
nowait.
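
From user space this means a direct I/O issued with RWF_NOWAIT (which
becomes IOCB_NOWAIT/REQ_NOWAIT further down the stack, and which
io_uring likewise sets on its first, non-blocking issue attempt) now
fails fast with EAGAIN instead of sleeping on the barrier. A minimal
illustration (hypothetical /dev/md0 array path, error handling
trimmed):

	#define _GNU_SOURCE
	#include <errno.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/uio.h>

	int main(void)
	{
		/* hypothetical RAID10 array device */
		int fd = open("/dev/md0", O_RDONLY | O_DIRECT);
		struct iovec iov;
		void *buf;

		posix_memalign(&buf, 4096, 4096);
		iov.iov_base = buf;
		iov.iov_len = 4096;

		/* RWF_NOWAIT: fail with EAGAIN rather than block */
		if (preadv2(fd, &iov, 1, 0, RWF_NOWAIT) < 0 && errno == EAGAIN)
			printf("array would block, retry later\n");
		return 0;
	}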

Signed-off-by: Vishal Verma <vverma@xxxxxxxxxxxxxxxx>
---
   drivers/md/raid10.c | 57 +++++++++++++++++++++++++++++++++++----------
   1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dde98f65bd04..f6c73987e9ac 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -952,11 +952,18 @@ static void lower_barrier(struct r10conf *conf)
          wake_up(&conf->wait_barrier);
   }

-static void wait_barrier(struct r10conf *conf)
+static bool wait_barrier(struct r10conf *conf, bool nowait)
   {
          spin_lock_irq(&conf->resync_lock);
          if (conf->barrier) {
                  struct bio_list *bio_list = current->bio_list;
+
+               /* Return false when nowait flag is set */
+               if (nowait) {
+                       spin_unlock_irq(&conf->resync_lock);
+                       return false;
+               }
+
                  conf->nr_waiting++;
                  /* Wait for the barrier to drop.
                   * However if there are already pending
@@ -988,6 +995,7 @@ static void wait_barrier(struct r10conf *conf)
          }
          atomic_inc(&conf->nr_pending);
          spin_unlock_irq(&conf->resync_lock);
+       return true;
   }

   static void allow_barrier(struct r10conf *conf)
@@ -1101,17 +1109,25 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
   static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
                                    struct bio *bio, sector_t sectors)
   {
-       wait_barrier(conf);
+       /* Bail out if REQ_NOWAIT is set for the bio */
+       if (!wait_barrier(conf, bio->bi_opf & REQ_NOWAIT)) {
+               bio_wouldblock_error(bio);
+               return;
+       }
I think we also need regular_request_wait to return bool and handle
it properly.
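Something along these lines, maybe (rough sketch on top of this
patch, not tested):

static bool regular_request_wait(struct mddev *mddev, struct r10conf *conf,
				 struct bio *bio, sector_t sectors)
{
	/* Bail out if REQ_NOWAIT is set for the bio */
	if (!wait_barrier(conf, bio->bi_opf & REQ_NOWAIT)) {
		bio_wouldblock_error(bio);
		return false;
	}
	while (bio_data_dir(bio) != WRITE &&
	       bio->bi_iter.bi_sector < conf->reshape_progress &&
	       bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
		allow_barrier(conf);
		/* reshape in flight: bail instead of waiting for it */
		if (bio->bi_opf & REQ_NOWAIT) {
			bio_wouldblock_error(bio);
			return false;
		}
		wait_event(conf->wait_barrier,
			   conf->reshape_progress <= bio->bi_iter.bi_sector ||
			   conf->reshape_progress >= bio->bi_iter.bi_sector +
						     sectors);
		wait_barrier(conf, false);
	}
	return true;
}

with raid10_read_request()/raid10_write_request() then doing:

	if (!regular_request_wait(mddev, conf, bio, sectors))
		return;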

Thanks,
Song

Ack, will fix it. Thanks!
Ran into this while running with io_uring, with the current v5 raid10
patch on top of the md-next branch:

./t/io_uring -a 0 -d 256 </dev/raid10>

It didn't trigger with aio (-a 1).

[  248.128661] BUG: kernel NULL pointer dereference, address: 00000000000000b8
[  248.135628] #PF: supervisor read access in kernel mode
[  248.140762] #PF: error_code(0x0000) - not-present page
[  248.145903] PGD 0 P4D 0
[  248.148443] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  248.152800] CPU: 49 PID: 9461 Comm: io_uring Kdump: loaded Not tainted 5.16.0-rc3+ #2
[  248.160629] Hardware name: Dell Inc. PowerEdge R650xs/0PPTY2, BIOS 1.3.8 08/31/2021
[  248.168279] RIP: 0010:raid10_end_read_request+0x74/0x140 [raid10]
[  248.174373] Code: 48 60 48 8b 58 58 48 c1 e2 05 49 03 55 08 48 89 4a 10 40 84 f6 75 48 f0 41 80 4c 24 18 01 4c 89 e7 e8 e0 b8 ff ff 49 8b 4d 00 <48> 8b 83 b8 00 00 00 f0 ff 8b f0 00 00 00 0f 94 c2 a8 01 74 04 84
[  248.193120] RSP: 0018:ffffb1c38d598ce8 EFLAGS: 00010086
[  248.198344] RAX: ffff8e5da2a1a100 RBX: 0000000000000000 RCX: ffff8e5d89747000
[  248.205479] RDX: 000000008040003a RSI: 0000000080400039 RDI: ffff8e1e00044900
[  248.212611] RBP: ffffb1c38d598d30 R08: 0000000000000000 R09: 0000000000000001
[  248.219744] R10: ffff8e5da2a1ae00 R11: 000000411bab9000 R12: ffff8e5da2a1ae00
[  248.226877] R13: ffff8e5d8973fc00 R14: 0000000000000000 R15: 0000000000001000
[  248.234009] FS:  00007fc26b07d700(0000) GS:ffff8e9c6e600000(0000) knlGS:0000000000000000
[  248.242096] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  248.247843] CR2: 00000000000000b8 CR3: 00000040b25d4005 CR4: 0000000000770ee0
[  248.254973] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  248.262107] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  248.269240] PKRU: 55555554
[  248.271953] Call Trace:
[  248.274406]  <IRQ>
[  248.276425]  bio_endio+0xf6/0x170
[  248.279743]  blk_update_request+0x12d/0x470
[  248.283931]  ? sbitmap_queue_clear_batch+0xc7/0x110
[  248.288809]  blk_mq_end_request_batch+0x76/0x490
[  248.293429]  ? dma_direct_unmap_sg+0xdd/0x1a0
[  248.297786]  ? smp_call_function_single_async+0x46/0x70
[  248.303015]  ? mempool_kfree+0xe/0x10
[  248.306680]  ? mempool_kfree+0xe/0x10
[  248.310345]  nvme_pci_complete_batch+0x26/0xb0
[  248.314792]  nvme_irq+0x298/0x2f0
[  248.318110]  ? nvme_unmap_data+0xf0/0xf0
[  248.322038]  __handle_irq_event_percpu+0x3f/0x190
[  248.326744]  handle_irq_event_percpu+0x33/0x80
[  248.331190]  handle_irq_event+0x39/0x60
[  248.335028]  handle_edge_irq+0xbe/0x1e0
[  248.338869]  __common_interrupt+0x6b/0x110
[  248.342967]  common_interrupt+0xbd/0xe0
[  248.346808]  </IRQ>
[  248.348912]  <TASK>
[  248.351018]  asm_common_interrupt+0x1e/0x40
[  248.355206] RIP: 0010:_raw_spin_unlock_irqrestore+0x1e/0x37
[  248.360780] Code: 02 5d c3 0f 1f 44 00 00 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> ed 8e 5b ff 65 8b 05 66 7e 52 78 85 c0 74 02 5d c3 0f 1f 44 00
[  248.379525] RSP: 0018:ffffb1c3a429b958 EFLAGS: 00000206
[  248.384749] RAX: 0000000000000001 RBX: ffff8e5d8973fd08 RCX: ffff8e5d8973fd10
[  248.391884] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000001
[  248.399017] RBP: ffffb1c3a429b958 R08: 0000000000000000 R09: ffffb1c3a429b970
[  248.406148] R10: 0000000000000c00 R11: 0000000000000001 R12: 0000000000000001
[  248.413280] R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003
[  248.420415]  __wake_up_common_lock+0x8a/0xc0
[  248.424686]  __wake_up+0x13/0x20
[  248.427919]  raid10_make_request+0x101/0x170 [raid10]
[  248.432971]  md_handle_request+0x179/0x1e0
[  248.437071]  ? submit_bio_checks+0x1f6/0x5a0
[  248.441345]  md_submit_bio+0x6d/0xa0
[  248.444924]  __submit_bio+0x94/0x140
[  248.448504]  submit_bio_noacct+0xe1/0x2a0
[  248.452515]  submit_bio+0x48/0x120
[  248.455923]  blkdev_direct_IO+0x220/0x540
[  248.459935]  ? __fsnotify_parent+0xff/0x330
[  248.464121]  ? __fsnotify_parent+0x10f/0x330
[  248.468393]  ? common_interrupt+0x73/0xe0
[  248.472408]  generic_file_read_iter+0xa5/0x160
[  248.476852]  blkdev_read_iter+0x38/0x70
[  248.480693]  io_read+0x119/0x420
[  248.483923]  ? sbitmap_queue_clear_batch+0xc7/0x110
[  248.488805]  ? blk_mq_end_request_batch+0x378/0x490
[  248.493684]  io_issue_sqe+0x7ec/0x19c0
[  248.497436]  ? io_req_prep+0x6a9/0xe60
[  248.501190]  io_submit_sqes+0x2a0/0x9f0
[  248.505030]  ? __fget_files+0x6a/0x90
[  248.508693]  __x64_sys_io_uring_enter+0x1da/0x8c0
[  248.513401]  do_syscall_64+0x38/0x90
[  248.516979]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  248.522033] RIP: 0033:0x7fc26b19b89d
[  248.525611] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[  248.544360] RSP: 002b:00007fc26b07ce98 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[  248.551925] RAX: ffffffffffffffda RBX: 00007fc26b3f2fc0 RCX: 00007fc26b19b89d
[  248.559058] RDX: 0000000000000020 RSI: 0000000000000020 RDI: 0000000000000004
[  248.566189] RBP: 0000000000000020 R08: 0000000000000000 R09: 0000000000000000
[  248.573322] R10: 0000000000000001 R11: 0000000000000246 R12: 00005623a4b7a2a0
[  248.580456] R13: 0000000000000020 R14: 0000000000000020 R15: 0000000000000020
[  248.587591]  </TASK>
Do you have:

commit 75feae73a28020e492fbad2323245455ef69d687
Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date:   Tue Dec 7 20:16:36 2021 +0000

     block: fix single bio async DIO error handling

in your tree?

Nope. I will get it in and test. Thanks!
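For reference, pulling it in is just a cherry-pick of the commit
quoted above onto md-next:

	git cherry-pick 75feae73a280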


