Re: [RFC PATCH v4 4/4] md: raid456 add nowait support

On 12/12/21 10:56 PM, Song Liu wrote:
On Fri, Dec 10, 2021 at 10:26 AM Vishal Verma <vverma@xxxxxxxxxxxxxxxx> wrote:

On 12/9/21 7:16 PM, Song Liu wrote:
On Wed, Nov 10, 2021 at 10:15 AM Vishal Verma <vverma@xxxxxxxxxxxxxxxx> wrote:
Return -EAGAIN if the raid456 driver would block
while waiting for situations like:

    - Reshape operation,
    - Discard operation.

Signed-off-by: Vishal Verma <vverma@xxxxxxxxxxxxxxxx>
---
   drivers/md/raid5.c | 14 ++++++++++++++
   1 file changed, 14 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9c1a5877cf9f..fa64ee315241 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5710,6 +5710,11 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
                  int d;
          again:
                  sh = raid5_get_active_stripe(conf, logical_sector, 0, 0, 0);
+               /* Bail out if REQ_NOWAIT is set */
+               if (bi->bi_opf & REQ_NOWAIT) {
+                       bio_wouldblock_error(bi);
+                       return;
+               }
This is not right. raid5_get_active_stripe() gets a refcount on the sh,
so we cannot simply return here. I think we need this logic after
raid5_release_stripe() and before schedule().

                  prepare_to_wait(&conf->wait_for_overlap, &w,
                                  TASK_UNINTERRUPTIBLE);
                  set_bit(R5_Overlap, &sh->dev[sh->pd_idx].flags);
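
For illustration, a minimal sketch of the placement Song suggests, assuming the schedule() in question is the one in the STRIPE_SYNCING retry branch of make_discard_request() (untested):

        if (test_bit(STRIPE_SYNCING, &sh->state)) {
                raid5_release_stripe(sh);
                /* stripe refcount already dropped above, so bailing out here is safe */
                if (bi->bi_opf & REQ_NOWAIT) {
                        bio_wouldblock_error(bi);
                        return;
                }
                schedule();
                goto again;
        }

The same pattern would presumably apply to the other raid5_release_stripe()/schedule() retry path in that loop.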
@@ -5820,6 +5825,15 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
          bi->bi_next = NULL;

          md_account_bio(mddev, &bi);
+       /* Bail out if REQ_NOWAIT is set */
+       if (bi->bi_opf & REQ_NOWAIT &&
+           conf->reshape_progress != MaxSector &&
+           mddev->reshape_backwards
+           ? logical_sector < conf->reshape_safe
+           : logical_sector >= conf->reshape_safe) {
+               bio_wouldblock_error(bi);
+               return true;
+       }
This is also problematic, and is the trigger of those error messages.
We only want to trigger -EAGAIN when logical_sector is between
reshape_progress and reshape_safe.

Just to clarify, did you mean doing something like:

if (bi->bi_opf & REQ_NOWAIT &&
    conf->reshape_progress != MaxSector &&
    (mddev->reshape_backwards
     ? (logical_sector > conf->reshape_progress && logical_sector < conf->reshape_safe)
     : logical_sector >= conf->reshape_safe)) {
I think this should be:

    : (logical_sector >= conf->reshape_safe && logical_sector < conf->reshape_progress)


if (bi->bi_opf & REQ_NOWAIT &&
    conf->reshape_progress != MaxSector &&
    (mddev->reshape_backwards
     ? (logical_sector > conf->reshape_progress && logical_sector <= conf->reshape_safe)
     : (logical_sector >= conf->reshape_safe && logical_sector < conf->reshape_progress))) {
        bio_wouldblock_error(bi);
        return true;
}

After making this change along with the other changes, I ran some tests with 100% reads, 70% reads/30% writes, and 100% writes on a clean raid5 array.

Unfortunately, I ran into the following hung task with 100% writes (with both libaio and io_uring):

[21876.856692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21876.864518] task:md5_raid5       state:D stack:    0 pid:11675 ppid:     2 flags:0x00004000
[21876.864522] Call Trace:
[21876.864526]  __schedule+0x2d4/0x970
[21876.864532]  ? wbt_cleanup_cb+0x20/0x20
[21876.864535]  schedule+0x4e/0xb0
[21876.864537]  io_schedule+0x3f/0x70
[21876.864539]  rq_qos_wait+0xb9/0x130
[21876.864542]  ? sysv68_partition+0x280/0x280
[21876.864543]  ? wbt_cleanup_cb+0x20/0x20
[21876.864545]  wbt_wait+0x92/0xc0
[21876.864546]  __rq_qos_throttle+0x25/0x40
[21876.864548]  blk_mq_submit_bio+0xc6/0x5d0
[21876.864551]  ? submit_bio_checks+0x39e/0x5f0
[21876.864554]  __submit_bio+0x1bc/0x1d0
[21876.864555]  ? kmem_cache_free+0x378/0x3c0
[21876.864558]  ? mempool_free_slab+0x17/0x20
[21876.864562]  submit_bio_noacct+0x256/0x2a0
[21876.864565]  0xffffffffc01fa6d9
[21876.864568]  ? 0xffffffffc01f5d01
[21876.864569]  raid5_get_active_stripe+0x16c0/0x3e00 [raid456]
[21876.864571]  ? __wake_up_common_lock+0x8a/0xc0
[21876.864575]  raid5_get_active_stripe+0x2839/0x3e00 [raid456]
[21876.864577]  raid5_get_active_stripe+0x2d6e/0x3e00 [raid456]
[21876.864579]  md_thread+0xae/0x170
[21876.864581]  ? wait_woken+0x60/0x60
[21876.864582]  ? md_start_sync+0x60/0x60
[21876.864584]  kthread+0x127/0x150
[21876.864586]  ? set_kthread_struct+0x40/0x40
[21876.864588]  ret_from_fork+0x1f/0x30


Please let me know if these make sense.

Thanks,
Song


Makes sense. Thanks for your feedback. I'll incorporate it and test.
When testing, please make sure we hit all different conditions with REQ_NOWAIT.
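
For example, one way to exercise the nowait path from userspace is a direct write submitted with RWF_NOWAIT; a minimal sketch, assuming the array is at /dev/md0 and using O_DIRECT so the request actually reaches the raid456 driver:

/* Sketch: probe /dev/md0 (assumed device) with a single nowait direct write. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	void *buf;
	struct iovec iov;
	ssize_t ret;
	int fd;

	/* O_DIRECT wants an aligned buffer; 4 KiB covers common setups. */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	memset(buf, 0xab, 4096);

	fd = open("/dev/md0", O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* RWF_NOWAIT asks the stack not to block; with this series applied,
	 * raid456 should return EAGAIN here instead of sleeping. */
	ret = pwritev2(fd, &iov, 1, 0, RWF_NOWAIT);
	if (ret < 0 && errno == EAGAIN)
		printf("would block: got EAGAIN\n");
	else if (ret < 0)
		perror("pwritev2");
	else
		printf("wrote %zd bytes without blocking\n", ret);

	close(fd);
	free(buf);
	return 0;
}

Running variants of this while a reshape or a large discard is in flight would cover the different blocking conditions above.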

Thanks,
Song


