Re: [PATCH] block: free sched's request pool in blk_cleanup_queue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tested-by: Yi Zhang <yi.zhang@xxxxxxxxxx>

This patch fixed bellow issue which triggered by blktests block/006, thanks.

Kernel 5.2.0-rc3 on an x86_64

[  567.066143] run blktests block/006 at 2019-06-05 03:18:04
[  567.089457] null: module loaded
[  593.937235] BUG: unable to handle page fault for address: ffffffffc0738110
[  593.938442] #PF: supervisor read access in kernel mode
[  593.939230] #PF: error_code(0x0000) - not-present page
[  593.940011] PGD 11760f067 P4D 11760f067 PUD 117611067 PMD 12b61f067 PTE 0
[  593.941048] Oops: 0000 [#1] SMP PTI
[  593.941607] CPU: 0 PID: 1106 Comm: kworker/0:0 Not tainted 5.2.0-rc3 #1
[  593.942607] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  593.943511] Workqueue: events __blk_release_queue
[  593.944240] RIP: 0010:blk_mq_free_rqs+0x21/0xd0
[  593.944930] Code: c6 e8 83 f0 ff ff eb 9f 90 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 f5 53 48 83 be 90 00 00 00 00 74 56 48 8b 47 38 49 89 fd <48> 83 78 50 00 74 48 8b 06 85 c0 74 42 41 89 d6 31 db 48 8b 85 98
[  593.947773] RSP: 0018:ffffa21300e7fde8 EFLAGS: 00010286
[  593.948562] RAX: ffffffffc07380c0 RBX: 0000000000000000 RCX: 0000000000002ab8 [  593.949737] RDX: 0000000000000000 RSI: ffff9296971089c0 RDI: ffff9296975fb438 [  593.950811] RBP: ffff9296971089c0 R08: 000000000002f0e0 R09: ffffffffb6427890 [  593.951878] R10: ffffd41084a3fd80 R11: 0000000000000001 R12: ffff9296a91e58b0 [  593.952947] R13: ffff9296975fb438 R14: ffff9296ab5b2600 R15: ffff9296a91e6080 [  593.954011] FS:  0000000000000000(0000) GS:ffff9296bba00000(0000) knlGS:0000000000000000
[  593.955252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  593.956147] CR2: ffffffffc0738110 CR3: 000000011760a003 CR4: 00000000003606f0 [  593.957249] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [  593.958354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  593.959452] Call Trace:
[  593.959877]  blk_mq_sched_tags_teardown+0x40/0x70
[  593.960628]  blk_mq_exit_sched+0x88/0xa0
[  593.961246]  elevator_exit+0x30/0x50
[  593.961817]  __blk_release_queue+0x5c/0x100
[  593.962497]  process_one_work+0x1a1/0x3a0
[  593.963138]  worker_thread+0x30/0x380
[  593.963714]  ? pwq_unbound_release_workfn+0xd0/0xd0
[  593.964479]  kthread+0x112/0x130
[  593.964992]  ? __kthread_parkme+0x70/0x70
[  593.965641]  ret_from_fork+0x35/0x40
[  593.966215] Modules linked in: sunrpc snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_hda_codec nfit snd_hda_core libnvdimm snd_hwdep snd_seq crct10dif_pclmul snd_seq_device crc32_pclmul snd_pcm ghash_clmulni_intel snd_timer snd pcspkr joydev virtio_balloon i2c_piix4 soundcore ip_tables xfs libcrc32c qxl drm_kms_helper ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata virtio_net net_failover crc32c_intel serio_raw virtio_blk virtio_console failover dm_mirror dm_region_hash dm_log dm_mod [last unloaded: null_blk]
[  593.973735] CR2: ffffffffc0738110
[  593.974270] ---[ end trace 782d8ae6cfec57e7 ]---
[  593.974983] RIP: 0010:blk_mq_free_rqs+0x21/0xd0
[  593.975691] Code: c6 e8 83 f0 ff ff eb 9f 90 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 f5 53 48 83 be 90 00 00 00 00 74 56 48 8b 47 38 49 89 fd <48> 83 78 50 00 74 48 8b 06 85 c0 74 42 41 89 d6 31 db 48 8b 85 98
[  593.978548] RSP: 0018:ffffa21300e7fde8 EFLAGS: 00010286
[  593.979466] RAX: ffffffffc07380c0 RBX: 0000000000000000 RCX: 0000000000002ab8 [  593.980578] RDX: 0000000000000000 RSI: ffff9296971089c0 RDI: ffff9296975fb438 [  593.981677] RBP: ffff9296971089c0 R08: 000000000002f0e0 R09: ffffffffb6427890 [  593.982780] R10: ffffd41084a3fd80 R11: 0000000000000001 R12: ffff9296a91e58b0 [  593.983883] R13: ffff9296975fb438 R14: ffff9296ab5b2600 R15: ffff9296a91e6080 [  593.984982] FS:  0000000000000000(0000) GS:ffff9296bba00000(0000) knlGS:0000000000000000
[  593.986239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  593.987138] CR2: ffffffffc0738110 CR3: 000000011760a003 CR4: 00000000003606f0 [  593.988241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [  593.989344] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  593.990447] Kernel panic - not syncing: Fatal exception
[  593.992292] Kernel Offset: 0x35000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  593.994147] ---[ end Kernel panic - not syncing: Fatal exception ]---

On 6/4/19 9:08 PM, Ming Lei wrote:
In theory, IO scheduler belongs to request queue, and the request pool
of sched tags belongs to the request queue too.

However, the current tags allocation interfaces are re-used for both
driver tags and sched tags, and driver tags is definitely host wide,
and doesn't belong to any request queue, same with its request pool.
So we need tagset instance for freeing request of sched tags.

Meantime, blk_mq_free_tag_set() often follows blk_cleanup_queue() in case
of non-BLK_MQ_F_TAG_SHARED, this way requires that request pool of sched
tags to be freed before calling blk_mq_free_tag_set().

Commit 47cdee29ef9d94e ("block: move blk_exit_queue into __blk_release_queue")
moves blk_exit_queue into __blk_release_queue for simplying the fast
path in generic_make_request(), then causes oops during freeing requests
of sched tags in __blk_release_queue().

Fix the above issue by move freeing request pool of sched tags into
blk_cleanup_queue(), this way is safe becasue queue has been frozen and no any
in-queue requests at that time. Freeing sched tags has to be kept in queue's
release handler becasue there might be un-completed dispatch activity
which might refer to sched tags.

Cc: Bart Van Assche <bvanassche@xxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Fixes: 47cdee29ef9d94e485eb08f962c74943023a5271 ("block: move blk_exit_queue into __blk_release_queue")
Reported-by: kernel test robot <rong.a.chen@xxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
  block/blk-core.c     | 13 +++++++++++++
  block/blk-mq-sched.c | 30 +++++++++++++++++++++++++++---
  block/blk-mq-sched.h |  1 +
  block/blk-sysfs.c    |  2 +-
  block/blk.h          | 10 +++++++++-
  block/elevator.c     |  2 +-
  6 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ee1b35fe8572..8340f69670d8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -320,6 +320,19 @@ void blk_cleanup_queue(struct request_queue *q)
  	if (queue_is_mq(q))
  		blk_mq_exit_queue(q);
+ /*
+	 * In theory, request pool of sched_tags belongs to request queue.
+	 * However, the current implementation requires tag_set for freeing
+	 * requests, so free the pool now.
+	 *
+	 * Queue has become frozen, there can't be any in-queue requests, so
+	 * it is safe to free requests now.
+	 */
+	mutex_lock(&q->sysfs_lock);
+	if (q->elevator)
+		blk_mq_sched_free_requests(q);
+	mutex_unlock(&q->sysfs_lock);
+
  	percpu_ref_exit(&q->q_usage_counter);
/* @q is and will stay empty, shutdown and put */
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 74c6bb871f7e..500cb04901cc 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -475,14 +475,18 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
  	return ret;
  }
+/* called in queue's release handler, tagset has gone away */
  static void blk_mq_sched_tags_teardown(struct request_queue *q)
  {
-	struct blk_mq_tag_set *set = q->tag_set;
  	struct blk_mq_hw_ctx *hctx;
  	int i;
- queue_for_each_hw_ctx(q, hctx, i)
-		blk_mq_sched_free_tags(set, hctx, i);
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->sched_tags) {
+			blk_mq_free_rq_map(hctx->sched_tags);
+			hctx->sched_tags = NULL;
+		}
+	}
  }
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
@@ -523,6 +527,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
  			ret = e->ops.init_hctx(hctx, i);
  			if (ret) {
  				eq = q->elevator;
+				blk_mq_sched_free_requests(q);
  				blk_mq_exit_sched(q, eq);
  				kobject_put(&eq->kobj);
  				return ret;
@@ -534,11 +539,30 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
  	return 0;
err:
+	blk_mq_sched_free_requests(q);
  	blk_mq_sched_tags_teardown(q);
  	q->elevator = NULL;
  	return ret;
  }
+/*
+ * called in either blk_queue_cleanup or elevator_switch, tagset
+ * is required for freeing requests
+ */
+void blk_mq_sched_free_requests(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	lockdep_assert_held(&q->sysfs_lock);
+	WARN_ON(!q->elevator);
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->sched_tags)
+			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+	}
+}
+
  void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
  {
  	struct blk_mq_hw_ctx *hctx;
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index c7bdb52367ac..3cf92cbbd8ac 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -28,6 +28,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
  void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
+void blk_mq_sched_free_requests(struct request_queue *q);
static inline bool
  blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 75b5281cc577..977c659dcd18 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -850,7 +850,7 @@ static void blk_exit_queue(struct request_queue *q)
  	 */
  	if (q->elevator) {
  		ioc_clear_queue(q);
-		elevator_exit(q, q->elevator);
+		__elevator_exit(q, q->elevator);
  		q->elevator = NULL;
  	}
diff --git a/block/blk.h b/block/blk.h
index 91b3581b7c7a..7814aa207153 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -6,6 +6,7 @@
  #include <linux/blk-mq.h>
  #include <xen/xen.h>
  #include "blk-mq.h"
+#include "blk-mq-sched.h"
/* Max future timer expiry for timeouts */
  #define BLK_MAX_TIMEOUT		(5 * HZ)
@@ -176,10 +177,17 @@ void blk_insert_flush(struct request *rq);
  int elevator_init_mq(struct request_queue *q);
  int elevator_switch_mq(struct request_queue *q,
  			      struct elevator_type *new_e);
-void elevator_exit(struct request_queue *, struct elevator_queue *);
+void __elevator_exit(struct request_queue *, struct elevator_queue *);
  int elv_register_queue(struct request_queue *q);
  void elv_unregister_queue(struct request_queue *q);
+static inline void elevator_exit(struct request_queue *q,
+		struct elevator_queue *e)
+{
+	blk_mq_sched_free_requests(q);
+	__elevator_exit(q, e);
+}
+
  struct hd_struct *__disk_get_part(struct gendisk *disk, int partno);
#ifdef CONFIG_FAIL_IO_TIMEOUT
diff --git a/block/elevator.c b/block/elevator.c
index ec55d5fc0b3e..2f17d66d0e61 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -178,7 +178,7 @@ static void elevator_release(struct kobject *kobj)
  	kfree(e);
  }
-void elevator_exit(struct request_queue *q, struct elevator_queue *e)
+void __elevator_exit(struct request_queue *q, struct elevator_queue *e)
  {
  	mutex_lock(&e->sysfs_lock);
  	if (e->type->ops.exit_sched)



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux