Re: [PATCH RFC v3 22/41] block: implement persistent commands

Hannes Reinecke <hare@xxxxxxx> · Sat, 2 May 2020 14:11:58 +0200

On 5/1/20 6:59 AM, Bart Van Assche wrote:
On 2020-04-30 06:18, Hannes Reinecke wrote:
Some LLDDs implement event handling by sending a command to the
firmware, which then will be completed once the firmware wants
to register an event.
      ^^^^^^^^
      report?

So worst case a command is being sent to the firmware then the
                                                         ^^^^
                                                         when?
driver initializes, and will be returned once the driver unloads.
To avoid these commands to block the queues during freezing or
quiescing this patch implements support for 'persistent' commands,
which will be excluded from blk_queue_enter() and blk_queue_exit()
calls.

How is it prevented that the SCSI timeout handler is activated for
persistent commands?

Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
---
  block/blk-mq.c            | 12 +++++++++---
  include/linux/blk-mq.h    |  2 ++
  include/linux/blk_types.h |  4 ++++
  3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 44482aaed11e..402cf104d183 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -402,9 +402,14 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
  {
  	struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
  	struct request *rq;
-	int ret;
+	int ret = 0;
  
-	ret = blk_queue_enter(q, flags);
+	if (flags & BLK_MQ_REQ_PERSISTENT) {
+		if (blk_queue_dying(q))
+			ret = -ENODEV;
+		alloc_data.cmd_flags |= REQ_PERSISTENT;
+	} else
+		ret = blk_queue_enter(q, flags);
  	if (ret)
  		return ERR_PTR(ret);
  

I think that not calling blk_queue_enter() for persistent commands means
opening a giant can of worms. There is quite some code in the block
layer that assumes that neither .queue_rq() nor the request completion
code will be called if q_usage_counter == 0. Skipping the
blk_queue_enter() call for persistent commands breaks that assumption. I
think we need a better solution.

Well, yeah, maybe.
My aim here is that _all_ I/O requiring a tag from the hardware will be 
tracked by the blocklayer tagset. Only that will give the block-layer 
accurate information about outstanding commands, such that the ongoing 
CPU hotplug discussion can make the correct decisions and implement 
functions really covering all outstanding I/O.
It also allows us to use the scsi_host_busy_iter() functions within the 
driver, and will get rid of the hand-crafted iterations the driver has 
to do now.

It worked reasonably well, until I encountered the infamous AEN 
commands, which actually require the opposite: _not_ to be tracked by 
the block layer at all, as the commands themselves are just placeholders
to be returned by the firmware once an event occurs.
(And yes, I _do_ think this is a quite dangerous operation, because I 
can't quite see how one could reliably return this command in case of a 
firmware crash ...)
(But anyhow, that's how the firmware is written and we have to live with 
it.)

So I implemented this approach, to have tags which are ignored by the 
block layer. But I have to admit that this approach relies on quite some 
assumptions (like these tags are never actually submitted to the 
blocklayer itself, are never started etc), none of which are spelled out 
clearly (yet).
An alternative approach would be to arbitrary decrease the tagset size 
by one (eg by shifting the tags by one), and use the free tag for AENs).
That would have to be coded within the driver, though.

If that's a solution which you like better I could give it a go.

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer