On 12/6/21 6:15 PM, John Garry wrote:
On 28/11/2021 10:36, Hannes Reinecke wrote:
+ * Allocates a SCSI command for internal LLDD use.
+ */
+struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
+		int data_direction, bool nowait)
+{
+	struct request *rq;
+	struct scsi_cmnd *scmd;
+	blk_mq_req_flags_t flags = 0;
+	int op;
+
+	if (nowait)
+		flags |= BLK_MQ_REQ_NOWAIT;
+	op = (data_direction == DMA_TO_DEVICE) ?
+		REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
+	rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
+	if (IS_ERR(rq))
+		return NULL;
+	scmd = blk_mq_rq_to_pdu(rq);
+	scmd->device = sdev;
+	return scmd;
+}
+EXPORT_SYMBOL_GPL(scsi_get_internal_cmd);
So there are a couple of generally-accepted grievances about this
approach:
a. we're allocated a scsi_cmnd, but we don't actually use it as a
scsi_cmnd; it merely serves as a holder referencing an allocated tag
b. we're allocated a request which is never sent through the block
layer*
And while being true in general, it does make some assumptions:
- Reserved commands will never be sent via the block layer
- None of the drivers will need to use the additional 'scsi_cmnd'
payload.
Here I'm not sure if this is true in general.
While it doesn't seem necessary to send reserved commands via the
block layer (i.e. calling 'queue_rq' on them), we shouldn't exclude the
possibility.
Agreed
Didn't we speak about that in the context of converting libata?
We did discuss libata before, but I'm not sure on the context you mean.
One thing that I know is that libata-core has "internal" commands in
ata_exec_internal(). I could not see how that function could be
converted to use queue_rq. The problem is that it calls ->qc_issue()
with IRQs disabled, which is not permitted when calling blk_execute_rq()
instead.
We're conflating two issues here.
The one issue is to use 'reserved' tags (as defined by the block layer)
to allocate commands which are internal within some drivers (like hpsa
etc). That is what my patchset does.
As these commands are internal within specific drivers these commands
will never show up in the upper layers, precisely because they are internal.
The other issue is to allow these 'reserved' tags (and the attached
requests/commands) to be routed via the normal block-layer execution
path. This is _not_ part of the above patchset, as that just deals with
driver internal commands and will never execute ->queue_rq.
For that we would need an additional patchset, on top of the first one.
And I have some driver conversions queued (fnic in particular), which
encapsulate everything into a scsi command.
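[Editor's aside: for readers unfamiliar with the reserved-tag scheme being discussed, here is a minimal userspace sketch of the idea. All names are invented; the real block layer uses sbitmap and BLK_MQ_REQ_RESERVED, not this toy bitmap. The point is only that the first few tags of one shared tag space are set aside for driver-internal commands, so internal and regular commands never compete for the same tags.]

```c
#include <stdbool.h>

/*
 * Toy model of reserved-tag partitioning (hypothetical, not the
 * kernel implementation): the first NR_RESERVED tags are set aside
 * for driver-internal commands; regular I/O allocates the rest.
 */
#define NR_TAGS		8
#define NR_RESERVED	2

static bool tag_in_use[NR_TAGS];

/* Allocate from [0, NR_RESERVED) if reserved, else [NR_RESERVED, NR_TAGS). */
static int alloc_tag(bool reserved)
{
	int start = reserved ? 0 : NR_RESERVED;
	int end = reserved ? NR_RESERVED : NR_TAGS;

	for (int i = start; i < end; i++) {
		if (!tag_in_use[i]) {
			tag_in_use[i] = true;
			return i;
		}
	}
	return -1;	/* pool exhausted */
}

static void free_tag(int tag)
{
	tag_in_use[tag] = false;
}
```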
It just seems to me that what the block layer is providing is not
suitable.
How about these:
a. allow block driver to specify size of reserved request PDU separately
to regular requests, so we can use something like this for rsvd
commands:
struct scsi_rsvd_cmnd {
struct scsi_device *sdev;
}
And fix up SCSI iter functions and LLDs to deal with it.
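[Editor's aside: to make option (a) concrete, a toy userspace model of the two-PDU-size idea. The struct names and sizes are stand-ins, not a proposed kernel API; the only point is that the allocator would pick a minimal payload for reserved tags and the full payload for regular ones.]

```c
#include <stdbool.h>
#include <stddef.h>

struct scsi_device;	/* opaque stand-in */

/* Minimal PDU for reserved requests, as in proposal (a). */
struct scsi_rsvd_cmnd {
	struct scsi_device *sdev;
};

/* Simplified stand-in for the full regular-request PDU. */
struct scsi_cmnd_model {
	struct scsi_device *sdev;
	unsigned char cmnd[32];
	int result;
};

/*
 * The driver would register both sizes, and the allocator would pick
 * one based on whether the tag being allocated is reserved.
 */
static size_t pdu_size(bool reserved)
{
	return reserved ? sizeof(struct scsi_rsvd_cmnd)
			: sizeof(struct scsi_cmnd_model);
}
```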
That's what Bart suggested a while back,
I don't recall that one.
but then we have the problem
that reserved tags are completed by the same interrupt routine,
and _that_ routine currently assumes that everything is a scsi command.
I think that any driver which uses reserved commands needs to be taught
that not all commands are "real" scsi commands, i.e. we shouldn't always
call scsi_done() in the LLD ISR. As such, it should be able to deal
with something like struct scsi_rsvd_cmnd.
See above. My patchset is strictly for driver-internal commands, which
will never be sent nor completed like 'real' commands.
And the main point (well, the 'other' main point aside from not having
to allocate a tag yourself) is that the driver can be _simplified_, as
now every tag references the _same_ structure.
If we start using a different structure we'll lose that ability
completely, and really won't have gained that much.
BTW, for this current series, please ensure that we can't call
scsi_host_complete_all_commands(), which could iterate over reserved
tags, as we call stuff like scsi_done() there. I don't think it's an
issue here, but just thought it was worth mentioning.
See above. These are driver internal commands, for which scsi_done() is
never called.
I deliberately did _not_ add checks to scsi_done() or queue_rq(), as
there presumably will be an additional patch which allows precisely
this, eg when converting libsas.
Trying to fix up that assumption would mean auditing the midlayer
(scmd->device is a particularly common pattern), _and_ all drivers
wanting to make use of reserved commands.
For me that's too high a risk of introducing errors; audits are always
painful and error-prone.
b. provide a block layer API that returns just what
blk_mq_unique_tag() returns, but without providing a request. This
gives exactly what we need, but would be disruptive in the scsi layer
and LLDs.
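[Editor's aside: for reference, blk_mq_unique_tag() packs the hardware-queue index into the upper 16 bits of a u32 and the per-queue tag into the lower 16, which is the "just a tag, no request" value proposal (b) alludes to. Below is a standalone re-implementation of that documented encoding; the function names are stand-ins for the in-kernel helpers blk_mq_unique_tag(), blk_mq_unique_tag_to_hwq() and blk_mq_unique_tag_to_tag().]

```c
#include <stdint.h>

/* Mirrors BLK_MQ_UNIQUE_TAG_BITS / BLK_MQ_UNIQUE_TAG_MASK. */
#define UNIQUE_TAG_BITS	16
#define UNIQUE_TAG_MASK	((1U << UNIQUE_TAG_BITS) - 1)

/* hw queue index in the upper 16 bits, per-queue tag in the lower 16. */
static uint32_t unique_tag(uint16_t hwq, uint16_t tag)
{
	return ((uint32_t)hwq << UNIQUE_TAG_BITS) | (tag & UNIQUE_TAG_MASK);
}

static uint16_t unique_tag_to_hwq(uint32_t utag)
{
	return utag >> UNIQUE_TAG_BITS;
}

static uint16_t unique_tag_to_tag(uint32_t utag)
{
	return utag & UNIQUE_TAG_MASK;
}
```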
Having looked at the block layer and how tags are allocated I found it
too deeply interlinked with the request queue and requests in general.
They are indeed interlinked in the block layer, but we don't need to
expose requests or anything else.
Correct. And this is one of the drawbacks of this approach, in that
we're always allocating a 'struct request' and a 'struct scsi_cmnd'
payload even if we don't actually use them.
But then again, we _currently_ don't use them.
If we ever want to send these 'reserved' commands via queue_rqs() and be
able to call 'scsi_done()' on them (again, the libsas case) then we need
these payloads.
Such an interface could just be a wrapper around
blk_mq_alloc_request() + blk_mq_start_request().
I do agree that currently I don't start requests, and consequently they
won't show up in any iterators.
But then (currently) it doesn't matter, as none of the iterators in any
of the drivers I've been converting needed to look at those requests.
Plus I've suggested that in a previous patchset, which got vetoed by
Bart as he couldn't use such an API for UFS.
c. as an alternative to b., send all rsvd requests through the block
layer, but this can be very difficult and disruptive for users
And, indeed, not possible when we will need to send these requests
during error handling, where the queue might be blocked/frozen/quiesced
precisely because we are in error handling ...
If we send on the host request queue, would it ever be
blocked/frozen/quiesced?
Possibly not. But be aware that 'reserved' tags are actually independent
of the host request queue; it's perfectly possible to use 'reserved'
tags without a host request queue. Again, fnic is such an example.
*For polling rsvd commands on a poll queue (which we will need for the
hisi_sas driver, and maybe others, for poll-mode support), we would need
to send the request through the block layer, but block layer polling
requires a request with a bio, which is a problem.
Allocating a bio is a relatively trivial task.
So do you suggest a dummy bio for that request? I hacked something
locally to get it to work as a PoC, but no idea on a real solution yet.
We're allocating bios for all kind of purposes, even 'dummy' bios in the
case of bio_clone() or bio_split(). So that's nothing new, and should be
relatively easy.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer