On 6/13/22 17:25, John Garry wrote: > On 13/06/2022 08:01, Damien Le Moal wrote: >> On 6/9/22 19:29, John Garry wrote: >>> From: Hannes Reinecke <hare@xxxxxxx> >>> >>> Quite some drivers are using management commands internally, which >>> typically use the same hardware tag pool (ie they are being allocated >>> from the same hardware resources) as the 'normal' I/O commands. >>> These commands are set aside before allocating the block-mq tag bitmap, >>> so they'll never show up as busy in the tag map. >>> The block-layer, OTOH, already has 'reserved_tags' to handle precisely >>> this situation. >>> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host >>> template to instruct the block layer to set aside a tag space for these >>> management commands by using reserved tags. >>> >>> Signed-off-by: Hannes Reinecke <hare@xxxxxxx> >>> Signed-off-by: John Garry <john.garry@xxxxxxxxxx> >>> --- >>> drivers/scsi/hosts.c | 3 +++ >>> drivers/scsi/scsi_lib.c | 6 +++++- >>> include/scsi/scsi_host.h | 22 +++++++++++++++++++++- >>> 3 files changed, 29 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c >>> index 8352f90d997d..27296addaf63 100644 >>> --- a/drivers/scsi/hosts.c >>> +++ b/drivers/scsi/hosts.c >>> @@ -474,6 +474,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) >>> if (sht->virt_boundary_mask) >>> shost->virt_boundary_mask = sht->virt_boundary_mask; >>> >>> + if (sht->nr_reserved_cmds) >>> + shost->nr_reserved_cmds = sht->nr_reserved_cmds; >>> + >>> device_initialize(&shost->shost_gendev); >>> dev_set_name(&shost->shost_gendev, "host%d", shost->host_no); >>> shost->shost_gendev.bus = &scsi_bus_type; >>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >>> index 6ffc9e4258a8..f6e53c6d913c 100644 >>> --- a/drivers/scsi/scsi_lib.c >>> +++ b/drivers/scsi/scsi_lib.c >>> @@ -1974,8 +1974,12 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) >>> else >>> tag_set->ops = &scsi_mq_ops_no_commit; >>> tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1; >>> + >>> tag_set->nr_maps = shost->nr_maps ? : 1; >>> - tag_set->queue_depth = shost->can_queue; >>> + tag_set->queue_depth = >>> + shost->can_queue + shost->nr_reserved_cmds; >>> + tag_set->reserved_tags = shost->nr_reserved_cmds; >>> + >>> tag_set->cmd_size = cmd_size; >>> tag_set->numa_node = dev_to_node(shost->dma_dev); >>> tag_set->flags = BLK_MQ_F_SHOULD_MERGE; >>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h >>> index 59aef1f178f5..149dcbd4125e 100644 >>> --- a/include/scsi/scsi_host.h >>> +++ b/include/scsi/scsi_host.h >>> @@ -366,10 +366,19 @@ struct scsi_host_template { >>> /* >>> * This determines if we will use a non-interrupt driven >>> * or an interrupt driven scheme. It is set to the maximum number >>> - * of simultaneous commands a single hw queue in HBA will accept. >>> + * of simultaneous commands a single hw queue in HBA will accept >>> + * excluding internal commands. >>> */ >>> int can_queue; >>> >>> + /* >>> + * This determines how many commands the HBA will set aside >>> + * for internal commands. This number will be added to >>> + * @can_queue to calcumate the maximum number of simultaneous >> > > Hi Damien, > >> s/calcumate/calculate >> >> But this is weird. For SATA, can_queue is 32. Having reserved commands, >> that number needs to stay the same. > > It does. > >> We cannot have more than 32 tags. > > We may have 32 regular tags and 1 reserved tag for SATA. Right. But that is the messy part though. That extra 1 tag is actually not a tag since all internal commands are non-NCQ commands that do not need a tag... I am working on command duration limits support currently. This feature set has a new horrendous "improvement": a command can be aborted by the device if it fails its duration limit, but the abort is done with a good status + sense data available bit set so that the device queue is not aborted entirely like with a regular NCQ command error. For such aborted commands, the command sense data is set to "COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the new "successful NCQ sense data log" to check that the command sense is indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without stalling the device queue, we would need an internal NCQ (queuable) command. Currently, that is not possible to do cleanly as there are no guarantees we can get a free tag (there is a race between block layer tag allocation and libata internal tag counting). So a reserved tag for that would be nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered rather useless. But that also means that we kind-of go back to the days when Linux showed ATA drives max QD of 31... I am still struggling with this particular use case and trying to make it fit with your series. Trying out different things right now. > >> I think keeping can_queue as the max queue depth with at most >> nr_reserved_cmds tags reserved is better. > > Maybe the wording in the comment can be improved as it originally > focused on SAS HBAs where there are no special rules for tagset depth or > how the tagset should be carved up to handle regular and reserved commands. Indeed. And that would be for HBAs that do *not* use libsas/libata. Otherwise, the NCQ vs non-NCQ reserved tag mess is there. > > Thanks, > John > >> >>> + * commands sent to the host. >>> + */ >>> + int nr_reserved_cmds; >>> + >>> /* >>> * In many instances, especially where disconnect / reconnect are >>> * supported, our host also has an ID on the SCSI bus. If this is >>> @@ -602,6 +611,11 @@ struct Scsi_Host { >>> unsigned short max_cmd_len; >>> >>> int this_id; >>> + >>> + /* >>> + * Number of commands this host can handle at the same time. >>> + * This excludes reserved commands as specified by nr_reserved_cmds. >>> + */ >>> int can_queue; >>> short cmd_per_lun; >>> short unsigned int sg_tablesize; >>> @@ -620,6 +634,12 @@ struct Scsi_Host { >>> */ >>> unsigned nr_hw_queues; >>> unsigned nr_maps; >>> + >>> + /* >>> + * Number of reserved commands to allocate, if any. >>> + */ >>> + unsigned nr_reserved_cmds; >>> + >>> unsigned active_mode:2; >>> >>> /* >> >> > -- Damien Le Moal Western Digital Research