On 6/20/22 15:45, Hannes Reinecke wrote:
> On 6/13/22 11:06, Damien Le Moal wrote:
>> On 6/13/22 17:25, John Garry wrote:
> [ .. ]
>>>
>>> We may have 32 regular tags and 1 reserved tag for SATA.
>>
>> Right. But that is the messy part though. That extra 1 tag is actually not
>> a tag since all internal commands are non-NCQ commands that do not need a
>> tag...
>>
>> I am working on command duration limits support currently. This feature
>> set has a new horrendous "improvement": a command can be aborted by the
>> device if it fails its duration limit, but the abort is done with a good
>> status + sense data available bit set so that the device queue is not
>> aborted entirely like with a regular NCQ command error.
>>
>> For such aborted commands, the command sense data is set to
>> "COMPLETED/DATA UNAVAILABLE". In this case, the host needs to go read the
>> new "successful NCQ sense data log" to check that the command sense is
>> indeed "COMPLETED/DATA UNAVAILABLE". And to go read that log page without
>> stalling the device queue, we would need an internal NCQ (queuable) command.
>>
>> Currently, that is not possible to do cleanly as there are no guarantees
>> we can get a free tag (there is a race between block layer tag allocation
>> and libata internal tag counting). So a reserved tag for that would be
>> nice. We would end up with 31 IO tags at most + 1 reserved tag for NCQ
>> commands + ATA_TAG_INTERNAL for non-NCQ. That last one would be rendered
>> rather useless. But that also means that we kind-of go back to the days
>> when Linux showed ATA drives max QD of 31...
>>
>> I am still struggling with this particular use case and trying to make it
>> fit with your series. Trying out different things right now.
>>
> Hmm. Struggling on how that is supposed to work in general.
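To make the tag arithmetic in the quoted proposal concrete, here is a minimal single-threaded C sketch, assuming a 32-tag SATA NCQ queue in which one device tag is held back for an internal NCQ command, while the non-NCQ internal command keeps its own pseudo tag outside the device range (in the spirit of ATA_TAG_INTERNAL). All names (`tag_map`, `alloc_io_tag`, `alloc_reserved_tag`, `NCQ_RESERVED_TAG`, `TAG_INTERNAL_NONNCQ`) are hypothetical and are not libata or blk-mq APIs. The sketch does no locking, so it only shows why a reserved tag is always obtainable; it does not model the block layer side of the race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split of the 32 NCQ tags of a SATA drive:
 * tags 0..30 are handed out for regular I/O, tag 31 is kept back
 * for an internal NCQ command (e.g. a queued read log). */
#define NCQ_MAX_TAGS        32u
#define NCQ_RESERVED_TAG    (NCQ_MAX_TAGS - 1u) /* hypothetical */
#define NCQ_IO_TAGS         (NCQ_MAX_TAGS - 1u) /* queue depth exposed for I/O: 31 */
/* Hypothetical pseudo tag for non-NCQ internal commands; it is not a
 * device tag, so it never competes with the 32 NCQ tags. */
#define TAG_INTERNAL_NONNCQ NCQ_MAX_TAGS

struct tag_map {
	uint32_t busy; /* bit n set => device tag n is in use */
};

/* Allocate a regular I/O tag; never hands out the reserved tag. */
static int alloc_io_tag(struct tag_map *m)
{
	for (unsigned int t = 0; t < NCQ_IO_TAGS; t++) {
		if (!(m->busy & (1u << t))) {
			m->busy |= 1u << t;
			return (int)t;
		}
	}
	return -1; /* all 31 I/O tags busy */
}

/* The reserved tag is only used for internal NCQ commands, so it is
 * free whenever no such command is in flight, however busy I/O is. */
static int alloc_reserved_tag(struct tag_map *m)
{
	if (m->busy & (1u << NCQ_RESERVED_TAG))
		return -1;
	m->busy |= 1u << NCQ_RESERVED_TAG;
	return (int)NCQ_RESERVED_TAG;
}

static void free_tag(struct tag_map *m, int tag)
{
	assert(tag >= 0 && (unsigned int)tag < NCQ_MAX_TAGS);
	assert(m->busy & (1u << tag));
	m->busy &= ~(1u << tag);
}
```

With this split, the queue depth available for regular I/O drops to 31, which is the "max QD of 31" trade-off mentioned in the quote.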

The standards monks defined it as conceptually easy: if a command completes
with success and the sense data available bit set, then just read the log
page that has the sense data to check what happened. Very trivial in
principle. But of course, this is ATA, so it is a mess in practice, because
we want to do that read log with an NCQ command, to lessen the impact on
drive performance compared to a regular error. Otherwise, if we simply do a
regular EH, we end up with the same impact as a hard command failure. And
then we end up with all these problems with tag reuse, and nothing in
libata allows issuing internal NCQ commands.

> As we're reading from a log to get the sense information I guess that
> log is organized by tag index. Meaning we have to keep hold of the tag
> which generated that error.

Yep. This is a 1024B log which has the sense information for all completed
NCQ commands, organized per tag.

> Q1: Can we (re-) use that tag to read the log information?

I thought of that. BUT: if a revalidate or a regular EH is ongoing, we need
to delay issuing the NCQ read log command, since EH will prevent issuing
anything (there will be non-NCQ commands ongoing). The problem here is that
delaying NCQ commands essentially means doing a requeue, so we need a real
req/scsi req for that. Reusing the tag for a new temporary internal qc is
not enough.

> Q2: What do you do if all 32 command generate such an error?

For that case, I can simply use the internal tag and do a non-NCQ read log.
That is actually the easy case!

> But really, this sounds no different from the 'classical' request sense
> handling in SCSI ML. Have you considered just run with that an map
> 'REQUEST SENSE' on your new NCQ Get Log page command?

I am exploring reuse of the SCSI EH now, but it is very messy on the libata
side. Still no good solution. While doing that, I discovered that libata EH
is very messy because of one driver only: scsi ipr. That is the only one
that does not have an ->error_handler port operation.
And because of that, we are stuck with lots of "old EH" ATA code, so there
are always two different EH paths. Complete mess. I am trying to see if I
can convert scsi ipr to have an ->error_handler port operation, but I
cannot test anything as I do not have the hardware.

>
> Cheers,
>
> Hannes

--
Damien Le Moal
Western Digital Research
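The options discussed in this reply for fetching sense data after a command completes with good status and the sense data available bit set can be summarised as a small decision function. This is a sketch under stated assumptions, not libata code: `choose_sense_fetch`, the `sense_fetch` values and the `sense_pending` bitmap are hypothetical names, and the ordering of the checks (the "all 32 commands affected" case first, then the EH check) is one plausible reading of the thread rather than a settled design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCQ_MAX_TAGS 32u

/* The bitmap below has one bit per NCQ tag. */
static_assert(NCQ_MAX_TAGS == 32u, "sense_pending assumes 32 NCQ tags");

/* Hypothetical ways of reading the "successful NCQ sense data log". */
enum sense_fetch {
	FETCH_NONE,            /* no command needs its sense data read */
	FETCH_NCQ_READ_LOG,    /* queued read log; device queue keeps running */
	FETCH_DEFER,           /* EH/revalidate owns the port: requeue, retry later */
	FETCH_NONNCQ_INTERNAL, /* every tag affected: non-NCQ read log, internal tag */
};

/*
 * sense_pending:  bit n set => the command with tag n completed with good
 *                 status and the sense data available bit set.
 * eh_in_progress: a revalidate or regular EH is currently running.
 */
static enum sense_fetch choose_sense_fetch(uint32_t sense_pending,
					   bool eh_in_progress)
{
	if (!sense_pending)
		return FETCH_NONE;

	/* All 32 commands affected: nothing else is queued, so a plain
	 * non-NCQ read log on the internal tag costs nothing extra. */
	if (sense_pending == UINT32_MAX)
		return FETCH_NONNCQ_INTERNAL;

	/* EH blocks NCQ issue, so the read log has to wait; in practice
	 * that means requeueing a real request, not just holding a tag. */
	if (eh_in_progress)
		return FETCH_DEFER;

	return FETCH_NCQ_READ_LOG;
}
```

FETCH_DEFER is the case that needs a real request that can be requeued: reusing the completed command's tag for a temporary internal qc does not survive the wait for EH to finish.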