Re: [RFT] major libata update

Tejun Heo <htejun@xxxxxxxxx> · Wed, 17 May 2006 12:05:01 +0900

Jeff Garzik wrote:
A spurious SDB FIS updating SActive is bad news, yes.  But with a busy 
AHCI controller, perhaps sharing PCI interrupts, I think there is 
distinct potential to be flagged as a spurious interrupt, when not.

You are talking about cases where the controller raises SDB IRQ without 
actually receiving one, right?  That is certainly possible and if this 
is the case, I'm very okay with eating the IRQ.

But I'm taking a higher level view than that, from two angles:

1) I think its a waste of time to even worry about this.  We should just 
program AHCI to spec, and  let the controller and devices talk to each 
other as they will.  If there are spurious completions, that should show 
up elsewhere via tag poisoning and/or tag rotation.  Or data corruption, 
if nothing else.  We'll know, even without the potentially-questionable 
spurious interrupt detection code.

2) Given the factors mentioned above -- shared irq and busy 
multi-controller controller -- highly asynchronous conditions combine to 
create a higher probability of seeing events arrive while you're 
processing other events.  Overall, I feel that trying to accurately 
account for everything going on in silicon leads to madness and 
complexity :)  Just hand things to silicon, and trust that it either 
accurately accounts SDB FISs, or will be quite obviously broken under 
stress.

I see your point.  But I'm quite worried about this one, because

* This won't occur often and even when it occurs at the hardware level, 
it might be masked by processing delays in the kernel.  But when it 
blows, it will corrupt data silently.  We might not see such a report 
for months and even when it strikes pinpointing the origin will be 
extremely difficult.

* The danger of the race condition is not theoretical.  I think I can 
trigger it with AHCI.  AHCI delegates management of SActive register to 
the driver.

Remember the lost command completion bug in very early AHCI NCQ 
implementation?  It was caused by setting SActive after issuing the 
command.  Command completion SDB was received before SActive was cleared 
and when the IRQ handler kicks in afterward, it saw the bit which was 
set after completion and thus couldn't complete the command resulting in 
timeout.

This problem is similar except that it's the other way around.  From 
ahci_qc_issue()...

	if (qc->tf.protocol == ATA_PROT_NCQ)
		writel(1 << qc->tag, port_mmio + PORT_SCR_ACT);
	writel(1 << qc->tag, port_mmio + PORT_CMD_ISSUE);

Let's say we insert mdelay(10) between SActive setting and actual 
command issuance.  It will drastically increase the chance of receiving 
the spurious SDB FISes inbetween.  The following sequence is likely when 
such event occurs.

1. driver sets SActive bit for tag x
2. disk issues spurious SDB completion for tag x
3. ahci receives the SDB FIS, clears the corresponding SActive bit and 
raises interrupt
4. driver issues command for tag x and complete command issuance
5. IRQ handler kicks in sees that SActive for tag x is clear and 
completes the corresponding qc.
6. we now have data corruption.

I want to emphasize that SActive manipulation correctness is highly 
critical for NCQ integrity.  It's the thin thread the whole NCQ protocol 
hangs on.  Spurious SDB FIS amounts to random command completion without 
synchronization.  This is a serious problem which requires handling and 
I'm pretty sure I'm not being paranoid here.

I think detection of this problem proves the necessity for spurious 
interrupt reporting, at least until NCQ driver and hardware mature.  We 
still don't know how drives act with NCQ commands and definitely need to 
learn what sorts of culprits they have and how to work around them.  The 
spurious interrupt handling allows us to detect and diagnose such 
problems much better.

Am I making any sense?  I really wish I could write better.  Please give 
it some thoughts even if my writing sucks.  :-)

--
tejun
-
: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html