Re: [RFT] major libata update

Jeff Garzik <jeff@xxxxxxxxxx> · Mon, 22 May 2006 03:19:19 -0400

Tejun Heo wrote:
Jeff Garzik wrote:
Tejun Heo wrote:
Jeff Garzik wrote:
Jeff Garzik wrote:
Tejun Heo wrote:
Hmmm... The drive is issuing SDB FISes that complete already 
completed tags.  This could be dangerous.  Depending on timing, it 
might end up finishing a newly issued command that occupies the slot 
but hasn't been processed yet.  If a drive does this, NCQ shouldn't be 
enabled for it.  Can you post the full boot dmesg?

I'm not sure the data supports that conclusion?  PORT_IRQ_SDB_FIS 
is quite normal and expected during NCQ operation, if that 
interrupt is enabled.  Just normal SDB:Entry and SDB:SetIntr states.

Strike that last part:  PORT_IRQ_SDB_FIS will appear, as with other 
status bits, even if the enable bit is not set.

So, you'll see that whenever you get an SDB FIS during normal 
operation.

The problem is with the second dword.  Here are some of the spurious SDB 
FISes Ric's AHCI controller was receiving.

004040a1:10000000
004040a1:00000020
004040a1:00000080

If the second dword were all zero, it would simply be an SDB FIS raising 
an interrupt (the I bit, bit 14 of the first dword), and there would be 
nothing to worry about.  However, each of those spurious SDBs has one bit 
set in the second dword, meaning the SDB completes the corresponding tag, 
but that tag isn't active when the SDB is received.

This is okay as long as the controller thinks the tags are unoccupied 
when those SDBs are received, but that is not something that can be 
guaranteed.  NCQ command synchronization depends on devices not 
completing the same command more than once.

The duplicate completions might be okay if the drive guarantees that it 
won't send one after losing the race to a newly issued command, e.g.:

1. drive sends completion for tag x
2. drive schedules another, spurious completion for tag x shortly after
3. ahci/driver complete tag x
4. ahci/driver issues tag x
5. drive receives the new command for tag x before sending the spurious 
completion and decides not to send it (not very likely)

If the above holds, the drive might be okay, but nobody can guarantee 
how various controllers react.  It depends on how controllers manage 
SActive (when they turn bits on).  At any rate, it's dangerous IMHO.

If the silicon is screwing up SActive bits, then we have bigger 
problems than spurious interrupts.

So, the typical policy of Internet servers applies here:  "be liberal 
in what you accept."  For smart controllers like AHCI, we will simply 
set the desired IRQ mask, then happily receive and ack events anytime 
the controller decides to raise them.  If the controller decides to 
send us a no-op, don't worry about it.  This is particularly true when 
we turn on Command Coalescing, where we'll have a run of work 
initiated [sometimes] by an internal timer, rather than an actual FIS 
reception.

I wish I could explain it better.  This is a clear protocol violation 
by the drive.  Depending on the specific implementation of the drive and 
the controller, it can result in completion of a command that hasn't been 
processed yet (data corruption!).

I quite understand the implications.  My argument comes from a different 
angle:  I don't feel we should be adding tons of code that essentially 
validates the silicon.  There are plenty of chances for the hardware to 
fuck up in a way that corrupts data, and is also difficult to detect. 
Pre-production BIOSes have even done silly things like turning off data 
verification (checksum) by default.  Talk about subtle corruption...

So I feel the best path is to use the hardware programming sequences 
described in the spec, because that's what the chip designers and Q/A 
engineers validate with (read: the Windows driver).

Once we have deployed drivers with the standard programming sequences, 
_then_ we can consider looking into proper spurious interrupt 
accounting.  The current AHCI interrupt accounting stuff is not nearly 
as accurate as it should be, which implies that the code simply should 
not exist at the present time.

	Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html