Re: [bisected] Re: todays git: WARNING: at drivers/ata/libata-sff.c:1017 ata_sff_hsm_move+0x45e/0x750()

Sergei Shtylyov <sshtylyov@xxxxxxxxxxxxx> · Sat, 10 Jan 2009 18:59:06 +0300

Alan Cox wrote:

   All the S/G counts printed out were divisible by 4 (36 for INQUIRY and 96 
for REQUEST SENSE). It's the *actual* byte count for the REQUEST SENSE that's 
not divisible. The SCSI/ATAPI devices are free to send less data than requested 
on non-block transfer commands.

That is just fine - if the sg list is not corrupt or being mishandled and
the atapi pio code is not buggy.

RTFS a bit and it becomes obvious that the core libata code has a bug:

   Oh, I have already... and saw where the issue could be. It just wasn't 
obvious why 32-bit PIO triggered it.

From libata-sff.c:

        /* consumed can be larger than count only for the last transfer */
        WARN_ON_ONCE(qc->cursg && count != consumed);

The big clue turns out to be that the code doesn't match the comment.

Next note the check on qc->cursg. If my input sg list is a 36 byte single
sg entry then qc->cursg should be NULL by the WARN_ON() - but it isn't.

   I think it's still not NULL because the qc->cursg_ofs == sg->length check 
was *not* true right above, hence sg_next() wasn't called...

If qc->cursg is NULL when the sg_next() is run then we don't warn because
we are quite happy with the last segment being padded or underrunning.

   I don't think that sg_next() is called on an underrun segment. And here 
lies the mistake.

What we actually want to explode on is a case where we transfer more
bytes than are wanted and where there are more sg entries to perform - at
that point we would corrupt.
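
The condition described above can be written as a small predicate. This is
only a sketch of the wording in this mail, not a proposed patch: the function
name and the more_sg_pending flag are hypothetical, and as noted further down,
__atapi_pio_bytes may not have enough information to compute such a flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code.
 * count:           bytes the S/G entry wanted for this chunk
 * consumed:        bytes the data_xfer routine actually moved (with padding)
 * more_sg_pending: true if further S/G entries still have to be filled
 *
 * Padding past the end is harmless on the last entry, but it would
 * spill into data belonging to the next entry if one is still pending.
 */
static bool overrun_would_corrupt(unsigned int count, unsigned int consumed,
                                  bool more_sg_pending)
{
    return consumed > count && more_sg_pending;
}
```

With count = 18 and consumed = 20 on the final entry this returns false, which
is the behaviour the comment in libata-sff.c appears to intend.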

So at least one failure case is

	Core code issues an SG list for 96 bytes
	Drive indicates it wishes to return 18 bytes

	data_xfer transfers 18 bytes + 2 padding (correctly) -> 20 bytes

At this point __atapi_pio_bytes breaks

	it updates qc->curbytes by 18
	it updates the offset by 18

	The last segment is not exhausted so it does not update qc->cursg

	qc->cursg is not updated and the WARN erroneously uses !=

The bogus WARN_ON_ONCE() triggers.

   Yes.
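
The sequence above can be replayed with a small stand-alone model. This is a
simplified sketch, not the libata code: struct sim_sg, struct sim_qc and
sim_pio_step() are made-up stand-ins, and the pad parameter (2 for 16-bit PIO,
4 for 32-bit PIO) is an assumption about how data_xfer rounds up the transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scatterlist entry and the queued command. */
struct sim_sg {
    unsigned int length;     /* bytes in this S/G entry */
    struct sim_sg *next;     /* NULL on the last entry */
};

struct sim_qc {
    struct sim_sg *cursg;    /* current S/G entry, NULL when all are done */
    unsigned int cursg_ofs;  /* offset into the current entry */
    unsigned int curbytes;   /* total bytes accounted so far */
};

/*
 * Model one PIO chunk of 'bytes' bytes from the drive.
 * Returns true if the check "qc->cursg && count != consumed" would warn.
 */
static bool sim_pio_step(struct sim_qc *qc, unsigned int bytes,
                         unsigned int pad)
{
    struct sim_sg *sg = qc->cursg;
    unsigned int left, count, consumed;

    assert(sg != NULL && pad != 0);

    left = sg->length - qc->cursg_ofs;
    count = bytes < left ? bytes : left;

    /* Assumed data_xfer behaviour: round up to a multiple of 'pad'. */
    consumed = (count + pad - 1) / pad * pad;

    qc->curbytes += count;
    qc->cursg_ofs += count;

    /* Only an exhausted entry advances cursg. */
    if (qc->cursg_ofs == sg->length) {
        qc->cursg = sg->next;
        qc->cursg_ofs = 0;
    }

    return qc->cursg != NULL && count != consumed;
}
```

For a single 96-byte entry and an 18-byte reply, sim_pio_step(&qc, 18, 4)
returns true and leaves qc.cursg pointing at the entry, while the same call
with pad = 2 returns false because 18 is already word aligned.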

So the bug is the WARN_ON being wrong. In fact __atapi_pio_bytes doesn't
know enough to do the WARN check correctly as it doesn't know if it is
the last request being made. It just happens it didn't break before
because all our transfers are word aligned.

   Er... I'm not sure what's changed with the 32-bit PIO patch.

Alan

MBR, Sergei