Re: Early-boot kernel panics from udev-165/extras/ata_id/ata_id.c


Some additional info. If I modify test-identify-packet.c to align the buffer on _SC_PAGESIZE (4096 bytes) rather than on 512 bytes, as shown below, and then run this loop:

offset=${1:-0}
increm=${2:-1}

while [ $((offset+=$increm)) -lt  $((4096-511)) ]; do
   echo -e "+++ ./test-identify-packet-page /dev/sr0 $offset\n"
   ./test-identify-packet-page /dev/sr0 $offset
   sleep 0.5
done

no panics occur at any offset.

512 -> pagesize modification:

--- test-identify-packet.c      2011-01-17 13:47:25.000000000 -0500
+++ test-identify-packet-page.c 2011-01-17 22:27:11.293999984 -0500
@@ -99,7 +99,8 @@

 int main(int argc, char *argv[])
 {
-       char buf[2048];
+        int pgsz = sysconf(_SC_PAGESIZE);
+       char buf[pgsz<<1];
        char *id;
        char *path;
        int offset = 0;
@@ -116,7 +117,7 @@
        if (argc > 2)
                offset = atoi(argv[2]);

-       if (offset < 0 || offset > 512) {
+       if (offset < 0 || offset > pgsz-512) {
                fprintf(stderr, "offset out of range\n");
                return 1;
        }
@@ -133,7 +134,7 @@
                return 1;
        }

-       id = (void *)((((unsigned long)buf + 511) & ~511) + offset);
+       id = (void *)((((unsigned long)buf + pgsz-1) & ~(pgsz-1)) + offset);
        printf("id buffer=%p\n", id);

        disk_identify_packet_device_command(fd, id, 512);

John


On 01/17/2011 10:27 AM, Tejun Heo wrote:
Hello,

On Sun, Jan 16, 2011 at 11:03:06PM -0500, John Stanley wrote:
> The kernel-panic, which occurs at boot-time in udev/ata_id.c when
> issuing an ioctl SG_IO sg3 SCSI ATA Pass-through Identify command,
> appears to arise from DMA'ing into an incorrectly aligned user data
> buffer pointed to by sg_io_hdr.dxferp.

The problem is that nobody is DMA'ing in this case.  The driver in
question is ata_piix and the IO path taken is an actual PIO where the
CPU reads from the IO space and writes to the memory itself.

> My guess is that in the past, use of sg3 would not involve DMA by
> default, but now, with libata ATA Pass-Through commands, it does (I
> also may be totally wrong about that, just a thought).

No DMA in progress here.  The only (somewhat) recent related change
would be the libata PIO path now using 32-bit IO commands when supported
by the controller, but I fail to see how that would trigger this type
of failure.

> I recall documentation somewhere which emphasized that if direct I/O
> (DMA) is to be used in sg, one should page-align the SCSI response
> data buffer.  With sg using indirect I/O this wouldn't be necessary,
> of course, but perhaps now with libata, it is. Just guessing here.

If the buffer is not aligned, the kernel would just create a bounce
buffer and bounce the data, so it shouldn't be a problem either.  It
looks like we have an obscure bug in buffer mapping code for SG_IO.

I tried several things but can't reproduce the problem here.  Can you
please try the attached minimal test case?  It issues IDENTIFY_PACKET
and you can specify the alignment offset.  By default the buffer is
512-byte aligned, but you can offset it, i.e. specifying 1 makes the
buffer misaligned by 1 byte, and so on.

Can you please see whether the problem can be reliably triggered with
it?  Also, please,

* Attach full kernel log (including boot messages) and the program
   output after triggering the problem.

* Make sure the kernel is built with debug info and frame pointer.

* Please reverse map the reported oops address to the source line.

Thanks.
