Tejun Heo wrote: > [CC'ing Monty and Douglas.] > > Hello, the original thread can be read from the following URL. > > http://thread.gmane.org/gmane.linux.ide/13708/focus=13708 > > Brice Goglin wrote: >> ens Axboe wrote: >>> On Mon, Oct 30 2006, Gregor Jasny wrote: >>> >>>> 2006/10/30, Jens Axboe <jens.axboe@xxxxxxxxxx>: >>>> >>>>> Can you confirm that 2.6.18 works? >>>>> >>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17 >>>> and 2.6.18, too. >>>> >>>> Gregor >>>> >>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901 >>>> >>> Ok, mainly just checking if this was a potential dupe of another bug. >>> >>> >> >> Jens (or anybody else who has any idea of how to debug this), >> >> Did you have a chance to reproduce the problem? I guess we "only" need a >> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug >> some stuff, feel free to tell me what. But, since it freezes the machine >> and sysrq doesn't even work, I don't really know what to try... >> >> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17 >> and .18 do, don't know about earlier kernels). I didn't have a audio CD >> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0 >> (from Debian testing), it reports nothing during about 5 seconds and >> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable), >> it reports an error very quickly, and dmesg gets a couple line like >> these: >> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing >> data in; >> program cdparanoia not setting count and/or reply_len properly > > Okay, here's the story. > > In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls > scsi_inquiry() to identify the device and determine interface type. This > seems to be the first time to actually issue commands to the device. As > interface type isn't completely determined, for sg devices, it first > issues the command w/ d->interface set to SGIO_SCSI. If that fails, it > falls back to SGIO_SCSI_BUGGY1. > > For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set > sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device > request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses > SG_DXFER_FROM_DEV. So, cdparanoia first issues inquiry w/ > SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV. > > drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while > block/scsi_ioctl.c interprets it as write. I guess this is historic > thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten). As > written above, cdparanoia can handle both cases as long as the kernel > promptly fails command issued with the wrong direction. > > This works for most PATA ATAPI devices. Most devices detect reversed > transfer and terminate the command promptly. But this doesn't seem to > be true for SATA device. Many just hang and time out commands with the > wrong transfer direction. If you consider that most early SATA ATAPI > devices are actually PATA + bridge, this is sorta inevitable. The > PATA-SATA bridge cannot issue D2H FIS to abort the command by itself. > It's just mirroring the status of PATA side and PATA side doesn't know > SATA protocol mismatch has occurred. > > So, IDENTIFY w/ write-DMA protocol times out after quite some seconds. > This is where things go worse from bad. SATA controllers which have > shadow TF registers don't handle timeout conditions very well, > especially when they're waiting for data transfer. They basically hold > the PCI bus and hang till the transfer completes (which never happens). > That's where the hard lock up comes from. > > Jens, I think we need to match block sg's behavior to SCSI's. Monty, > the timeout and hard lock up are due to hardware restrictions. Kernel > and libata can't do much about it. So, please find other way to detect > interface. Tejun, Your SG_DXFER_TO_FROM_DEV analysis is correct. The stupid ~!@# who wrote the code, and the documentation for it, defined SG_DXFER_TO_FROM_DEV to mean a "transfer from device" operation where the kernel buffer receiving the DMA transfer was prefilled with data that the application provided. That certainly isn't a bidirectional transfer to/from the device, but it is a bidirectional transfer to kernel buffers when indirect IO is used. Why do this? Because the 'resid' field indicating how much less data was transferred in a "from_device" transfer than was requested, was not added to SCSI infrastructure till much later. There are still LLDs out there that don't implement it. It also reflected a similar technique used with the sg_header structure (circa 1992) for precisely the same reason. And application writers wanted that functionality. Joerg was the first name of one such application writer. Coincidentally I am sitting on a patch from Luben Tuikov to cause the same breakage in the sg driver itself. Nobody has proposed a patch to the documentation for the explanation of SG_DXFER_TO_FROM_DEV :-) http://www.torque.net/sg/p/sg_v3_ho.html As I am currently proposing a SCSI pass through version 4 interface with twin scatter gather lists for independent bidirectional transfers for SCSI commands, I'm not sure what setting DMA_BIDIRECTIONAL in the existing interface buys us. When you maintain and document a pass through interface you sit between two groups of people that have conflicting goals and don't have a particularly high opinion of each other. Doug Gilbert - To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html