On Sat, 18 Feb 2017, Ondrej Zary wrote: > On Friday 17 February 2017 23:38:12 Finn Thain wrote: > > On Thu, 16 Feb 2017, Ondrej Zary wrote: > > > On Tuesday 31 January 2017 02:31:45 Finn Thain wrote: > > > [...] > > > > > > > Are you trying to figure out which commands are going to > > > > disconnect during a transfer? This is really a function of the > > > > firmware in the target; there are no good heuristics AFAICT, so > > > > the PDMA algorithm has to be robust. mac_scsi has to cope with > > > > this too. > > > > > > > > Does the problem go away when you assign no IRQ? When > > > > instance->irq == NO_IRQ, the core driver will inhibit disconnect > > > > privileges. > > > > > > Yes, it seems to run fine with IRQ/disconnect disabled. With IRQ > > > enabled, "dd if=/dev/sr0 of=anything" stops after a while. > > > > When you say "stops", do you mean an infinite loop in > > generic_NCR5380_pread or does the loop complete (which would > > presumably leave the target stuck in DATA IN phase, and a scsi command > > timeout would probably follow after 30 seconds...) > > I've added timeouts to the possibly-infinite loops. It times out waiting > for the host buffer to become ready. In mac_scsi you'll find that the PDMA loop exploits the 15ms poll_politely time limit to give the target device time to catch up. You might want to do something similar. > Then everything breaks. Now I found why: pread() returns without waiting > for the 53C80 registers to be ready. Ouch! You can't return control to the core driver when the 53C80 core is unavailable. That would need special handling: the core driver would have to fail the scsi command and reset the device (and bus), based on the result you return from NCR5380_dma_recv_setup/NCR5380_dma_send_setup. > Adding the wait allows to continue in PIO mode with "tag#0 switching to > slow handshake". > I don't think this is the code path you want. The target isn't really broken. But yes, we could use PIO as a slow workaround for fragile PDMA logic. > > > I get gated 53C80 IRQ, BASR=0x10, MODE=0x0e, STATUS=0x7c. > > > > You can use NCR5380_print() to get a decoded register dump. > > > > When I decode the above values I get, > > > > BASR = 0x10 = BASR_IRQ > > MODE = 0x0e = MR_ENABLE_EOP_INTR | MR_MONITOR_BSY | MR_DMA_MODE > > STATUS = 0x7c = SR_BSY | SR_REQ | SR_MSG | SR_CD | SR_IO > > > > Since BASR_PHASE_MATCH is not set, the interrupt is almost certainly a > > phase mismatch. The new phase is SR_MSG | SR_CD | SR_IO = PHASE_MSGIN, > > which shows that the target has given up on the DATA IN phase and is > > presumably trying to send a DISCONNECT message. > > Looks you're right. The transfersize is 4096, i.e. 2 CD-ROM sectors. > When the 2nd sector is not ready in the drive internal buffer, the drive > probably disconnects which breaks the fragile pdma transfer. When the > transfersize is limited to 2048 bytes, the problem goes away. > OK. > The problem also went away with enabled interrupts because I had some > debug printks added which were slowing down the transfer enough for the > drive (SONY CDU-55S) to keep up and not disconnect. Got the same result > by inserting udelay(100) after each 128 byte block trasnferred in > pread(). > Well, yeah, if you change the timing and omit to mention that, as well as changing the spinlock_irq_save/restore pairs, then it's going to be difficult for me to make sense of your results. Anyway, I'm glad that you got to the bottom of this mystery. > > > When I enable interrupts during PDMA (like the removed UNSAFE macro > > > did), the problems go away. I see an IRQ after each pread call. > > --