On Thu, 16 Feb 2017, Ondrej Zary wrote: > On Tuesday 31 January 2017 02:31:45 Finn Thain wrote: > [...] > > Are you trying to figure out which commands are going to disconnect > > during a transfer? This is really a function of the firmware in the > > target; there are no good heuristics AFAICT, so the PDMA algorithm has > > to be robust. mac_scsi has to cope with this too. > > > > Does the problem go away when you assign no IRQ? When instance->irq == > > NO_IRQ, the core driver will inhibit disconnect privileges. > > Yes, it seems to run fine with IRQ/disconnect disabled. With IRQ > enabled, "dd if=/dev/sr0 of=anything" stops after a while. When you say "stops", do you mean an infinite loop in generic_NCR5380_pread or does the loop complete (which would presumably leave the target stuck in DATA IN phase, and a scsi command timeout would probably follow after 30 seconds...) > I get gated 53C80 IRQ, BASR=0x10, MODE=0x0e, STATUS=0x7c. > You can use NCR5380_print() to get a decoded register dump. When I decode the above values I get, BASR = 0x10 = BASR_IRQ MODE = 0x0e = MR_ENABLE_EOP_INTR | MR_MONITOR_BSY | MR_DMA_MODE STATUS = 0x7c = SR_BSY | SR_REQ | SR_MSG | SR_CD | SR_IO Since BASR_PHASE_MATCH is not set, the interrupt is almost certainly a phase mismatch. The new phase is SR_MSG | SR_CD | SR_IO = PHASE_MSGIN, which shows that the target has given up on the DATA IN phase and is presumably trying to send a DISCONNECT message. > When I enable interrupts during PDMA (like the removed UNSAFE macro > did), the problems go away. I see an IRQ after each pread call. You shouldn't need an interrupt, because NCR5380_dma_complete() gets called regardless. AFAICT, the only difference is the timing, which becomes less predictable. Calling spinlock_irq_restore() before the transfer seems to create a race condition between hostdata->dma_len store and load. I think that the pread call has not yet returned when the interrupt fires and NCR5380_dma_complete() is called. Hence hostdata->dma_len has not yet been initialized. So when NCR5380_dma_complete() is called by the interrupt handler, SCp.this_residual will not change at all because hostdata->dma_len is still zero. > (had to disable "no 53C80 gated irq after transfer" and "no end dma > signal" messages to reduce log spam) > That may provide a quick way to detect the failed PDMA transfer, at least for testing purposes. There may be a more conclusive test for a partial transfer. We could to implement something like macscsi_dma_residual() to take care of a failed PDMA transfer. That way, the failure can be taken into account when NCR5380_dma_complete() is called at the end of the transfer. See patch below for example. It should confirm the theory above though it really needs some timeouts added (NCR5380_poll_politely()). And perhaps we could do something more clever than retry indefinitely, though we can still rely on the command timeout. diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c index 6f9665d..75cfaf3 100644 --- a/drivers/scsi/g_NCR5380.c +++ b/drivers/scsi/g_NCR5380.c @@ -497,15 +497,17 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata, blocks--; } + hostdata->pdma_residual = 0; + if (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ)) - printk("53C400r: no 53C80 gated irq after transfer"); + hostdata->pdma_residual = hostdata->dma_len; /* wait for 53C80 registers to be available */ while (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG)) ; if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER)) - printk(KERN_ERR "53C400r: no end dma signal\n"); + hostdata->pdma_residual = hostdata->dma_len; return 0; } @@ -576,12 +578,14 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata, /* FIXME - no timeout */ } - if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER)) { - printk(KERN_ERR "53C400w: no end dma signal\n"); - } + hostdata->pdma_residual = 0; + + if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER)) + hostdata->pdma_residual = hostdata->dma_len; while (!(NCR5380_read(TARGET_COMMAND_REG) & TCR_LAST_BYTE_SENT)) ; // TIMEOUT + return 0; } @@ -607,6 +611,11 @@ static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata, return transfersize; } +static int generic_NCR5380_dma_residual(struct NCR5380_hostdata *hostdata) +{ + return hostdata->pdma_residual; +} + /* * Include the NCR5380 core code that we build our driver around */ diff --git a/drivers/scsi/g_NCR5380.h b/drivers/scsi/g_NCR5380.h index 81b22d9..08c91b7 100644 --- a/drivers/scsi/g_NCR5380.h +++ b/drivers/scsi/g_NCR5380.h @@ -26,7 +26,8 @@ int c400_ctl_status; \ int c400_blk_cnt; \ int c400_host_buf; \ - int io_width; + int io_width; \ + int pdma_residual; #define NCR53C400_mem_base 0x3880 #define NCR53C400_host_buffer 0x3900 @@ -35,7 +36,7 @@ #define NCR5380_dma_xfer_len generic_NCR5380_dma_xfer_len #define NCR5380_dma_recv_setup generic_NCR5380_pread #define NCR5380_dma_send_setup generic_NCR5380_pwrite -#define NCR5380_dma_residual NCR5380_dma_residual_none +#define NCR5380_dma_residual generic_NCR5380_dma_residual #define NCR5380_intr generic_NCR5380_intr #define NCR5380_queue_command generic_NCR5380_queue_command --