Hi Finn, >> >> +#define ZORRO_ESP_GET_PRIV(esp) ((struct zorro_esp_priv *) \ >> >> + &zorro_esp_private_data[(esp->host->host_no)]) >> >> + >> > >> > How do you know that host_no won't exceed the array bounds? >> > Why not use dev_{set,get}_drvdata(esp->dev)? -- much as mac_esp uses >> > platform_{set,get}_drvdata() here. >> >> I don't think you can have more than 8 Zorro cards in an Amiga. > > The host_no range is not limited to just zorro cards. It's also libata > hosts, USB Attached Storage hosts, scsi_debug etc. Max. libata hosts is one, no USB host adapters I know of, leaves scsi_debug. But I'll get rid of this if at all possible. >> > I suspect that the mac_esp PIO code should work fine here unchanged. >> > If so, let's avoid a new variation on that code if that's possible (?) >> >> The Mac PIO code did not work unchanged here, that's why I checked the >> interrupt bit raised and changed the mask. But that was only for one >> quirky target (CD-ROM, SCSI-1 CCS IIRC) and only when testing the entire >> driver without DMA. Doesn't happen in DMA mode. >> >> You would have seen the error only with such a special target, and >> probably not in PDMA mode... >> > > The TCQ issue showed up on the AV Quadras becasue mac_esp doesn't know how > to do PDMA or DMA on that hardware, and so it always uses PIO. It sounds > like this bug would show up too given the right kind of target. If so, I > will need to patch mac_esp too. It did show up in PIO mode so it's a matter of having the right target to trigger it. dmesg on elgar says: [ 26.960000] scsi host1: esp [ 27.270000] scsi 1:0:1:0: Direct-Access IBMRAID DFHSS2F9337 4I4I PQ: 0 ANSI: 2 [ 27.280000] scsi target1:0:1: Beginning Domain Validation [ 27.360000] scsi target1:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15) [ 27.380000] scsi target1:0:1: Domain Validation skipping write tests [ 27.390000] scsi target1:0:1: Ending Domain Validation [ 27.450000] scsi 1:0:2:0: CD-ROM SONY CD-ROM CDU-561 1.9m PQ: 0 ANSI: 2 CCS [ 27.460000] scsi target1:0:2: Beginning Domain Validation [ 27.580000] scsi target1:0:2: FAST-5 SCSI 4.0 MB/s ST (248 ns, offset 15) [ 27.600000] scsi target1:0:2: Domain Validation skipping write tests [ 27.620000] scsi target1:0:2: Ending Domain Validation [ 27.750000] scsi host1: scsi scan: INQUIRY result too short (5), using 36 To the best of my knowledge, the CD-ROM triggered that error. Looking at the NCR53CF94_96 databook, select with attention and stop is used when multiple message bytes need to be transferred (example: sync negotiation - that's when esp->flags |= ESP_FLAG_DOING_SLOWCMD is used in the ESP core which switches to using this selection mode). One message byte is transferred before the ESP raises both ESP_INTR_FDONE and ESP_INTR_BSERV. From all appearances, the bus will still be in message out phase at that stage. u8 mask = ~(phase == ESP_MIP ? ESP_INTR_FDONE : (phase == ESP_MOP ? ESP_INTR_FDONE|ESP_INTR_BSERV : ESP_INTR_BSERV) ); would set the correct interrupt status mask to avoid detecting this condition as error. Doing sync negotiation is fairly basic - I wonder why you haven't seen this before. Ah - the Mac driver supports sync transfers only when using PDMA, that's why. The description of 'select with attention' also states the same two interrupt bits are set, so maybe the bug is elsewhere. According to the manual, only the first message byte should be loaded into the FIFO when stopping for additional message bytes. We load the entire message instead... > >> Regarding your other comment on this code: I'll have to review what >> phase was present when the select with attention and stop condition >> occurred, > > Thanks. That's what I'd prefer to use in mac_esp (if it works). > >> but setting the mask up front according to the bus phase would make a >> lot of sense. >> > > The idea is that ESP_INTR_FDONE is an error when the phase is not > appropriate. In the version you sent, the loop permits ESP_INTR_BSERV | > ESP_INTR_FDONE regardless of phase. Deciding whether an error exists just based on the phase might be impossible now I come to think of it - the combination of ESP_INTR_FDONE|ESP_INTR_BSERV is only legitimate for selection commands. We might need to look at additional information to figure out whether getting ESP_INTR_FDONE in message out phase was an error. This is getting a little academic for zorro_esp - if a particular board needs to do PIO all the time, we need a way to set the ESP_FLAG_USE_FIFO flag in the core driver instead. Much safer. > >> > WARN_ON is preferred. Please see Johannes Thumshirn's message from last >> > week, >> > https://www.spinics.net/lists/linux-scsi/msg117919.html >> >> Well, should we ever see this condition trigger (I have never!), I don't >> think there is a way we can recover from it. We expect a DMA command >> (DMA i.e. physical address), how can we reliably determine whether the >> calling code just forgot to set the bit, or really passed us a kernel >> virtual address? >> > > Thinking about this a bit more, BUG_ON may be the right thing to do here. > If the core driver will not catch the DMA error, that might lead to data > corruption. You'd better check this before switching to WARN_ON. We can't > assume that end users will notice the warning in the logs. Reading that thread, leaving BUG_ON would make Linus pretty angry. Setting zep->error and letting the core driver deal with it might be safer. > >> >> +static int zorro_esp_dma_error(struct esp *esp) >> >> +{ >> >> + struct zorro_esp_priv *zep = ZORRO_ESP_GET_PRIV(esp); >> >> + u8 phase = esp->sreg & ESP_STAT_PMASK; >> >> + >> >> + /* if doing PIO, check for error */ >> >> + if (phase == ESP_MIP && zep->error == 1) >> >> + return 1; >> > >> > Don't check the bus phase here, the target controls that. Just make sure >> > zep->error gets set to zero when not using PIO; that's what >> > mac_esp_send_pdma_cmd() does. >> >> You mean before sending each DMA command? Couldn't find a way to figure >> out whether we're been doing DMA or PIO otherwise. >> > > Both mac_esp_send_pdma_cmd() and mac_esp_send_pio_cmd() just initialize > this to zero at the outset. Any time that mep->error is set, it means the > most recent transfer failed, which I think is what matters here. I'm clearing zep->error now when entering the DMA set-up, and set it when PIO failed (or the DMA flag was not set). Thanks again, Michael > > --