On Mon, 5 Mar 2018, Michael Schmitz wrote: > > The TCQ issue showed up on the AV Quadras becasue mac_esp doesn't know > > how to do PDMA or DMA on that hardware, and so it always uses PIO. It > > sounds like this bug would show up too given the right kind of target. > > If so, I will need to patch mac_esp too. > > It did show up in PIO mode so it's a matter of having the right target > to trigger it. dmesg on elgar says: > > [ 26.960000] scsi host1: esp > [ 27.270000] scsi 1:0:1:0: Direct-Access IBMRAID DFHSS2F9337 4I4I PQ: 0 ANSI: 2 > [ 27.280000] scsi target1:0:1: Beginning Domain Validation > [ 27.360000] scsi target1:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15) > [ 27.380000] scsi target1:0:1: Domain Validation skipping write tests > [ 27.390000] scsi target1:0:1: Ending Domain Validation > [ 27.450000] scsi 1:0:2:0: CD-ROM SONY CD-ROM CDU-561 1.9m PQ: 0 ANSI: 2 CCS > [ 27.460000] scsi target1:0:2: Beginning Domain Validation > [ 27.580000] scsi target1:0:2: FAST-5 SCSI 4.0 MB/s ST (248 ns, offset 15) > [ 27.600000] scsi target1:0:2: Domain Validation skipping write tests > [ 27.620000] scsi target1:0:2: Ending Domain Validation > [ 27.750000] scsi host1: scsi scan: INQUIRY result too short (5), using 36 > > To the best of my knowledge, the CD-ROM triggered that error. Looking at > the NCR53CF94_96 databook, select with attention and stop is used when > multiple message bytes need to be transferred (example: sync negotiation > - that's when esp->flags |= ESP_FLAG_DOING_SLOWCMD is used in the ESP > core which switches to using this selection mode). > > One message byte is transferred before the ESP raises both > ESP_INTR_FDONE and ESP_INTR_BSERV. From all appearances, the bus will > still be in message out phase at that stage. > > u8 mask = ~(phase == ESP_MIP ? ESP_INTR_FDONE : (phase == ESP_MOP ? > ESP_INTR_FDONE|ESP_INTR_BSERV : ESP_INTR_BSERV) ); > > would set the correct interrupt status mask to avoid detecting this > condition as error. Well, it is consistent with zorro_esp_send_pio_cmd as far as ESP_MOP is concerned. > Doing sync negotiation is fairly basic - I wonder why you haven't seen > this before. Ah - the Mac driver supports sync transfers only when using > PDMA, that's why. > I suspect that if you try PIO for all transfers, not just Message In, you'll need to drop async support too. But I don't recall why async was a problem back when I was developing mac_esp. It was too long ago. > The description of 'select with attention' also states the same two > interrupt bits are set, so maybe the bug is elsewhere. According to the > manual, only the first message byte should be loaded into the FIFO when > stopping for additional message bytes. We load the entire message > instead... > FDONE is not a problem once all the bytes are in the fifo. The mac_esp algorithm already accomodates that. Which manual is this, BTW? > > > >> Regarding your other comment on this code: I'll have to review what > >> phase was present when the select with attention and stop condition > >> occurred, > > > > Thanks. That's what I'd prefer to use in mac_esp (if it works). > > > >> but setting the mask up front according to the bus phase would make a > >> lot of sense. > >> > > > > The idea is that ESP_INTR_FDONE is an error when the phase is not > > appropriate. In the version you sent, the loop permits ESP_INTR_BSERV > > | ESP_INTR_FDONE regardless of phase. > > Deciding whether an error exists just based on the phase might be > impossible now I come to think of it - the combination of > ESP_INTR_FDONE|ESP_INTR_BSERV is only legitimate for selection commands. Well, I'm not trying to decide whether any error exists, I'm just trying to detect an unexpected interrupt in the middle of ESP_DOP. Hence my suggestion to add a test for ESP_MOP. > We might need to look at additional information to figure out whether > getting ESP_INTR_FDONE in message out phase was an error. > Your algorithm is pretty clear on that point: FDONE is not an error during message out... but I'm beginning to think that it really is an error during message out, except at the end of the transfer. > This is getting a little academic for zorro_esp - if a particular board > needs to do PIO all the time, we need a way to set the ESP_FLAG_USE_FIFO > flag in the core driver instead. Much safer. > Unfortunately, that flag doesn't always do what is needed for PIO. Anyway, this discussion is completely academic when you consider this: if (phase == ESP_MIP) { zorro_esp_send_pio_cmd(esp, addr, esp_count, dma_count, write, cmd); return; } Based on this, you should be able to use the algorithm from mac_esp_send_pio_cmd() unaltered, since your change to the !write branch is irrelevant during Message In phase. My main concern here is to avoid two slightly different copies of that code. > > > > Thinking about this a bit more, BUG_ON may be the right thing to do > > here. If the core driver will not catch the DMA error, that might lead > > to data corruption. You'd better check this before switching to > > WARN_ON. We can't assume that end users will notice the warning in the > > logs. > > Reading that thread, leaving BUG_ON would make Linus pretty angry. Maybe remove it if you think it can't fire. > Setting zep->error and letting the core driver deal with it might be > safer. > True enough ;-) --