After thinking about this some more, I believe the MAX3421E behavior
could be triggered if a write to the SNDFIFO is not followed by a
BULK_OUT command to the HXFR register.  My driver always issues
BULK_OUT after writing the SNDFIFO, so this should never happen, but a
corrupted SPI transfer could cause it.

Also, and perhaps more plausible for my driver: after an OUT transfer
gets a response other than ACK (e.g., NAK or error), the MAX3421E
doesn't unload that FIFO (on the assumption that you'll want to
retransmit the data).  My driver never retransmits the data
immediately, so I think it has to issue a dummy write to the SNDBC
register to switch back to the original FIFO.  I know I tried that at
one point and it didn't fix the issue, but I should try it again, as
it seems the most plausible explanation.

  --david

On Thu, Mar 13, 2014 at 10:46 AM, David Mosberger <davidm@xxxxxxxxxx> wrote:
> OK, I finally know where the problem is coming from!  The MAX3421E
> chip uses double-buffering.  Specifically, it has two 64-byte send
> FIFOs.  You write up to 64 bytes to a send FIFO by repeatedly writing
> to SPI register 2 (SNDFIFO).  Then you tell the chip how many bytes
> you just put in the FIFO by writing SPI register 7 (the
> send-byte-count or SNDBC register).  Writing SNDBC is supposed to
> switch the FIFO to the USB-side so it can be transmitted on the USB
> bus.
>
> Unfortunately, it seems that under certain circumstances, writing the
> SNDBC fails to properly switch the FIFOs and we end up sending data to
> the USB bus from the wrong FIFO.
>
> In the USB mass-storage error situation we're seeing, the driver was
> trying to send a 31-byte "USBC" command and we see that command coming
> over the SPI-bus just fine.  However, on the USB-side, the MAX3421E
> chip instead writes a 64-byte packet full of zeroes (which is the data
> we were transmitting before).
> The mass-storage peripheral afterwards
> NAKs any OUT request because it never saw the new SCSI WRITE_10
> command that was encapsulated in the "USBC" command.
>
> The workaround for now is to write outgoing packets twice, so that
> both FIFOs contain the same data.  With that workaround, we have been
> able to dd 5MB blocks of data repeatedly without any issues (dd
> if=/dev/zero of=/dev/sda1 count=5000 bs=1024).
>
> I should mention this is with rev 0x12 of the MAX3421E chip.  The
> current rev is 0x13 so we'll try with that chip in the next few days.
> However, we are not aware of any errata for rev 0x12 that would
> explain this behavior.
>
> Also, for the record, we ran the SPI bus at only 4MHz for this testing
> so we could reliably capture the data with the Saleae Logic.  Given
> this low frequency and the fact that the Saleae was able to capture
> the correct data, I do not think that SPI corruption is to blame.  We
> saw the same error occur even with SPI at 1MHz.
>
> I have the full trace data if anyone is interested.  It captures the
> complete test (from loading the max3421 driver to when the error
> occurs), so it's 55MiB in size and I can't attach it to email.
>
>   --david
>
> On Thu, Mar 13, 2014 at 8:55 AM, David Mosberger <davidm@xxxxxxxxxx> wrote:
>> Yeah, sorry, the READ_10s were a total red herring.  They're there
>> because I forgot to specify bs=1024. ;-(
>>
>> I'll try to capture better traces today and if they look interesting,
>> make them available.
>>
>>   --david
>> --
>> eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.976
>
>
>
> --
> eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768

--
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768
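
For anyone following the thread, the FIFO-switching hypothesis can be
sketched as a toy model.  This is only an illustration of the guess
discussed above, not the chip's documented behavior: the class
ToySendFifos, the helpers send() and send_twice(), and the two rules
"every SNDBC write flips the CPU-side FIFO" and "only an ACK flips the
USB-side FIFO" are all assumptions made up for this sketch.  Under
those assumptions it reproduces the stale 64-byte packet of zeroes,
and shows why writing each packet twice, or issuing a dummy SNDBC
write, would line the two sides up again.

```python
FIFO_SIZE = 64  # each send FIFO holds up to 64 bytes, as described in the thread


class ToySendFifos:
    """Hypothetical model of the two send FIFOs (not taken from the datasheet)."""

    def __init__(self):
        self.fifos = [b"", b""]
        self.counts = [0, 0]
        self.cpu_idx = 0  # FIFO that SNDFIFO writes land in
        self.usb_idx = 0  # FIFO that the next OUT transfer reads from

    def write_sndfifo(self, data):
        if len(data) > FIFO_SIZE:
            raise ValueError("packet larger than one FIFO")
        self.fifos[self.cpu_idx] = bytes(data)

    def write_sndbc(self, count):
        # Assumption: every SNDBC write flips the CPU-side FIFO.
        self.counts[self.cpu_idx] = count
        self.cpu_idx ^= 1

    def bulk_out(self, handshake):
        # Transmit whatever the USB side currently points at.
        sent = self.fifos[self.usb_idx][: self.counts[self.usb_idx]]
        # Assumption: the FIFO is unloaded (USB side flips) only on ACK.
        if handshake == "ACK":
            self.usb_idx ^= 1
        return sent


def send(chip, data, handshake="ACK"):
    """Write the packet once, then issue BULK_OUT."""
    chip.write_sndfifo(data)
    chip.write_sndbc(len(data))
    return chip.bulk_out(handshake)


def send_twice(chip, data, handshake="ACK"):
    """Workaround from the thread: load the same packet into both FIFOs."""
    for _ in range(2):
        chip.write_sndfifo(data)
        chip.write_sndbc(len(data))
    return chip.bulk_out(handshake)


ZEROS = bytes(64)          # the data packet that gets NAKed
CMD = b"USBC" + bytes(27)  # 31-byte stand-in for the "USBC" command


def naive():
    chip = ToySendFifos()
    send(chip, ZEROS, "NAK")  # FIFO stays loaded, driver does not retransmit
    return send(chip, CMD)    # CPU and USB sides now point at different FIFOs


def with_dummy_sndbc():
    chip = ToySendFifos()
    send(chip, ZEROS, "NAK")
    chip.write_sndbc(0)       # dummy write: flip CPU side back to the loaded FIFO
    return send(chip, CMD)


def with_double_write():
    chip = ToySendFifos()
    send_twice(chip, ZEROS, "NAK")
    return send_twice(chip, CMD)


if __name__ == "__main__":
    print("naive sends stale zeroes:", naive() == ZEROS)
    print("dummy SNDBC sends command:", with_dummy_sndbc() == CMD)
    print("double write sends command:", with_double_write() == CMD)
```

Note that in this model the dummy SNDBC write is sufficient, whereas on
the real chip it reportedly did not fix the issue when tried earlier,
so the model is at best a partial picture of what the hardware does.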