Hi, John Stultz <john.stultz@xxxxxxxxxx> writes: > On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi <balbi@xxxxxxxxxx> wrote: >> >>>>> >> >>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather >> >>>>> support, we have seen problems with adb connections stalling and >> >>>>> stopping to function on hardware with dwc3 usb controllers. >> >>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices. > ... >> >> >> >> I'm pretty sure this should be solved at the DMA API level, just want to confirm. >> > >> > I have sent you the tracepoints long time ago. Also my analysis of the >> > problem (BTW, I don't think the tracepoints helped much). It's >> > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg(). >> >> AFAICT, this is caused by DMA API merging pages together when map an >> sglist for DMA. While doing that, it does *not* move the SG_END flag >> which sg_is_last() checks. >> >> I consider that an overlook on the DMA API, wouldn't you? Why should DMA >> API users care if pages were merged or not while mapping the sglist? We >> have for_each_sg() and sg_is_last() for a reason. >> > > From an initial look, I agree this is pretty confusing. dma_map_sg() > can coalesce entries in the sg list, modifying the sg entires > themselves, however, in doing so it doesn't modify the number of > entries in the sglist (nor the end state bit). That's pretty subtle! > > My initial naive attempt to fix the dma-iommu path to set the end bit > at the tail of __finalize_sg() which does the sg entry modifications, > only caused trouble elsewhere, as there's plenty of logic that expects > the number of entries to not change, so having sg_next() return NULL > before that point results in lots of null pointer traversals. > > Further, looking at the history, while apparently quirky, this has > been the documented behavior in DMA-API.txt for over almost 14 years > (at least). It clearly states that that dma_map_api can return fewer > mapped entries then sg entries, and one should loop only that many > times (for_each_sg() having a max number of entries to iterate over > seems specifically for this purpose). Additionally, it says one must > preserve the original number of entries (not # mapped entries) for > dma_unmap_sg(). > > So I'm not sure that sg_is_last() is really valid for iterating on > mapped sg lists. > > Should it be? Probably (at least with my unfamiliar eyes), but > sg_is_last() has been around for almost as long coexisting with this > behavioral quirk, so I'm also not sure this is the best hill for the > dwc3 driver to die on. :) > > The fix here: > https://lore.kernel.org/lkml/20200122222645.38805-3-john.stultz@xxxxxxxxxx/ > Or maybe the slightly cleaner varient here: > https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/db845c-mainline-WIP&id=61a5816aa71ec719e77df9f2260dbbf6bcec7c99 in that case, we don't need to use sg_is_last() at all, since i will always encode the last entry in the list. > seems like it would correctly address things following the > documentation and avoid the failures we're seeing. > > As to if dma_map_sg() should fixup the state bits or properly shrink > the sg list when it coalesces entries, that seems like it would be a > much more intrusive change across quite a bit of the kernel that was > written to follow the documented method. So my confidence that such a > change would make it upstream in a reasonable amount of time isn't > very high, and it seems like a bad idea to block the driver from > working properly in the meantime. > > Pulling in Christoph and Jens as I suspect they have more context on > this, and maybe can explain thats its not so quirky with the right > perspective? > > Thoughts? Maybe there is an easier way to make it less quirky then > what I imagine? it just seems very counter-intuitive to me that DMA api can coalesce entries but they're actually still there and drivers have to cope with this behavior. -- balbi
Attachment:
signature.asc
Description: PGP signature