Hi Logan/All, I have added a check for the sg_dma_len == 0 : """ } __sgt_iter(struct scatterlist *sgl, bool dma) { struct sgt_iter s = { .sgp = sgl }; + if (sgl && sg_dma_len(sgl) == 0) + s.sgp = NULL; if (s.sgp) { ..... """ at location [1]. but it doens't fix the problem. You're right though, this change does need to be made, this code doesn't handle pages of sg_dma_len(sg) == 0 correctly So my guess is that we have more bugs in other parts of the i915 driver (or there is a problem with my "sg_dma_len == 0" fix above). I have been trying to spot where else the code might be buggy but I haven't had any luck so far. I'm doing a microconfernce (at LPC 2020) this wednesdays [1] on this if you're interested in attending. I'm hoping I can chat about it with a few people and find how can reproduce and fix this issues. I don't have any more time I can give to this unfortunately and it would be a shame for the work to go to waste. [0] https://github.com/torvalds/linux/blob/d012a7190fc1fd72ed48911e77ca97ba4521bccd/drivers/gpu/drm/i915/i915_scatterlist.h#L28 [1] https://linuxplumbersconf.org/event/7/contributions/846/ On Fri, 29 May 2020 at 22:21, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote: > > > > On 2020-05-29 3:11 p.m., Marek Szyprowski wrote: > > Patches are pending: > > https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski@xxxxxxxxxxx/T/ > > Cool, nice! Though, I still don't think that fixes the issue in > i915_scatterlist.h given it still ignores sg_dma_len() and strictly > relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this > is the bug that got in Tom's way. > > >> However, as Robin pointed out, there are other ugly tricks like stopping > >> iterating through the SGL when sg_dma_len() is zero. For example, the > >> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does > >> this trick and thus likely isn't buggy (otherwise, I'd expect someone to > >> have complained by now seeing AMD has already switched to IOMMU-DMA. > > > > I'm not sure that this is a trick. Stopping at zero sg_dma_len() was > > somewhere documented. > > Well whatever you want to call it, it is ugly to have some drivers doing > one thing with the returned value and others assuming there's an extra > zero at the end. It just causes confusion for people reading/copying the > code. It would be better if they are all consistent. However, I concede > stopping at zero should not be broken, presently. > > Logan