On Mon, Aug 24, 2020 at 2:56 AM Tom Murphy <murphyt7@xxxxxx> wrote: > > Hi Logan/All, > > I have added a check for the sg_dma_len == 0 : > """ > } __sgt_iter(struct scatterlist *sgl, bool dma) { > struct sgt_iter s = { .sgp = sgl }; > > + if (sgl && sg_dma_len(sgl) == 0) > + s.sgp = NULL; > > if (s.sgp) { > ..... > """ > at location [1]. > but it doens't fix the problem. > > You're right though, this change does need to be made, this code > doesn't handle pages of sg_dma_len(sg) == 0 correctly > So my guess is that we have more bugs in other parts of the i915 > driver (or there is a problem with my "sg_dma_len == 0" fix above). > I have been trying to spot where else the code might be buggy but I > haven't had any luck so far. > > I'm doing a microconfernce (at LPC 2020) this wednesdays [1] on this > if you're interested in attending. > I'm hoping I can chat about it with a few people and find how can > reproduce and fix this issues. I don't have any more time I can give > to this unfortunately and it would be a shame for the work to go to > waste. > > [0] https://github.com/torvalds/linux/blob/d012a7190fc1fd72ed48911e77ca97ba4521bccd/drivers/gpu/drm/i915/i915_scatterlist.h#L28 > [1] https://linuxplumbersconf.org/event/7/contributions/846/ > > On Fri, 29 May 2020 at 22:21, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote: > > > > > > > > On 2020-05-29 3:11 p.m., Marek Szyprowski wrote: > > > Patches are pending: > > > https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski@xxxxxxxxxxx/T/ > > > > Cool, nice! Though, I still don't think that fixes the issue in > > i915_scatterlist.h given it still ignores sg_dma_len() and strictly > > relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this > > is the bug that got in Tom's way. > > > > >> However, as Robin pointed out, there are other ugly tricks like stopping > > >> iterating through the SGL when sg_dma_len() is zero. For example, the > > >> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does > > >> this trick and thus likely isn't buggy (otherwise, I'd expect someone to > > >> have complained by now seeing AMD has already switched to IOMMU-DMA. We ran into the same issue with amdgpu and radeon when the AMD IOMMU driver was converted and had to fix it as well. The relevant fixes were: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=42e67b479eab6d26459b80b4867298232b0435e7 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0199172f933342d8b1011aae2054a695c25726f4 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=47f7826c520ecd92ffbffe59ecaa2fe61e42ec70 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0f83d164fb8f3a2b7bc379a6c1e27d1123a9eab Alex > > > > > > I'm not sure that this is a trick. Stopping at zero sg_dma_len() was > > > somewhere documented. > > > > Well whatever you want to call it, it is ugly to have some drivers doing > > one thing with the returned value and others assuming there's an extra > > zero at the end. It just causes confusion for people reading/copying the > > code. It would be better if they are all consistent. However, I concede > > stopping at zero should not be broken, presently. > > > > Logan > _______________________________________________ > dri-devel mailing list > dri-devel@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/dri-devel