On 2018-04-29 09:02 AM, Christian König wrote: > Am 29.04.2018 um 01:02 schrieb Michel Dänzer: >> On 2018-04-28 06:30 PM, Ilia Mirkin wrote: >>> On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer <michel at daenzer.net> >>> wrote: >>>> From: Michel Dänzer <michel.daenzer at amd.com> >>>> >>>> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled) >>>> try to allocate huge pages. However, not all drivers can take advantage >>>> of huge pages, but they would incur the overhead for allocating and >>>> freeing them anyway. >>>> >>>> Now, drivers which can take advantage of huge pages need to set the new >>>> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag >>>> no longer incur any overhead for allocating or freeing huge pages. >>>> >>>> v2: >>>> * Also guard swapping of consecutive pages in ttm_get_pages >>>> * Reword commit log, hopefully clearer now >>>> >>>> Cc: stable at vger.kernel.org >>>> Signed-off-by: Michel Dänzer <michel.daenzer at amd.com> >>> Both I and lots of other people, based on reports, are still seeing >>> plenty of issues with this as late as 4.16.4. >> "lots of other people", "plenty of issues" sounds a bit exaggerated from >> what I've seen. FWIW, while I did see the original messages myself, I >> haven't seen any since Christian's original fix (see below), neither >> with amdgpu nor radeon, even before this patch you followed up to. >> >> >>> Admittedly I'm on nouveau, but others have reported issues with >>> radeon/amdgpu as well. It's been going on since the feature was merged >>> in v4.15, with what seems like little investigation from the authors >>> introducing the feature. >> That's not a fair assessment. See >> https://bugs.freedesktop.org/show_bug.cgi?id=104082#c40 and following >> comments. >> >> Christian fixed the original issue in >> d0bc0c2a31c95002d37c3cc511ffdcab851b3256 "swiotlb: suppress warning when >> __GFP_NOWARN is set". Christian did his best to try and get the fix in >> before 4.15 final, but for reasons beyond his control, it was delayed >> until 4.16-rc1 and then backported to 4.15.5. >> >> Unfortunately, there was an swiotlb regression (not directly related to >> Christian's work) shortly after this fix, also in 4.16-rc1, which is now >> fixed in 4.17-rc1 and will be backported to 4.16.y. > > And that's exactly the reason why I intentionally kept this enabled for > all users of the TTM DMA page pool and not put it behind a flag. > > This change has surfaced quite a number of bugs in the swiotlb code > which could have caused issues before. It's just that those code path > where never exercised massively before. > > Additional to that using huge pages is beneficial for the MM and CPU TLB > (not implemented yet) even when the GPU driver can't make much use of it. Do I understand correctly that you're against this patch? AFAIU the only benefit of huge pages with a driver which doesn't take advantage of them directly is "for the MM". Can you describe a bit more what that benefit is exactly? Is it expected to outweigh the cost of allocating / freeing huge pages? >> It looks like there's at least one more bug left, but it's not clear yet >> when that was introduced, whether it's directly related to Christian's >> work, or indeed what the impact is. Let's not get ahead of ourselves. > > Well my patches surfaced the problems, but the underlying issues where > present even before those changes and I'm very well involved in fixing > the underlying issues. > > I even considered to just revert the huge page path for the DMA pool > allocator, but it's just that the TTM patches seem to work exactly as > they are intended. So that doesn't feel like doing the right thing here. I agree. Has anyone reported this to the DMA/SWIOTLB developers? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer