On 08/10/2014 08:02 PM, Mario Kleiner wrote: > On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: >> On 08/10/2014 05:11 AM, Mario Kleiner wrote: >>> Resent this time without HTML formatting which lkml doesn't like. >>> Sorry. >>> >>> On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: >>>> On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: >>>>> On August 9, 2014 1:39:39 AM EDT, Thomas >>>>> Hellstrom<thellstrom@xxxxxxxxxx> wrote: >>>>>> Hi. >>>>>> >>>>> Hey Thomas! >>>>> >>>>>> IIRC I don't think the TTM DMA pool allocates coherent pages more >>>>>> than >>>>>> one page at a time, and _if that's true_ it's pretty unnecessary for >>>>>> the >>>>>> dma subsystem to route those allocations to CMA. Maybe Konrad could >>>>>> shed >>>>>> some light over this? >>>>> It should allocate in batches and keep them in the TTM DMA pool for >>>>> some time to be reused. >>>>> >>>>> The pages that it gets are in 4kb granularity though. >>>> Then I feel inclined to say this is a DMA subsystem bug. Single page >>>> allocations shouldn't get routed to CMA. >>>> >>>> /Thomas >>> Yes, seems you're both right. I read through the code a bit more and >>> indeed the TTM DMA pool allocates only one page during each >>> dma_alloc_coherent() call, so it doesn't need CMA memory. The current >>> allocators don't check for single page CMA allocations and therefore >>> try to get it from the CMA area anyway, instead of skipping to the >>> much cheaper fallback. >>> >>> So the callers of dma_alloc_from_contiguous() could need that little >>> optimization of skipping it if only one page is requested. For >>> >>> dma_generic_alloc_coherent >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Ddma_generic_alloc_coherent&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=d1852625e2ab2ff07eb34a7f33fc1f55f7f13959912d5a6ce9316d23070ce939> >>> >>> andintel_alloc_coherent >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Dintel_alloc_coherent&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=82d587e9b6aeced5cf9a7caefa91bf47fba809f3522b7379d22e45a2d5d35ebd> >>> this >>> seems easy to do. Looking at the arm arch variants, e.g., >>> >>> https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c%23L1194&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=4c178257eab9b5d7ca650dedba76cf27abeb49ddc7aebb9433f52b6c8bb3bbac >>> >>> >>> and >>> >>> https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c%23L44&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=5f62f4cbe8cee1f1dd4cbba656354efe6867bcdc664cf90e9719e2f42a85de08 >>> >>> >>> i'm not sure if it is that easily done, as there aren't any fallbacks >>> for such a case and the code looks to me as if that's at least >>> somewhat intentional. >>> >>> As far as TTM goes, one quick one-line fix to prevent it from using >>> the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the >>> above methods) would be to clear the __GFP_WAIT >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc> >>> flag from the >>> passed gfp_t flags. That would trigger the well working fallback. >>> So, is >>> >>> __GFP_WAIT >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc> >>> needed >>> for those single page allocations that go through__ttm_dma_alloc_page >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0>? >>> >>> >>> It would be nice to have such a simple, non-intrusive one-line patch >>> that we still could get into 3.17 and then backported to older stable >>> kernels to avoid the same desktop hangs there if CMA is enabled. It >>> would be also nice for actual users of CMA to not use up lots of CMA >>> space for gpu's which don't need it. I think DMA_CMA was introduced >>> around 3.12. >>> >> I don't think that's a good idea. Omitting __GFP_WAIT would cause >> unnecessary memory allocation errors on systems under stress. >> I think this should be filed as a DMA subsystem kernel bug / regression >> and an appropriate solution should be worked out together with the DMA >> subsystem maintainers and then backported. > > Ok, so it is needed. I'll file a bug report. > >>> The other problem is that probably TTM does not reuse pages from the >>> DMA pool. If i trace the __ttm_dma_alloc_page >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0> >>> and >>> __ttm_dma_free_page >>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0> >>> calls for >>> those single page allocs/frees, then over a 20 second interval of >>> tracing and switching tabs in firefox, scrolling things around etc. i >>> find about as many alloc's as i find free's, e.g., 1607 allocs vs. >>> 1648 frees. >> This is because historically the pools have been designed to keep only >> pages with nonstandard caching attributes since changing page caching >> attributes have been very slow but the kernel page allocators have been >> reasonably fast. >> >> /Thomas > > Ok. A bit more ftraceing showed my hang problem case goes through the > "if (is_cached)" paths, so the pool doesn't recycle anything and i see > it bouncing up and down by 4 pages all the time. > > But for the non-cached case, which i don't hit with my problem, could > one of you look at line 954... > > https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c%23L954&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A&s=e15c51805d429ee6d8960d6b88035e9811a1cdbfbf13168eec2fbb2214b99c60 > > > ... and tell me why that unconditional npages = count; assignment > makes sense? It seems to essentially disable all recycling for the dma > pool whenever the pool isn't filled up to/beyond its maximum with free > pages? When the pool is filled up, lots of stuff is recycled, but when > it is already somewhat below capacity, it gets "punished" by not > getting refilled? I'd just like to understand the logic behind that line. > > thanks, > -mario I'll happily forward that question to Konrad who wrote the code (or it may even stem from the ordinary page pool code which IIRC has Dave Airlie / Jerome Glisse as authors) /Thomas _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel