On Fri, Nov 11, 2011 at 05:22:26PM +0100, Thomas Hellstrom wrote: > On 11/11/2011 04:47 PM, Jerome Glisse wrote: > >On Fri, Nov 11, 2011 at 08:49:39AM +0100, Thomas Hellstrom wrote: > >>On 11/11/2011 12:33 AM, Jerome Glisse wrote: > >>>On Thu, Nov 10, 2011 at 09:05:22PM +0100, Thomas Hellstrom wrote: > >>>>On 11/10/2011 07:05 PM, Jerome Glisse wrote: > >>>>>On Thu, Nov 10, 2011 at 11:27:33AM +0100, Thomas Hellstrom wrote: > >>>>>>On 11/09/2011 09:22 PM, j.glisse@xxxxxxxxx wrote: > >>>>>>>From: Jerome Glisse<jglisse@xxxxxxxxxx> > >>>>>>> > >>>>>>>This is an overhaul of the ttm memory accounting. This tries to keep > >>>>>>>the same global behavior while removing the whole zone concept. It > >>>>>>>keeps a distrinction for dma32 so that we make sure that ttm don't > >>>>>>>starve the dma32 zone. > >>>>>>> > >>>>>>>There is 3 threshold for memory allocation : > >>>>>>>- max_mem is the maximum memory the whole ttm infrastructure is > >>>>>>> going to allow allocation for (exception of system process see > >>>>>>> below) > >>>>>>>- emer_mem is the maximum memory allowed for system process, this > >>>>>>> limit is> to max_mem > >>>>>>>- swap_limit is the threshold at which point ttm will start to > >>>>>>> try to swap object because ttm is getting close the max_mem > >>>>>>> limit > >>>>>>>- swap_dma32_limit is the threshold at which point ttm will start > >>>>>>> swap object to try to reduce the pressure on the dma32 zone. Note > >>>>>>> that we don't specificly target object to swap to it might very > >>>>>>> well free more memory from highmem rather than from dma32 > >>>>>>> > >>>>>>>Accounting is done through used_mem& used_dma32_mem, which sum give > >>>>>>>the total amount of memory actually accounted by ttm. > >>>>>>> > >>>>>>>Idea is that allocation will fail if (used_mem + used_dma32_mem)> > >>>>>>>max_mem and if swapping fail to make enough room. > >>>>>>> > >>>>>>>The used_dma32_mem can be updated as a later stage, allowing to > >>>>>>>perform accounting test before allocating a whole batch of pages. > >>>>>>> > >>>>>>Jerome, you're removing a fair amount of functionality here, without > >>>>>>justifying > >>>>>>why it could be removed. > >>>>>All this code was overkill. > >>>>[1] I don't agree, and since it's well tested, thought throught and > >>>>working, I see no obvious reason to alter it, > >>>>within the context of this patch series unless it's absolutely > >>>>required for the functionality. > >>>Well one thing i can tell is that it doesn't work on radeon, i pushed > >>>a test to libdrm and here it's the oom that starts doing its beating. > >>>Anyway i won't alter it. Was just trying to make it works, ie be useful > >>>while also being simpler. > >>Well if it doesn't work it should of course be fixed. > >> > >>I'm not against fixing it nor making it simpler, but I think that > >>requires a detailed understanding of what's going wrong and how it > >>needs to be fixed. Not as part of a patch series that really tries > >>to accomplish something else. > >> > >>The current code was tested extensively with psb and unichrome. > >>One good test for drivers with bo-backed textures is to continously > >>create fairly large texture images. The end result should be the > >>swap space starting to fill up and once there is no more swap space, > >>the OOM killer should kill your app, and kmalloc failures should be > >>avoided. It should be tricky to get a failure from the global alloc > >>system, but a huge amount of small buffer objects or fence objects > >>should probably do it. > >> > >>Naturally, that requires that all persistent drm objects created > >>from user-space are registered with their correct sizes, or at least > >>a really good size approximation. That includes things like gem > >>flinks, that could otherwise easily be exploited to bring a system > >>down, simply by guessing a gem name and create flinks to that name > >>in an infinite loop. > >> > >>What are the symptoms of the failure you're seeing with Radeon? Any > >>suggestions on why it happens? > >> > >>Thanks, > >>Thomas > >I pushed my test case to libdrm yesterday, i basicly alloc ttm object > >of 1 page in a loop and expect it to fail. I modified the kernel to > >account 2 page for the ttm_buffer_object struct size so that the kernel > >area should be exhausted long before i run out of memory on a 8G > >config. What happen is that the oom start killing everythings except > >my app, even the kernel logger daemon got kill before my app ... > > > >I think the ttm_memory accounting for kernel object is not the right > >way. > .... > > So, yet again, TTM gets incorrectly blamed when things are not > working as expected. > > The TTM memory accounting is designed to avoid pinning too much > memory for graphics, so that it can't be > used by the rest of the system. It's working well doing exactly that. > > However, it can't stop your app from wanting to store too much data. > It just shuffles that data to swap. If too many apps want to store > too much data, eventually the computer runs out of swap space and > the OOM killer kicks in, and > tries to guess what app to kill. That's not TTM's business. Nor is > it DRM's business. > > The only time the TTM memory accounting system blocks an allocation > is if there is too much pinned memory allocated (kmalloc, vmalloc) > that it can't release to swap space. It protects against kmalloc > failures, but it makes > no attempt to stop your app from wanting to store too much data. > My test only try to allocate kernel memory, so i have 8G of ram i allocate 1 page for each object (highmem forced) and then for the accounting i account 2 page per each kernel kmalloc of bo struct. So ttm memory should stop allowing any more allocation as soon as i allocated for 400M of kernel memory. Which should happen with only 200M of highmem allocate so all well below the limit. I also setup 4G of swap so in the end the oom shouldn't kick in before we exhauste our kernel memory. I think here the issue is that the kernel memory pool got decrease wrongly when swaping object as the ttm buffer struct is not swaped and still there. I am chasing a bug in swapout + dma page alloc first, i will get back to why ttm memory accouting fails after that. Cheers, Jerome _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel