Re: [PATCH 05/13] drm/ttm: overhaul memory accounting

Thomas Hellstrom <thellstrom@xxxxxxxxxx> · Fri, 11 Nov 2011 17:22:26 +0100

On 11/11/2011 04:47 PM, Jerome Glisse wrote:
On Fri, Nov 11, 2011 at 08:49:39AM +0100, Thomas Hellstrom wrote:

On 11/11/2011 12:33 AM, Jerome Glisse wrote:

On Thu, Nov 10, 2011 at 09:05:22PM +0100, Thomas Hellstrom wrote:

On 11/10/2011 07:05 PM, Jerome Glisse wrote:

On Thu, Nov 10, 2011 at 11:27:33AM +0100, Thomas Hellstrom wrote:

On 11/09/2011 09:22 PM, j.glisse@xxxxxxxxx wrote:

From: Jerome Glisse<jglisse@xxxxxxxxxx>

This is an overhaul of the ttm memory accounting. This tries to keep
the same global behavior while removing the whole zone concept. It
keeps a distinction for dma32 so that we make sure that ttm doesn't
starve the dma32 zone.

There are 4 thresholds for memory allocation:
- max_mem is the maximum memory the whole ttm infrastructure is
   going to allow allocation for (with the exception of system
   processes, see below)
- emer_mem is the maximum memory allowed for system processes; this
   limit is > max_mem
- swap_limit is the threshold at which ttm will start to try to
   swap objects because ttm is getting close to the max_mem limit
- swap_dma32_limit is the threshold at which ttm will start to swap
   objects to try to reduce the pressure on the dma32 zone. Note
   that we don't specifically target objects to swap, so it might
   very well free more memory from highmem rather than from dma32

Accounting is done through used_mem & used_dma32_mem, whose sum gives
the total amount of memory actually accounted by ttm.

The idea is that allocation will fail if (used_mem + used_dma32_mem) >
max_mem and if swapping fails to make enough room.

The used_dma32_mem can be updated at a later stage, allowing us to
perform the accounting test before allocating a whole batch of pages.
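
For readers following the thread, the thresholds and counters described
above can be sketched roughly as below. This is only an illustration
under assumptions: the struct and every function name (ttm_acct_sketch,
acct_charge, ...) are hypothetical and not taken from the patch, byte
counts stand in for whatever unit the patch uses, and the swap attempt
that the description says happens before failing is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the accounting state; the field names follow the
 * patch description, the struct itself is an assumption. */
struct ttm_acct_sketch {
	uint64_t max_mem;          /* ceiling for ordinary allocations */
	uint64_t emer_mem;         /* ceiling for system processes, > max_mem */
	uint64_t swap_limit;       /* start swapping above this total */
	uint64_t swap_dma32_limit; /* start swapping above this dma32 usage */
	uint64_t used_mem;         /* accounted non-dma32 bytes */
	uint64_t used_dma32_mem;   /* accounted dma32 bytes */
};

/* Total memory accounted: the sum of both counters. */
uint64_t acct_total(const struct ttm_acct_sketch *a)
{
	return a->used_mem + a->used_dma32_mem;
}

/* True once either swap threshold has been crossed. */
bool acct_should_swap(const struct ttm_acct_sketch *a)
{
	return acct_total(a) > a->swap_limit ||
	       a->used_dma32_mem > a->swap_dma32_limit;
}

/* Account `size` bytes up front, before any page is allocated.
 * Returns false if the request would exceed the applicable ceiling.
 * (Assumption: a real implementation would try swapping before failing.) */
bool acct_charge(struct ttm_acct_sketch *a, uint64_t size, bool system_process)
{
	uint64_t limit = system_process ? a->emer_mem : a->max_mem;

	assert(a->emer_mem > a->max_mem);
	if (acct_total(a) + size > limit)
		return false;
	a->used_mem += size;
	return true;
}

/* Later stage: once it is known how many of the charged bytes landed in
 * dma32 pages, move them to the dma32 counter. The total is unchanged. */
void acct_move_to_dma32(struct ttm_acct_sketch *a, uint64_t size)
{
	assert(size <= a->used_mem);
	a->used_mem -= size;
	a->used_dma32_mem += size;
}
```

Charging everything to used_mem first and fixing up used_dma32_mem
afterwards is one way to read "updated at a later stage": the limit test
only looks at the sum, so it can run before the pages exist.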

Jerome, you're removing a fair amount of functionality here, without
justifying why it could be removed.

All this code was overkill.

[1] I don't agree, and since it's well tested, thought through and
working, I see no obvious reason to alter it within the context of
this patch series unless it's absolutely required for the
functionality.

Well, one thing I can tell is that it doesn't work on radeon. I pushed
a test to libdrm and here it's the oom that starts doing its beating.
Anyway, I won't alter it. I was just trying to make it work, i.e. be
useful while also being simpler.

Well if it doesn't work it should of course be fixed.

I'm not against fixing it nor making it simpler, but I think that
requires a detailed understanding of what's going wrong and how it
needs to be fixed. Not as part of a patch series that really tries
to accomplish something else.

The current code was tested extensively with psb and unichrome.
One good test for drivers with bo-backed textures is to continuously
create fairly large texture images. The end result should be the
swap space starting to fill up and once there is no more swap space,
the OOM killer should kill your app, and kmalloc failures should be
avoided. It should be tricky to get a failure from the global alloc
system, but a huge amount of small buffer objects or fence objects
should probably do it.

Naturally, that requires that all persistent drm objects created
from user-space are registered with their correct sizes, or at least
a really good size approximation. That includes things like gem
flinks, that could otherwise easily be exploited to bring a system
down, simply by guessing a gem name and creating flinks to that name
in an infinite loop.

What are the symptoms of the failure you're seeing with Radeon? Any
suggestions on why it happens?

Thanks,
Thomas

I pushed my test case to libdrm yesterday. I basically alloc ttm objects
of 1 page in a loop and expect it to fail. I modified the kernel to
account 2 pages for the ttm_buffer_object struct size so that the kernel
area should be exhausted long before I run out of memory on an 8G
config. What happens is that the oom starts killing everything except
my app; even the kernel logger daemon got killed before my app ...

I think the ttm_memory accounting for kernel object is not the right
way.

....

So, yet again, TTM gets incorrectly blamed when things are not
working as expected.

The TTM memory accounting is designed to avoid pinning so much memory
for graphics that it can't be used by the rest of the system. It's
working well doing exactly that.

However, it can't stop your app from wanting to store too much data. It
just shuffles that data to swap. If too many apps want to store too much
data, eventually the computer runs out of swap space and the OOM killer
kicks in and tries to guess what app to kill. That's not TTM's business.
Nor is it DRM's business.

The only time the TTM memory accounting system blocks an allocation is
if there is too much pinned memory allocated (kmalloc, vmalloc) that it
can't release to swap space. It protects against kmalloc failures, but
it makes no attempt to stop your app from wanting to store too much data.
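
To make the distinction concrete, here is a toy model of that behaviour.
It is a sketch under assumptions, not TTM code: struct pin_model,
pin_model_alloc() and the single pinned_limit are hypothetical names,
and swap space is treated as unlimited, which is exactly the part the
accounting does not police.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: all names and the single limit are assumptions. */
struct pin_model {
	uint64_t pinned_limit; /* max bytes graphics may keep resident */
	uint64_t unswappable;  /* kmalloc/vmalloc-style bytes, never swapped */
	uint64_t swappable;    /* resident buffer data that may go to swap */
	uint64_t swapped_out;  /* bytes pushed to swap so far (unbounded here) */
};

/* Try to make `size` bytes resident. Swappable data is pushed out to make
 * room; the request is only blocked when what remains resident cannot be
 * swapped. Nothing here limits swapped_out, so a greedy app just keeps
 * filling swap until something outside this model (the OOM killer) acts. */
bool pin_model_alloc(struct pin_model *m, uint64_t size, bool can_swap)
{
	uint64_t used = m->unswappable + m->swappable;

	if (used + size > m->pinned_limit) {
		uint64_t need = used + size - m->pinned_limit;

		if (need > m->swappable)
			return false; /* only unswappable memory left: block */
		m->swappable -= need;
		m->swapped_out += need;
	}
	if (can_swap)
		m->swappable += size;
	else
		m->unswappable += size;
	return true;
}
```

In this model a loop of small swappable allocations never fails; it only
grows swapped_out. Requests start failing once unswappable memory alone
fills the limit.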

/Thomas
