Re: Failed to find memory space for buffer eviction

On 15.07.20 at 17:14, Felix Kuehling wrote:
On 2020-07-15 at 5:28 a.m., Christian König wrote:
On 15.07.20 at 04:49, Felix Kuehling wrote:
On 2020-07-14 at 4:28 a.m., Christian König wrote:
Hi Felix,

yes I already stumbled over this as well quite recently.

See the following patch which I pushed to drm-misc-next just yesterday:

commit e04be2310b5eac683ec03b096c0e22c4c2e23593
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Mon Jul 6 17:32:55 2020 +0200

      drm/ttm: further cleanup ttm_mem_reg handling

      Stop touching the backend private pointer alltogether and
      make sure we never put the same mem twice by.

      Signed-off-by: Christian König <christian.koenig@xxxxxxx>
      Reviewed-by: Madhav Chauhan <madhav.chauhan@xxxxxxx>
      Link: https://patchwork.freedesktop.org/patch/375613/


But this shouldn't have been problematic since we used a dummy value
for mem->mm_node in this case.
Hmm, yeah, I was reading the code wrong. It's possible that I was really
just out of GTT space. But see below.
It looks like it, yes.
I checked. I don't see a general GTT space leak. During the eviction
test the GTT usage spikes, but after finishing the test, GTT usage goes
back down to 7MB.


What could be problematic and result in an overrun is that TTM was
buggy and called put_node twice for the same memory.

So I've seen that the code needs fixing as well, but I'm not 100% sure
how you ran into your problem.
This is in the KFD eviction test, which deliberately overcommits VRAM in
order to trigger lots of evictions. It will use some GTT space while BOs
are evicted. But shouldn't it move them further out of GTT and into
SYSTEM to free up GTT space?
Yes, exactly that should happen.

But for some reason it couldn't find a candidate to evict, and the
14371 pages left are just a bit too small for the buffer.
That would be a nested eviction. A VRAM to GTT eviction requires a GTT
to SYSTEM eviction to make space in GTT. Is that even possible?

Yes, this is the core of the TTM design problem which I talked about in my FOSDEM presentation in February.

Question: do we still have this crude workaround where KFD does not take all reservations of the current process when allocating new BOs?

That could maybe cause this as well.

Regards,
Christian.


Regards,
   Felix


Regards,
Christian.

Your change "further cleanup ttm_mem_reg handling" removes a
mem->mm_node = NULL in ttm_bo_handle_move_mem in exactly the case where
a BO is moved from GTT to SYSTEM. I think that leads to a later put_node
call not happening or amdgpu_gtt_mgr_del returning before incrementing
mgr->available.

I can check whether cherry-picking your two fixes helps with the
eviction test.

Regards,
    Felix


Regards,
Christian.

On 14.07.20 at 02:44, Felix Kuehling wrote:
I'm running into this problem with the KFD EvictionTest. The log snippet
below looks like it ran out of GTT space for the eviction of a 64MB
buffer. But then it dumps the used and free space and shows plenty of
free space.

As I understand it, the per-page breakdown of used and free space shown
by TTM is the GART space. So it's not very meaningful.

What matters more is the GTT space managed by amdgpu_gtt_mgr.c. And
that's where the problem is. It keeps track of available GTT space with
an atomic counter in amdgpu_gtt_mgr.available. It gets decremented in
amdgpu_gtt_mgr_new and incremented in amdgpu_gtt_mgr_del. The trouble
is that TTM doesn't call the latter for ttm_mem_regs that don't have an
mm_node:

void ttm_bo_mem_put(struct ttm_buffer_object *bo, struct ttm_mem_reg *mem)
{
        struct ttm_mem_type_manager *man = &bo->bdev->man[mem->mem_type];

        if (mem->mm_node)
                (*man->func->put_node)(man, mem);
}
GTT BOs that don't have GART space allocated don't have an mm_node. So
the amdgpu_gtt_mgr.available counter doesn't get incremented when an
unmapped GTT BO is freed, and eventually runs out of space.
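
The accounting gap described above can be reproduced in a small user-space
model. This is only a sketch: the model_* types and functions are
hypothetical stand-ins invented for illustration, not the kernel's
amdgpu_gtt_mgr or TTM code. It assumes that "new" always decrements the
counter, that only GART-mapped BOs receive a non-NULL mm_node, and that
"put" is skipped when mm_node is NULL, as in the ttm_bo_mem_put quoted
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the manager: one counter of available pages. */
struct model_gtt_mgr {
	atomic_llong available;
};

/* Hypothetical stand-in for a memory region. */
struct model_mem_reg {
	void *mm_node;       /* NULL when no GART space is allocated */
	long long num_pages;
};

/* Models the "new" path: always takes pages out of the counter. */
static int model_gtt_new(struct model_gtt_mgr *mgr, struct model_mem_reg *mem,
			 long long num_pages, bool map_gart)
{
	static char gart_token; /* any non-NULL address works as a node */

	if (atomic_load(&mgr->available) < num_pages)
		return -1; /* out of (accounted) GTT space */

	atomic_fetch_sub(&mgr->available, num_pages);
	mem->num_pages = num_pages;
	mem->mm_node = map_gart ? (void *)&gart_token : NULL;
	return 0;
}

/* Models the "del" path: gives the pages back to the counter. */
static void model_gtt_del(struct model_gtt_mgr *mgr, struct model_mem_reg *mem)
{
	atomic_fetch_add(&mgr->available, mem->num_pages);
	mem->mm_node = NULL;
}

/* Models the quoted put helper: del is only reached with a node. */
static void model_mem_put(struct model_gtt_mgr *mgr, struct model_mem_reg *mem)
{
	if (mem->mm_node)
		model_gtt_del(mgr, mem);
}

/* Allocate and free one BO `rounds` times; return the remaining counter. */
long long model_available_after(long long total, long long pages,
				int rounds, bool map_gart)
{
	struct model_gtt_mgr mgr;
	struct model_mem_reg mem = { NULL, 0 };

	assert(pages > 0 && rounds >= 0);
	atomic_init(&mgr.available, total);

	for (int i = 0; i < rounds; i++) {
		if (model_gtt_new(&mgr, &mem, pages, map_gart))
			break; /* counter exhausted */
		model_mem_put(&mgr, &mem);
	}
	return atomic_load(&mgr.available);
}
```

In this model, calling model_available_after(1048576, 16384, 3, false)
leaves the counter three buffers short, while the same call with
map_gart set to true returns the full 1048576 pages.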

Now I know what the problem is, but I don't know how to fix it. Maybe a
dummy-mm_node for unmapped GTT BOs, to trick TTM into calling our
put_node callback? Or a change in TTM to call put_node unconditionally?
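
The first idea, a dummy mm_node for unmapped BOs, might look roughly like
the following user-space sketch. Everything named sketch_* is
hypothetical and exists only for illustration; it is not the actual
amdgpu or TTM implementation, and it uses a plain counter where the
real manager uses an atomic one. The point is only that a non-NULL
sentinel lets a "put only if mm_node is set" helper reach the manager's
del path, which can then skip the GART release but still return the
pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical manager state (plain counters; the real one is atomic). */
struct sketch_mgr {
	long long available; /* pages not yet handed out */
	long long gart_used; /* pages that also hold a GART range */
};

/* Hypothetical memory region. */
struct sketch_mem {
	void *mm_node;
	long long num_pages;
};

/* Sentinel meaning "accounted for, but no GART range allocated". */
static char sketch_dummy_node;
/* Stand-in for a real GART allocation. */
static char sketch_gart_node;

int sketch_new(struct sketch_mgr *mgr, struct sketch_mem *mem,
	       long long num_pages, bool map_gart)
{
	assert(num_pages > 0);
	if (mgr->available < num_pages)
		return -1;

	mgr->available -= num_pages;
	mem->num_pages = num_pages;
	if (map_gart) {
		mgr->gart_used += num_pages;
		mem->mm_node = &sketch_gart_node;
	} else {
		/* Non-NULL on purpose, so the put path is not skipped. */
		mem->mm_node = &sketch_dummy_node;
	}
	return 0;
}

void sketch_del(struct sketch_mgr *mgr, struct sketch_mem *mem)
{
	/* Only real nodes hold GART space that must be released. */
	if (mem->mm_node != &sketch_dummy_node)
		mgr->gart_used -= mem->num_pages;

	/* The page accounting is undone in both cases. */
	mgr->available += mem->num_pages;
	mem->mm_node = NULL;
}

/* Same shape as the quoted helper: del is only called with a node. */
void sketch_mem_put(struct sketch_mgr *mgr, struct sketch_mem *mem)
{
	if (mem->mm_node)
		sketch_del(mgr, mem);
}
```

The second idea, calling put_node unconditionally, would instead move
the NULL check out of the common helper and into each manager's
callback.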

Regards,
     Felix


[  360.082552] [TTM] Failed to find memory space for buffer 0x00000000264c823c eviction
[  360.090331] [TTM]  No space for 00000000264c823c (16384 pages, 65536K, 64M)
[  360.090334] [TTM]    placement[0]=0x00010002 (1)
[  360.090336] [TTM]      has_type: 1
[  360.090337] [TTM]      use_type: 1
[  360.090339] [TTM]      flags: 0x0000000A
[  360.090341] [TTM]      gpu_offset: 0xFF00000000
[  360.090342] [TTM]      size: 1048576
[  360.090344] [TTM]      available_caching: 0x00070000
[  360.090346] [TTM]      default_caching: 0x00010000
[  360.090349] [TTM]  0x0000000000000400-0x0000000000000402: 2: used
[  360.090352] [TTM]  0x0000000000000402-0x0000000000000404: 2: used
[  360.090354] [TTM]  0x0000000000000404-0x0000000000000406: 2: used
[  360.090355] [TTM]  0x0000000000000406-0x0000000000000408: 2: used
[  360.090357] [TTM]  0x0000000000000408-0x000000000000040a: 2: used
[  360.090359] [TTM]  0x000000000000040a-0x000000000000040c: 2: used
[  360.090361] [TTM]  0x000000000000040c-0x000000000000040e: 2: used
[  360.090363] [TTM]  0x000000000000040e-0x0000000000000410: 2: used
[  360.090365] [TTM]  0x0000000000000410-0x0000000000000412: 2: used
[  360.090367] [TTM]  0x0000000000000412-0x0000000000000414: 2: used
[  360.090368] [TTM]  0x0000000000000414-0x0000000000000415: 1: used
[  360.090370] [TTM]  0x0000000000000415-0x0000000000000515: 256: used
[  360.090372] [TTM]  0x0000000000000515-0x0000000000000516: 1: used
[  360.090374] [TTM]  0x0000000000000516-0x0000000000000517: 1: used
[  360.090376] [TTM]  0x0000000000000517-0x0000000000000518: 1: used
[  360.090378] [TTM]  0x0000000000000518-0x0000000000000519: 1: used
[  360.090379] [TTM]  0x0000000000000519-0x000000000000051a: 1: used
[  360.090381] [TTM]  0x000000000000051a-0x000000000000051b: 1: used
[  360.090383] [TTM]  0x000000000000051b-0x000000000000051c: 1: used
[  360.090385] [TTM]  0x000000000000051c-0x000000000000051d: 1: used
[  360.090387] [TTM]  0x000000000000051d-0x000000000000051f: 2: used
[  360.090389] [TTM]  0x000000000000051f-0x0000000000000521: 2: used
[  360.090391] [TTM]  0x0000000000000521-0x0000000000000522: 1: used
[  360.090392] [TTM]  0x0000000000000522-0x0000000000000523: 1: used
[  360.090394] [TTM]  0x0000000000000523-0x0000000000000524: 1: used
[  360.090396] [TTM]  0x0000000000000524-0x0000000000000525: 1: used
[  360.090398] [TTM]  0x0000000000000525-0x0000000000000625: 256: used
[  360.090400] [TTM]  0x0000000000000625-0x0000000000000725: 256: used
[  360.090402] [TTM]  0x0000000000000725-0x0000000000000727: 2: used
[  360.090404] [TTM]  0x0000000000000727-0x00000000000007c0: 153: used
[  360.090406] [TTM]  0x00000000000007c0-0x0000000000000b8a: 970: used
[  360.090407] [TTM]  0x0000000000000b8a-0x0000000000000b8b: 1: used
[  360.090409] [TTM]  0x0000000000000b8b-0x0000000000000bcb: 64: used
[  360.090411] [TTM]  0x0000000000000bcb-0x0000000000000bcd: 2: used
[  360.090413] [TTM]  0x0000000000000bcd-0x0000000000040000: 259123: free
[  360.090415] [TTM]  total: 261120, used 1997 free 259123
[  360.090417] [TTM]  man size:1048576 pages, gtt available:14371 pages, usage:4039MB
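
The numbers in the log can be cross-checked with a few lines of
arithmetic. The helpers below are a hypothetical sketch and assume a
4 KiB page size, which is consistent with the "16384 pages, 65536K, 64M"
line but is not stated in the log itself. Under that assumption the
request exceeds the "gtt available" counter by 2013 pages, even though
the GART range dump shows 259123 free pages (about 1012 MiB), and
1048576 - 14371 pages rounds down to the reported 4039MB of usage.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, matching "16384 pages, 65536K, 64M". */
#define SKETCH_PAGE_KIB 4LL

/* Convert a page count to KiB. */
long long pages_to_kib(long long pages)
{
	assert(pages >= 0);
	return pages * SKETCH_PAGE_KIB;
}

/* Convert a page count to whole MiB, rounding down. */
long long pages_to_mib(long long pages)
{
	return pages_to_kib(pages) / 1024;
}

/* Pages missing for a request; 0 if the request fits. */
long long shortfall_pages(long long requested, long long available)
{
	return requested > available ? requested - available : 0;
}
```

For example, shortfall_pages(16384, 14371) gives the 2013 missing pages,
and pages_to_mib(1048576 - 14371) gives the usage figure in MiB.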


_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

