On 11/29/2012 08:20 PM, Marek Olšák wrote:
On Thu, Nov 29, 2012 at 10:18 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
On 11/28/2012 10:51 PM, Marek Olšák wrote:
I think the problem with Radeon/TTM is much deeper. Let me demonstrate
it on the following example.
Unigine Heaven needs about 385MB of space for static resources, which
is only 75% of my 512MB card. Yet TTM is not capable of getting all of
that into VRAM. If I allow GTT placements, I get 20 fps, which is the
old Mesa behavior. If I force VRAM placements, I get 3 fps, because we
validate buffers 10 times per frame and there are probably a lot of
buffer evictions during each validation.
Marek,
Did you look at the total amount of referenced buffers in the ring,
including vertex buffers?
Depending on how hard you throttle, I guess vertex / index buffer data
referenced by the ring commands may well exceed the VRAM limitation.
Buffers (not textures) take only 30 MB. These are stats for 1 frame of
Unigine Heaven. Each line is a CS ioctl.
VRAM [used in CS] / [total allocated], GTT [used in CS] / [total allocated]
1. VRAM: 171 / 390 MB, GTT: 1 / 5 MB
2. VRAM: 144 / 390 MB, GTT: 2 / 5 MB
3. VRAM: 184 / 390 MB, GTT: 1 / 5 MB
4. VRAM: 35 / 390 MB, GTT: 2 / 5 MB
5. VRAM: 119 / 390 MB, GTT: 1 / 5 MB
6. VRAM: 207 / 390 MB, GTT: 1 / 5 MB
7. VRAM: 65 / 390 MB, GTT: 2 / 5 MB
If I move all buffers (vertex, index, constant, streamout, queries,
shader code, etc.) to GTT, this is what one frame looks like (not the
same frame, but it's close):
1. VRAM: 144 / 359 MB, GTT: 16 / 35 MB
2. VRAM: 95 / 359 MB, GTT: 12 / 35 MB
3. VRAM: 178 / 359 MB, GTT: 15 / 35 MB
4. VRAM: 55 / 359 MB, GTT: 13 / 35 MB
5. VRAM: 22 / 359 MB, GTT: 16 / 35 MB
6. VRAM: 163 / 359 MB, GTT: 16 / 35 MB
7. VRAM: 133 / 359 MB, GTT: 11 / 35 MB
8. VRAM: 66 / 359 MB, GTT: 4 / 35 MB
The stats are generated in the Mesa driver, based on the driver's
expectations of where buffers should be placed.
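As a quick sanity check on the two listings, the per-CS VRAM numbers can
be summarized with a few lines of Python. This is only arithmetic on the
figures quoted in this mail plus the 512MB card size mentioned earlier in
the thread; the names (summarize, CARD_MB, etc.) are made up for the
sketch and are not part of Mesa or the kernel.

```python
# Per-CS VRAM usage in MB, copied from the two listings in this mail.
vram_default = [171, 144, 184, 35, 119, 207, 65]
vram_buffers_in_gtt = [144, 95, 178, 55, 22, 163, 133, 66]

CARD_MB = 512  # card size stated earlier in the thread


def summarize(used_per_cs, allocated_mb):
    """Return the peak per-CS VRAM use and its share of the card."""
    peak = max(used_per_cs)
    return {
        "peak_mb": peak,
        "peak_share_of_card": peak / CARD_MB,
        "allocated_share_of_card": allocated_mb / CARD_MB,
    }


default = summarize(vram_default, 390)
buffers_in_gtt = summarize(vram_buffers_in_gtt, 359)

for label, s in (("default", default), ("buffers in GTT", buffers_in_gtt)):
    print(f"{label}: peak {s['peak_mb']} MB per CS "
          f"({s['peak_share_of_card']:.0%} of card), "
          f"allocated {s['allocated_share_of_card']:.0%} of card")
```

In both cases the largest single CS references well under half of the
card, so by these numbers no individual submission should need to evict
anything in order to fit.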
I can easily see how VRAM is thrashed with the strict LRU approach.
Also, is it possible that one buffer is moved twice for a single CS
ioctl? Imagine there's a buffer at the end of the relocation list,
which is also at the head of the LRU list. Some buffer in the middle
causes eviction of the last buffer. When the last buffer is validated,
it's moved back to VRAM. Can it happen?
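To make the scenario concrete, here is a toy model in Python. It is not
TTM code: it assumes, purely for illustration, that buffers stay on the
LRU list while a relocation list is being validated, and that every
buffer has size 1 in a 3-slot VRAM. The function name validate_naive and
the buffer names are hypothetical.

```python
from collections import OrderedDict


def validate_naive(vram, capacity, relocs):
    """Validate relocs in order against a strict LRU.

    vram:   OrderedDict name -> size, least recently used first.
    relocs: list of (name, size) in relocation-list order.
    Returns a dict counting how many times each buffer was moved
    (evicted from VRAM or brought into VRAM).
    """
    moves = {}
    for name, size in relocs:
        if name in vram:
            vram.move_to_end(name)  # touched: now most recently used
            continue
        # Evict from the LRU head until the buffer fits. Nothing stops
        # the victim from being a buffer later in this same list.
        while sum(vram.values()) + size > capacity:
            victim, _ = vram.popitem(last=False)
            moves[victim] = moves.get(victim, 0) + 1
        vram[name] = size
        moves[name] = moves.get(name, 0) + 1
    return moves


# "last" is at the head of the LRU and at the end of the relocation list.
vram = OrderedDict([("last", 1), ("a", 1), ("b", 1)])
moves = validate_naive(vram, 3, [("a", 1), ("mid", 1), ("last", 1)])
print(moves)  # {'last': 2, 'mid': 1, 'b': 1}
```

In this model "last" is moved twice within one submission: once out of
VRAM to make room for "mid", and once back in when it is validated.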
No. Typically that shouldn't happen. In a typical CS sequence, first all
buffers are reserved, and then all buffers are validated. Reservation
takes them off the LRU list. I'm not 100% sure Radeon does it this way,
but I think so.
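A rough sketch of the reserve-then-validate order, as a toy Python model
rather than actual TTM or Radeon code. It assumes unit-sized buffers and
a 3-slot VRAM, and the names (validate_reserved, pinned, lru) are
hypothetical. The point it tries to show is only that a buffer taken off
the LRU in the reserve phase cannot be picked as an eviction victim while
the rest of the list is validated.

```python
from collections import OrderedDict


def validate_reserved(vram, capacity, relocs):
    """Reserve every buffer in relocs first, then validate them.

    vram:   OrderedDict name -> size, least recently used first.
    relocs: list of (name, size) in relocation-list order.
    Returns a dict counting how many times each buffer was moved.
    """
    moves = {}

    # Phase 1: reserve. Resident buffers named in the CS leave the LRU.
    reserved = {name for name, _ in relocs}
    lru = OrderedDict((n, s) for n, s in vram.items() if n not in reserved)
    pinned = {n: s for n, s in vram.items() if n in reserved}

    # Phase 2: validate. Only unreserved buffers can be evicted.
    for name, size in relocs:
        if name in pinned:
            continue  # already resident, nothing to move
        while sum(lru.values()) + sum(pinned.values()) + size > capacity:
            if not lru:
                raise MemoryError("nothing left to evict")
            victim, _ = lru.popitem(last=False)
            moves[victim] = moves.get(victim, 0) + 1
        pinned[name] = size
        moves[name] = moves.get(name, 0) + 1
    return moves


# A buffer ("last") at the LRU head that is also last in the list.
vram = OrderedDict([("last", 1), ("a", 1), ("b", 1)])
moves = validate_reserved(vram, 3, [("a", 1), ("mid", 1), ("last", 1)])
print(moves)  # {'b': 1, 'mid': 1}
```

Here "last" is never moved: the unreserved buffer "b" is evicted to make
room for "mid" instead.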
/Thomas