Hello,

I am currently working on an application which has to stream huge amounts of texture data to the GPU. I use multiple buffers to avoid mapping a buffer which is currently involved in a DMA operation, so there is typically a delay of more than one frame between triggering an upload with glTexSubImage2D and mapping the buffer again.

The typical state of my buffers in some frame could look like this:

PBO0: mapped
PBO1: mapped
PBO2: glUnmapBuffer(PBO2)
PBO3: currently used by glTexSubImage2D(PBO3, ...
PBO4: unmapped & unused, texture of PBO4 used for drawing
PBO5: unmapped & idle
PBO6: glMapBuffer(PBO6)
PBO7: mapped
PBO8: mapped
PBO9: mapped

However, I get quite confusing results with INTEL_DEBUG=perf. The best case seems to be when there are only 1-2 buffers mapped at a time:

Transfer Rate: 1455.2 MB/s. (121.3 FPS)
GTT mapping a busy miptree BO stalled and took 0.686 ms.

However, merely increasing the number of buffers mapped concurrently, with the same delay between unmapping/using/mapping, reduces throughput dramatically:

GTT mapping a busy miptree BO stalled and took 7.128 ms.
Transfer Rate: 844.4 MB/s. (70.4 FPS)

Increasing the delay between glTexSubImage2D and glMapBuffer doesn't seem to improve the situation at all. Also, discarding the buffer contents before mapping (glBufferDataARB) doesn't reduce the stall time of 7 ms.

Is the large number (~4-6) of simultaneously mapped buffers a problem?

Thank you in advance,
Clemens

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
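
The buffer rotation in the snapshot above can be modeled without any GL calls, which makes it easier to reason about how many frames separate the upload from the next map. The following is only an illustrative sketch, not the poster's code: the names pbo_stage, pbo_stage_of and pbo_next_head are hypothetical, and the direction of rotation (the newly mapped buffer moves from PBO6 to PBO5 in the next frame) is an assumption inferred from the listed states.

```c
#include <assert.h>

enum { NUM_PBOS = 10 };  /* ring size, matching PBO0..PBO9 in the post */

/* Hypothetical stage names; each corresponds to one line of the snapshot. */
typedef enum {
    ST_MAP,     /* glMapBuffer() is called on this buffer this frame */
    ST_MAPPED,  /* still mapped from an earlier frame */
    ST_UNMAP,   /* glUnmapBuffer() is called this frame */
    ST_UPLOAD,  /* source of this frame's glTexSubImage2D() */
    ST_DRAW,    /* unmapped; its texture is used for drawing */
    ST_IDLE     /* unmapped and unused */
} pbo_stage;

/* Stage of buffer `slot` in the frame in which buffer `head` is newly mapped.
 * `mapped` is the number of buffers held mapped at once, including `head`.
 * Assumption: a buffer's "age" in frames is its distance above `head`. */
static inline pbo_stage pbo_stage_of(int slot, int head, int mapped)
{
    assert(slot >= 0 && slot < NUM_PBOS);
    assert(head >= 0 && head < NUM_PBOS);
    assert(mapped >= 1 && mapped + 3 <= NUM_PBOS);

    int age = (slot - head + NUM_PBOS) % NUM_PBOS;
    if (age == 0)          return ST_MAP;
    if (age < mapped)      return ST_MAPPED;
    if (age == mapped)     return ST_UNMAP;
    if (age == mapped + 1) return ST_UPLOAD;
    if (age == mapped + 2) return ST_DRAW;
    return ST_IDLE;
}

/* Buffer that will be newly mapped in the following frame. */
static inline int pbo_next_head(int head)
{
    return (head + NUM_PBOS - 1) % NUM_PBOS;
}
```

With head = 6 and mapped = 6 this model yields the states listed in the post. The number of frames between a buffer's upload and its next map is NUM_PBOS - (mapped + 1), so keeping more buffers mapped shortens that gap unless the ring grows as well.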