Hi,
On Wed, Apr 8, 2015 at 1:24 PM, Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
> Hi,
> On 8 April 2015 at 10:57, Vasilis Liaskovitis <vliaskov@xxxxxxxxx> wrote:
>> I have an issue where st_TexSubImage causes very high CPU load in
>> __memcpy_sse2_unaligned (Mesa 10.1.3, Xorg 1.15.1, radeon driver, HD 7870).
>>
>> Any obvious causes / tips for this? e.g. align textures or use different
>> format/type? I've tried using GL_BGRA/GL_UNSIGNED_BYTE and
>> GL_BGRA/GL_UNSIGNED_INT_8_8_8_8_REV.
>>
>> __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:85
>> 85 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
>> (gdb) bt
>> #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:85
>> #1 0x00007fffb572f154 in memcpy (__len=7680, __src=<optimized out>, __dest=0x7fff5835f800) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
>> #2 st_TexSubImage (ctx=0x1b91420, dims=<optimized out>, texImage=0x1f81710, xoffset=0, yoffset=0, zoffset=0, width=1920, height=1080, depth=1,
>> format=32993, type=5121, pixels=0xdacf90, unpack=0x1bad590) at ../../../../src/mesa/state_tracker/st_cb_texture.c:752
> Your source (0xdacf90) is only aligned to a 16-byte boundary, not 32.
> This will cause issues particularly on ARM, where natural alignment is
> required (i.e. 32-byte loads/stores must be on 32-byte boundaries). By
> contrast, the destination is already aligned to a 128-byte boundary.
> So fixing the caller, rather than Mesa, should take care of the
> problem.
Thanks for the reply and the observation. I aligned the source to a 32-byte boundary (and even to a 128-byte boundary), but it made no difference.
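A minimal sketch of one way to put the upload source on a 32- or 128-byte boundary on Linux, using POSIX posix_memalign(). alloc_aligned_image() is a hypothetical helper, and the 1920x1080, 4-bytes-per-pixel frame size is taken from the first backtrace:

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign() under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a tightly packed width x height image with
 * bpp bytes per pixel whose base address lies on an `align`-byte boundary.
 * align must be a power of two and a multiple of sizeof(void *), as
 * posix_memalign() requires. Returns NULL on failure; release with free(). */
static void *alloc_aligned_image(size_t width, size_t height, size_t bpp,
                                 size_t align)
{
    void *p = NULL;
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    if (posix_memalign(&p, align, width * height * bpp) != 0)
        return NULL;
    return p;
}
```

Usage would be something like alloc_aligned_image(1920, 1080, 4, 128), with the result passed as the pixels argument and later released with free(). This only moves the base address; the row pitch (1920 x 4 = 7680 bytes) is already a multiple of 128.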
By the way, I am only using x86_64, not ARM. I believe Intel SSE2 only requires 16-byte alignment (for its aligned load/store instructions), but perhaps I am missing something.
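The alignment of the addresses in the backtraces can be checked mechanically: the largest power-of-two boundary an address sits on is its lowest set bit. A small C sketch (alignment_of() and is_aligned() are hypothetical helpers, not Mesa or glibc functions):

```c
#include <assert.h>
#include <stdint.h>

/* Largest power-of-two boundary an address sits on, i.e. its lowest set
 * bit (~addr + 1 is the two's-complement negation). Returns 0 for 0. */
static uintptr_t alignment_of(uintptr_t addr)
{
    return addr & (~addr + 1u);
}

/* Nonzero if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

Applied to the traces: alignment_of(0xdacf90) is 16 (0x90 = 9 x 16), which satisfies the 16-byte requirement of SSE2 aligned loads/stores such as movdqa but not a 32-byte boundary; alignment_of(0x7fff5835f800) is 2048, and the srcAddr 0x7fffeeecd000 in the PBO trace gives 4096.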
Does this code path in st_TexSubImage use PBOs? I guess it depends on the driver implementation (radeon in my case)?
Related: the pboUnpack sample (http://www.songho.ca/opengl/files/pboUnpack.zip)
gives: Transfer Rate: 236.5 MB/s (59.1 FPS).
Does this sound reasonable for uploading with a PBO?
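The two reported numbers are consistent with one 1024x1024 BGRA image uploaded per frame (the 4194304-byte __len in the backtrace below), assuming the sample counts MB as 2^20 bytes: 4 MiB x 59.1 FPS is about 236.4 MiB/s. Under those assumptions the figure tracks the frame rate, so it may say more about a ~60 Hz frame cap than about raw copy bandwidth. A quick check in C (the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one tightly packed image (no row padding). */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Throughput in MiB/s (2^20 bytes), assuming one upload per displayed frame. */
static double upload_mib_per_s(size_t bytes_per_frame, double fps)
{
    return (double)bytes_per_frame * fps / (1024.0 * 1024.0);
}
```

For example, upload_mib_per_s(frame_bytes(1024, 1024, 4), 59.1) comes out at about 236.4.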
The same __memcpy_sse2_unaligned bottleneck is observed.
Sample perf report output:
28,20% pboUnpack libc-2.19.so [.] __memcpy_sse2_unaligned
16,63% pboUnpack pboUnpack [.] 0x0000000000006542
6,96% pboUnpack [kernel.kallsyms] [k] clear_page_c_e
2,52% pboUnpack [drm] [k] drm_mm_insert_node_in_range_generic
2,10% pboUnpack [kernel.kallsyms] [k] get_page_from_freelist
Backtrace:
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:86
86 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:86
#1 0x00007ffff2bddbbd in memcpy (__len=4194304, __src=<optimized out>, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2 memcpy_texture (dimensions=dimensions@entry=2, dstFormat=dstFormat@entry=MESA_FORMAT_B8G8R8A8_UNORM, dstRowStride=dstRowStride@entry=4096, dstSlices=dstSlices@entry=0x7fffffffd6e8,
srcWidth=srcWidth@entry=1024, srcHeight=srcHeight@entry=1024, srcDepth=srcDepth@entry=1, srcFormat=srcFormat@entry=32993, srcType=srcType@entry=5121, srcAddr=srcAddr@entry=0x7fffeeecd000,
srcPacking=srcPacking@entry=0x7ffff7f69180, ctx=<optimized out>) at ../../../../src/mesa/main/texstore.c:949
#3 0x00007ffff2be353d in _mesa_texstore_memcpy (srcPacking=0x7ffff7f69180, srcAddr=<optimized out>, srcType=5121, srcFormat=32993, srcDepth=<optimized out>, srcHeight=<optimized out>,
srcWidth=<optimized out>, dstSlices=<optimized out>, dstRowStride=<optimized out>, dstFormat=MESA_FORMAT_B8G8R8A8_UNORM, baseInternalFormat=6408, dims=<optimized out>, ctx=0x7ffff7f4d010)
at ../../../../src/mesa/main/texstore.c:3938
#4 _mesa_texstore (ctx=0x7ffff7f4d010, dims=2, baseInternalFormat=6408, dstFormat=MESA_FORMAT_B8G8R8A8_UNORM, dstRowStride=4096, dstSlices=0x7fffffffd6e8, srcWidth=1024, srcHeight=1024, srcDepth=1,
srcFormat=32993, srcType=5121, srcAddr=0x7fffeeecd000, srcPacking=0x7ffff7f69180) at ../../../../src/mesa/main/texstore.c:3958
#5 0x00007ffff2be3812 in store_texsubimage (ctx=ctx@entry=0x7ffff7f4d010, texImage=texImage@entry=0x7c8690, xoffset=xoffset@entry=0, yoffset=yoffset@entry=0, zoffset=zoffset@entry=0, width=1024,
height=1024, depth=1, format=32993, type=5121, pixels=0x0, packing=0x7ffff7f69180, caller=0x7ffff2d609c7 "glTexSubImage") at ../../../../src/mesa/main/texstore.c:4107
#6 0x00007ffff2be3aa5 in _mesa_store_texsubimage (ctx=ctx@entry=0x7ffff7f4d010, dims=<optimized out>, texImage=texImage@entry=0x7c8690, xoffset=xoffset@entry=0, yoffset=yoffset@entry=0,
zoffset=zoffset@entry=0, width=<optimized out>, width@entry=1024, height=<optimized out>, height@entry=1024, depth=<optimized out>, depth@entry=1, format=<optimized out>, format@entry=32993,
type=<optimized out>, type@entry=5121, pixels=<optimized out>, pixels@entry=0x0, packing=<optimized out>, packing@entry=0x7ffff7f69180) at ../../../../src/mesa/main/texstore.c:4171
#7 0x00007ffff2c3acaa in st_TexSubImage (ctx=0x7ffff7f4d010, dims=<optimized out>, texImage=0x7c8690, xoffset=0, yoffset=0, zoffset=0, width=1024, height=1024, depth=1, format=32993, type=5121,
pixels=0x0, unpack=0x7ffff7f69180) at ../../../../src/mesa/state_tracker/st_cb_texture.c:787
#8 0x00007ffff2bce83d in texsubimage (ctx=0x7ffff7f4d010, dims=dims@entry=2, target=3553, level=0, xoffset=0, yoffset=0, zoffset=zoffset@entry=0, width=1024, height=1024, depth=depth@entry=1,
format=format@entry=32993, type=type@entry=5121, pixels=pixels@entry=0x0) at ../../../../src/mesa/main/teximage.c:3445
#9 0x00007ffff2bd259c in _mesa_TexSubImage2D (target=<optimized out>, level=<optimized out>, xoffset=<optimized out>, yoffset=<optimized out>, width=<optimized out>, height=<optimized out>,
format=32993, type=5121, pixels=0x0) at ../../../../src/mesa/main/teximage.c:3483
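The two traces seem to split the copy differently. In the first one, memcpy is called with __len=7680, exactly one 1920-pixel BGRA row, so st_TexSubImage appears to copy row by row; here __len=4194304 equals 1024 rows of 4096 bytes (dstRowStride=4096), i.e. the whole image in one call. A simplified sketch of that strided-copy pattern, assuming tightly packed rows collapse into a single memcpy (copy_rows() is hypothetical and is not Mesa's memcpy_texture()):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `height` rows of `row_bytes` pixel data between
 * buffers whose rows start src_stride / dst_stride bytes apart. If both
 * strides equal row_bytes the image is contiguous and one memcpy covers it;
 * otherwise fall back to one memcpy per row, leaving any padding untouched. */
static void copy_rows(unsigned char *dst, size_t dst_stride,
                      const unsigned char *src, size_t src_stride,
                      size_t row_bytes, size_t height)
{
    if (src_stride == row_bytes && dst_stride == row_bytes) {
        memcpy(dst, src, row_bytes * height);
        return;
    }
    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}
```

Either way the total bytes moved are the same; the split only changes how many memcpy calls show up in the profile.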
The pixels pointer in st_TexSubImage is 0x0 here, maybe because this is an internal PBO-to-texture transfer?
srcAddr in memcpy_texture() is 0x7fffeeecd000, which looks sufficiently aligned (it sits on a 4096-byte page boundary), but maybe this is not the correct pointer to look at.
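For reference, when a buffer is bound to GL_PIXEL_UNPACK_BUFFER, the pixels argument of glTexSubImage2D is interpreted as a byte offset into that buffer rather than as a client pointer. That fits pixels=0x0 in st_TexSubImage alongside srcAddr=0x7fffeeecd000 in memcpy_texture(): on this reading, srcAddr would be the CPU mapping of the PBO plus offset 0, i.e. the address memcpy actually reads from. The function below is a hypothetical model of that rule, not Mesa code; mapped_pbo stands in for the driver's mapping of the bound buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of how the source of a texture upload is resolved.
 * mapped_pbo: CPU mapping of the buffer bound to GL_PIXEL_UNPACK_BUFFER,
 *             or NULL when no unpack buffer is bound.
 * pixels:     the pointer passed to glTexSubImage2D. */
static const void *resolve_unpack_source(const void *mapped_pbo,
                                         const void *pixels)
{
    if (mapped_pbo == NULL)
        return pixels; /* no PBO: pixels is a client memory address */
    /* PBO bound: pixels is a byte offset from the start of the buffer */
    return (const unsigned char *)mapped_pbo + (uintptr_t)pixels;
}
```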
Could there also be a CPU stall/sync issue when mapping the PBO?
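One common way to keep mapping from waiting on a buffer the GPU is still reading is to alternate between two PBOs: the CPU fills one while the texture upload is sourced from the other. Re-specifying the buffer with glBufferData(..., NULL, ...) before mapping ("orphaning") is another. Only the index bookkeeping of the two-buffer scheme is sketched here; NUM_PBOS and the function names are hypothetical, and the glMapBuffer/glTexSubImage2D calls themselves are left out:

```c
#include <assert.h>

#define NUM_PBOS 2 /* hypothetical ring size: one buffer per pipeline stage */

/* PBO the texture upload reads from in this frame (filled last frame). */
static int pbo_for_gpu(unsigned frame)
{
    return (int)(frame % NUM_PBOS);
}

/* PBO the CPU maps and fills in this frame, consumed by next frame's upload. */
static int pbo_for_cpu(unsigned frame)
{
    return (int)((frame + 1) % NUM_PBOS);
}
```

The point of the split is that the CPU never writes the buffer the current upload is reading, at the cost of one frame of latency.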
Similar pboUnpack/memcpy performance was discussed briefly here recently, with no conclusion: http://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2015-01-01
thanks,
- Vasilis
> Cheers,
> Daniel
_______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel