Re: [Mesa-dev] st_TexSubImage: unaligned memcpy performance

Hi,

On Wed, Apr 8, 2015 at 1:24 PM, Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
Hi,

On 8 April 2015 at 10:57, Vasilis Liaskovitis <vliaskov@xxxxxxxxx> wrote:
> I have an issue where st_TexSubImage causes very high CPU load in
> __memcpy_sse2_unaligned (Mesa 10.1.3, Xorg 1.15.1, radeon driver, HD 7870).
>
> Any obvious causes / tips for this? e.g. align textures or use different
> format/type? I 've tried using GL_BGRA/GL_UNSIGNED_BYTE and
> GL_BGRA/GL_UNSIGNED_INT_8_8_8_8_REV
>
> __memcpy_sse2_unaligned () at
> ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:85
> 85    ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or
> directory.
> (gdb) bt
> #0  __memcpy_sse2_unaligned () at
> ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:85
> #1  0x00007fffb572f154 in memcpy (__len=7680, __src=<optimized out>,
> __dest=0x7fff5835f800) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
> #2  st_TexSubImage (ctx=0x1b91420, dims=<optimized out>, texImage=0x1f81710,
> xoffset=0, yoffset=0, zoffset=0, width=1920, height=1080, depth=1,
> format=32993, type=5121, pixels=0xdacf90, unpack=0x1bad590)
>     at ../../../../src/mesa/state_tracker/st_cb_texture.c:752

Your source (0xdacf90) is only aligned to a 16-byte boundary, not 32.
This will cause issues particularly on ARM, where natural alignment is
required (i.e. 32-byte load/stores must be on 32-byte boundaries). By
contrast, the destination is already aligned to a 128-byte boundary.
So fixing the caller, rather than Mesa, should take care of the
problem.

Thanks for the reply and the observation. I aligned the source on a 32-byte boundary (and even on a 128-byte boundary), but it made no difference.
By the way, I am only using x86_64, not ARM. I believe Intel SSE2 only requires 16-byte alignment, but perhaps I am missing something.
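
For reference, the alignment of the addresses printed by gdb in this thread can be checked with a few lines of C. This is only a sketch: `max_alignment` and `is_aligned` are hypothetical helper names, not Mesa or glibc functions, and the 4096-byte cap is an arbitrary choice (one typical x86_64 page).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns non-zero if addr is a multiple of boundary.
 * boundary must be non-zero. */
static int is_aligned(uint64_t addr, size_t boundary)
{
    return (addr % boundary) == 0;
}

/* Hypothetical helper: largest power-of-two boundary (in bytes) that addr
 * sits on, capped at cap so that address 0 does not loop forever. */
static size_t max_alignment(uint64_t addr, size_t cap)
{
    size_t align = 1;

    while (align < cap && (addr % (align * 2)) == 0)
        align *= 2;
    return align;
}
```

Called as `max_alignment(0xdacf90, 4096)`, this gives 16 for the `pixels` source pointer, so it is 16-byte but not 32-byte aligned. The destination 0x7fff5835f800 gives 2048, which also satisfies the 128-byte figure quoted above, and the srcAddr 0x7fffeeecd000 from the PBO backtrace further down reaches the 4096 cap.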

Is this code path in st_TexSubImage using PBOs? I guess that depends on the driver implementation (radeon in my case)?

Related: pboUnpack http://www.songho.ca/opengl/files/pboUnpack.zip
gives: Transfer Rate: 236.5 MB/s. (59.1 FPS)
Does this sound reasonable for uploading with a PBO?
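
As a sanity check on that number, the arithmetic can be sketched in C. The assumptions here are that each frame is the 1024x1024 image at 4 bytes per pixel seen in the PBO backtrace further down (memcpy __len=4194304), and that the tool reports MiB rather than 10^6 bytes. `frame_bytes` and `rate_mib_per_s` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of one tightly packed image. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Hypothetical helper: upload rate in MiB/s for a given frame size and
 * frame rate. Assumes one full-frame upload per displayed frame. */
static double rate_mib_per_s(size_t bytes_per_frame, double fps)
{
    return (double)bytes_per_frame * fps / (1024.0 * 1024.0);
}
```

`frame_bytes(1024, 1024, 4)` is 4194304, the same length as the single memcpy in the PBO backtrace, and at 59.1 FPS that works out to roughly 236.4 MiB/s, in line with the reported 236.5 MB/s. Likewise `frame_bytes(1920, 1, 4)` is 7680, matching the per-row memcpy length in the first backtrace.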

The same bottleneck, __memcpy_sse2_unaligned, is observed.
Sample perf report output:

 28,20%  pboUnpack  libc-2.19.so            [.] __memcpy_sse2_unaligned
 16,63%  pboUnpack  pboUnpack               [.] 0x0000000000006542
  6,96%  pboUnpack  [kernel.kallsyms]       [k] clear_page_c_e
  2,52%  pboUnpack  [drm]                   [k] drm_mm_insert_node_in_range_generic
  2,10%  pboUnpack  [kernel.kallsyms]       [k] get_page_from_freelist


Backtrace:

__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:86
86    ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:86
#1  0x00007ffff2bddbbd in memcpy (__len=4194304, __src=<optimized out>, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2  memcpy_texture (dimensions=dimensions@entry=2, dstFormat=dstFormat@entry=MESA_FORMAT_B8G8R8A8_UNORM, dstRowStride=dstRowStride@entry=4096, dstSlices=dstSlices@entry=0x7fffffffd6e8,
    srcWidth=srcWidth@entry=1024, srcHeight=srcHeight@entry=1024, srcDepth=srcDepth@entry=1, srcFormat=srcFormat@entry=32993, srcType=srcType@entry=5121, srcAddr=srcAddr@entry=0x7fffeeecd000,
    srcPacking=srcPacking@entry=0x7ffff7f69180, ctx=<optimized out>) at ../../../../src/mesa/main/texstore.c:949
#3  0x00007ffff2be353d in _mesa_texstore_memcpy (srcPacking=0x7ffff7f69180, srcAddr=<optimized out>, srcType=5121, srcFormat=32993, srcDepth=<optimized out>, srcHeight=<optimized out>,
    srcWidth=<optimized out>, dstSlices=<optimized out>, dstRowStride=<optimized out>, dstFormat=MESA_FORMAT_B8G8R8A8_UNORM, baseInternalFormat=6408, dims=<optimized out>, ctx=0x7ffff7f4d010)
    at ../../../../src/mesa/main/texstore.c:3938
#4  _mesa_texstore (ctx=0x7ffff7f4d010, dims=2, baseInternalFormat=6408, dstFormat=MESA_FORMAT_B8G8R8A8_UNORM, dstRowStride=4096, dstSlices=0x7fffffffd6e8, srcWidth=1024, srcHeight=1024, srcDepth=1,
    srcFormat=32993, srcType=5121, srcAddr=0x7fffeeecd000, srcPacking=0x7ffff7f69180) at ../../../../src/mesa/main/texstore.c:3958
#5  0x00007ffff2be3812 in store_texsubimage (ctx=ctx@entry=0x7ffff7f4d010, texImage=texImage@entry=0x7c8690, xoffset=xoffset@entry=0, yoffset=yoffset@entry=0, zoffset=zoffset@entry=0, width=1024,
    height=1024, depth=1, format=32993, type=5121, pixels=0x0, packing=0x7ffff7f69180, caller=0x7ffff2d609c7 "glTexSubImage") at ../../../../src/mesa/main/texstore.c:4107
#6  0x00007ffff2be3aa5 in _mesa_store_texsubimage (ctx=ctx@entry=0x7ffff7f4d010, dims=<optimized out>, texImage=texImage@entry=0x7c8690, xoffset=xoffset@entry=0, yoffset=yoffset@entry=0,
    zoffset=zoffset@entry=0, width=<optimized out>, width@entry=1024, height=<optimized out>, height@entry=1024, depth=<optimized out>, depth@entry=1, format=<optimized out>, format@entry=32993,
    type=<optimized out>, type@entry=5121, pixels=<optimized out>, pixels@entry=0x0, packing=<optimized out>, packing@entry=0x7ffff7f69180) at ../../../../src/mesa/main/texstore.c:4171
#7  0x00007ffff2c3acaa in st_TexSubImage (ctx=0x7ffff7f4d010, dims=<optimized out>, texImage=0x7c8690, xoffset=0, yoffset=0, zoffset=0, width=1024, height=1024, depth=1, format=32993, type=5121,
    pixels=0x0, unpack=0x7ffff7f69180) at ../../../../src/mesa/state_tracker/st_cb_texture.c:787
#8  0x00007ffff2bce83d in texsubimage (ctx=0x7ffff7f4d010, dims=dims@entry=2, target=3553, level=0, xoffset=0, yoffset=0, zoffset=zoffset@entry=0, width=1024, height=1024, depth=depth@entry=1,
    format=format@entry=32993, type=type@entry=5121, pixels=pixels@entry=0x0) at ../../../../src/mesa/main/teximage.c:3445
#9  0x00007ffff2bd259c in _mesa_TexSubImage2D (target=<optimized out>, level=<optimized out>, xoffset=<optimized out>, yoffset=<optimized out>, width=<optimized out>, height=<optimized out>,
    format=32993, type=5121, pixels=0x0) at ../../../../src/mesa/main/teximage.c:3483


The pixels pointer in st_TexSubImage is 0x0 here, maybe because it's an internal PBO-to-texture transfer?
srcAddr in memcpy_texture() is 0x7fffeeecd000, which looks sufficiently aligned, but maybe this is not the correct pointer to look at.
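
To make the numeric arguments in the backtraces easier to read, here is a small sketch that maps them back to OpenGL enum names. The values are the standard ones from the GL headers; `gl_enum_name` is a hypothetical helper, not a Mesa function, and it only covers the enums that appear in this thread.

```c
#include <string.h>

/* Hypothetical helper: name of a GL enum value seen in the backtraces.
 * Only the handful of values relevant to this thread are listed. */
static const char *gl_enum_name(unsigned value)
{
    switch (value) {
    case 0x0DE1: return "GL_TEXTURE_2D";               /* target=3553 */
    case 0x1401: return "GL_UNSIGNED_BYTE";            /* type=5121 */
    case 0x1908: return "GL_RGBA";                     /* baseInternalFormat=6408 */
    case 0x80E1: return "GL_BGRA";                     /* format=32993 */
    case 0x8367: return "GL_UNSIGNED_INT_8_8_8_8_REV";
    default:     return "unknown";
    }
}
```

So format=32993, type=5121 in both backtraces is the GL_BGRA/GL_UNSIGNED_BYTE combination, uploaded to a GL_TEXTURE_2D target whose base internal format is GL_RGBA.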

Could there also be a CPU stall/sync issue when mapping a PBO buffer?

Similar pboUnpack/memcpy performance was discussed here recently, with no conclusion: http://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2015-01-01

Thanks,

- Vasilis


 

Cheers,
Daniel

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
