Re: Why the memcpy from a mapped GPU memory is so slow on Intel Bay Trail?

Eric Anholt <eric@xxxxxxxxxx> · Wed, 21 May 2014 11:25:24 -0700

三月！ <sunnymarch@xxxxxx> writes:

> Hello!  I'm developing an OpenCL application with Beignet on Ubuntu
> 14.04 x64 Desktop on a Bay Trail E3825.  I found that reading data
> from GPU memory through either drm_intel_gem_bo_map or
> drm_intel_gem_bo_get_subdata costs about 0.002 ~ 0.003 seconds to fetch
> a 7 MiB array, which is not quite satisfying.  Could anybody help solve
> this problem?

GPUs (except in the case of SNB/IVB/HSW, where the CPU is coherent with
the GPU apart from the GPU's L1/L2 caches) are extremely slow to read
from, because write-combining memory effectively gives uncached
performance for reads.  You can get better streaming read performance
using the movntdqa instruction, and you can see an example of code using
it in streaming-load-memcpy.c in mesa (though it looks like that code is
missing an mfence, which iirc is required).
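
For illustration, here is a minimal sketch of a movntdqa-based read from a
write-combining mapping.  It is not the mesa code: the names wc_read_memcpy
and wc_copy_blocks are made up for this example, the GCC/Clang target
attribute and __builtin_cpu_supports are used so that it builds without
-msse4.1, and the placement of the mfence (before the streaming loads) is
an assumption based on the iirc above, not something verified here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define WC_HAVE_MOVNTDQA 1

/* Hypothetical helper: bulk copy using MOVNTDQA (_mm_stream_load_si128).
 * `src` must be 16-byte aligned and `len` a multiple of 64.
 * The target attribute enables SSE4.1 for this function only. */
static __attribute__((target("sse4.1")))
void wc_copy_blocks(uint8_t *dst, const uint8_t *src, size_t len)
{
    /* Assumption: fence before the streaming loads so that they do not
     * observe stale data from earlier accesses to the mapping. */
    _mm_mfence();

    for (size_t i = 0; i < len; i += 64) {
        __m128i *s = (__m128i *)(src + i);
        __m128i *d = (__m128i *)(dst + i);

        /* Four 16-byte streaming loads cover one 64-byte line. */
        __m128i r0 = _mm_stream_load_si128(s + 0);
        __m128i r1 = _mm_stream_load_si128(s + 1);
        __m128i r2 = _mm_stream_load_si128(s + 2);
        __m128i r3 = _mm_stream_load_si128(s + 3);

        /* The destination is ordinary memory; unaligned stores are fine. */
        _mm_storeu_si128(d + 0, r0);
        _mm_storeu_si128(d + 1, r1);
        _mm_storeu_si128(d + 2, r2);
        _mm_storeu_si128(d + 3, r3);
    }
}
#endif

/* Hypothetical entry point: copy `len` bytes from a (possibly
 * write-combining) mapping `src` into ordinary memory `dst`. */
void wc_read_memcpy(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

#ifdef WC_HAVE_MOVNTDQA
    if (__builtin_cpu_supports("sse4.1")) {
        /* MOVNTDQA needs a 16-byte aligned source: copy the head bytewise. */
        while (len > 0 && ((uintptr_t)s & 15) != 0) {
            *d++ = *s++;
            len--;
        }

        /* Stream the part that is a whole number of 64-byte blocks. */
        size_t bulk = len & ~(size_t)63;
        wc_copy_blocks(d, s, bulk);
        d += bulk;
        s += bulk;
        len -= bulk;
    }
#endif

    /* Remaining tail, or the whole buffer when SSE4.1 is unavailable. */
    memcpy(d, s, len);
}
```

With libdrm you would presumably pass the pointer you get after mapping the
buffer object as src and a normal malloc'ed buffer as dst.  Whether this
actually helps depends on the mapping being write-combining; on a cached
mapping movntdqa behaves like an ordinary load.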
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx