On Tue, Feb 04, 2014 at 03:15:26PM +0100, Daniel Vetter wrote:
> On Tue, Feb 04, 2014 at 03:12:49PM +0100, Daniel Vetter wrote:
> > On Tue, Feb 04, 2014 at 01:30:19PM +0000, Chris Wilson wrote:
> > > Inserting additional PTEs has no side-effect for us as the PFNs are fixed
> > > for the entire time the object is resident in the global GTT. The
> > > downside is that we pay the entire cost of faulting the object upon the
> > > first hit, in return for which we remove the per-page faulting overhead.
> > >
> > > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences:
> > >   Upload rate for 2 linear surfaces:   8127MiB/s -> 8134MiB/s
> > >   Upload rate for 2 tiled surfaces:    8607MiB/s -> 8625MiB/s
> > >   Upload rate for 4 linear surfaces:   8127MiB/s -> 8127MiB/s
> > >   Upload rate for 4 tiled surfaces:    8611MiB/s -> 8602MiB/s
> > >   Upload rate for 8 linear surfaces:   8114MiB/s -> 8124MiB/s
> > >   Upload rate for 8 tiled surfaces:    8601MiB/s -> 8603MiB/s
> > >   Upload rate for 16 linear surfaces:  8110MiB/s -> 8123MiB/s
> > >   Upload rate for 16 tiled surfaces:   8595MiB/s -> 8606MiB/s
> > >   Upload rate for 32 linear surfaces:  8104MiB/s -> 8121MiB/s
> > >   Upload rate for 32 tiled surfaces:   8589MiB/s -> 8605MiB/s
> > >   Upload rate for 64 linear surfaces:  8107MiB/s -> 8121MiB/s
> > >   Upload rate for 64 tiled surfaces:   2013MiB/s -> 3017MiB/s
> > >
> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: "Goel, Akash" <akash.goel@xxxxxxxxx>
> > > ---
> > >
> > > It survived light testing without noticeable performance degradation. Can
> > > anyone think of how this will impact us negatively?
> >
> > piglit does an awful lot of single-pixel readbacks iirc; that's about the
> > only thing I can think of. Maybe we should wait until we have
> > vm_insert_pfn_frm_io_mapping so as not to adversely affect this. Or if the
> > overhead is negligible we could move ahead right away.
> > Nothing else really
> > crosses my mind which would qualify as real-world usage.
>
> On that topic: what's the improvement of the optimized insert_pfn_pgprot
> with the prefault patch applied when doing just single dword writes? I.e.
> just to measure insert_pfn performance so that we have some impressive
> microbenchmark numbers justifying things? I'm thinking of a 2nd mode in
> your test to measure pagefaults/s.

Not pagefaults/s yet, but varying object/write sizes is interesting.

IGT-Version: 1.5-g906b862 (x86_64) (Linux: 3.13.0+ x86_64)
4/4096: Upload rate for 2 linear surfaces:  651.042MiB/s
4/4096: Upload rate for 2 tiled surfaces:   1302.083MiB/s
4/4096: Upload rate for 4 linear surfaces:  1116.071MiB/s
4/4096: Upload rate for 4 tiled surfaces:   1736.111MiB/s
4/4096: Upload rate for 8 linear surfaces:  892.857MiB/s
4/4096: Upload rate for 8 tiled surfaces:   1420.455MiB/s
4/4096: Upload rate for 16 linear surfaces: 57.710MiB/s
4/4096: Upload rate for 16 tiled surfaces:  58.685MiB/s
4/4096: Upload rate for 32 linear surfaces: 59.018MiB/s
4/4096: Upload rate for 32 tiled surfaces:  59.780MiB/s
4/4096: Upload rate for 64 linear surfaces: 59.060MiB/s
4/4096: Upload rate for 64 tiled surfaces:  2.021MiB/s
Test assertion failure function performance, file gem_fence_upload.c:108:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest 4KiB (single dword): FAIL

4096/4096: Upload rate for 2 linear surfaces:  9259.259MiB/s
4096/4096: Upload rate for 2 tiled surfaces:   9153.318MiB/s
4096/4096: Upload rate for 4 linear surfaces:  9237.875MiB/s
4096/4096: Upload rate for 4 tiled surfaces:   9190.121MiB/s
4096/4096: Upload rate for 8 linear surfaces:  9235.209MiB/s
4096/4096: Upload rate for 8 tiled surfaces:   9280.742MiB/s
4096/4096: Upload rate for 16 linear surfaces: 9300.974MiB/s
4096/4096: Upload rate for 16 tiled surfaces:  9284.782MiB/s
4096/4096: Upload rate for 32 linear surfaces: 9311.122MiB/s
4096/4096: Upload rate for 32 tiled surfaces:  9311.122MiB/s
4096/4096: Upload rate for 64 linear surfaces: 9291.184MiB/s
4096/4096: Upload rate for 64 tiled surfaces:  1685.708MiB/s
Test assertion failure function performance, file gem_fence_upload.c:109:
Last errno: 0, Success
Failed assertion: tiled[1] > 0.75 * tiled[0]
Subtest 4KiB: FAIL

4/1048576: Upload rate for 2 linear surfaces:  21.945MiB/s
4/1048576: Upload rate for 2 tiled surfaces:   411.184MiB/s
4/1048576: Upload rate for 4 linear surfaces:  24.529MiB/s
4/1048576: Upload rate for 4 tiled surfaces:   434.028MiB/s
4/1048576: Upload rate for 8 linear surfaces:  21.448MiB/s
4/1048576: Upload rate for 8 tiled surfaces:   195.313MiB/s
4/1048576: Upload rate for 16 linear surfaces: 16.644MiB/s
4/1048576: Upload rate for 16 tiled surfaces:  53.373MiB/s
4/1048576: Upload rate for 32 linear surfaces: 16.563MiB/s
4/1048576: Upload rate for 32 tiled surfaces:  55.285MiB/s
4/1048576: Upload rate for 64 linear surfaces: 15.486MiB/s
4/1048576: Upload rate for 64 tiled surfaces:  0.107MiB/s
Test assertion failure function performance, file gem_fence_upload.c:108:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest 1MiB (single dword): FAIL

1048576/1048576: Upload rate for 2 linear surfaces:  8136.153MiB/s
1048576/1048576: Upload rate for 2 tiled surfaces:   8633.445MiB/s
1048576/1048576: Upload rate for 4 linear surfaces:  8128.936MiB/s
1048576/1048576: Upload rate for 4 tiled surfaces:   8614.996MiB/s
1048576/1048576: Upload rate for 8 linear surfaces:  8126.130MiB/s
1048576/1048576: Upload rate for 8 tiled surfaces:   8615.187MiB/s
1048576/1048576: Upload rate for 16 linear surfaces: 8127.811MiB/s
1048576/1048576: Upload rate for 16 tiled surfaces:  8617.108MiB/s
1048576/1048576: Upload rate for 32 linear surfaces: 8125.888MiB/s
1048576/1048576: Upload rate for 32 tiled surfaces:  8612.528MiB/s
1048576/1048576: Upload rate for 64 linear surfaces: 8128.412MiB/s
1048576/1048576: Upload rate for 64 tiled surfaces:  4522.448MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:109:
Last errno: 0, Success
Failed assertion: tiled[1] > 0.75 * tiled[0]
Subtest 1MiB: FAIL

There's still the obvious cliff beyond 32 fences, but also the interesting
transition at 8 objects and the odd effect of tiled vs linear.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx