On 13.10.2011 05:29, Ilija Hadzic wrote:
>
> The following set of patches will improve the performance of
> blit-copy functions for Radeon GPUs based on R600, R700, Evergreen
> and NI ASICs.
>
> The foundation for improvement is the use of tiled mode access (which
> for copying bo's can be used regardless of whether the content is
> tiled or not), and segmenting the memory block being copied into
> rectangles whose edge ratio is between 1:1 and 1:2. This maximizes
> the number of PCIe transactions that use maximum payload size
> (typically 128 bytes) and also creates a memory access pattern that
> is more favorable for both VRAM and host DRAM than what's currently
> in the kernel.
>
> To come up with the new blit-copy code, I did a lot of PCIe traffic
> analysis with the bus analyzer and also had many discussions with
> Alex, trying to explain what's going on (thanks to Alex for his
> time).
>
> Below (at the end of this note) are the results of some benchmarks
> that I did with various GPUs (all in the same host: Intel i7 CPU, X58
> chipset, three DRAM channels). To run the tests on your machine, load
> the radeon module with 'benchmark=1 pcie_gen2=1' parameters. The most
> significant improvement is in the upstream (VRAM to GART) direction
> because that's where the PCIe transactions were fragmented and also
> where the memory access pattern was such that it created a lot of
> backpressure from the host.
>
> It is also interesting that high-end devices (e.g. Cayman) exhibit
> the least improvement and were the worst to begin with. This is
> because high-end devices copy more tiles in parallel which in turn
> can create bank conflicts on host memory and cause the host to do
> lots of bank-close/precharge/bank-open cycles.

Interesting stuff! Nice results showing the low-end devices completely
blowing away the high-end ones for VRAM->GTT blits :-).
I guess it isn't possible to temporarily disable some RBEs or otherwise
reconfigure the chip so that you could get the same performance for the
high-end chips? Granted, the high-end chips are only much slower for
VRAM->GTT according to these results, but even in the other direction
the difference is still ~20% or so.

Anyway, can't comment much on the patches, though the idea certainly
seems to make sense.

Roland

> As an added "bonus", I also did some code cleanup and consolidated
> the repeated code into a common function, so r600 and evergreen/NI
> parts now share the blit-copy code. I also expanded on the benchmark
> coverage, so the module now takes a benchmark parameter value between
> 1 and 8 and each value runs a different benchmark.
>
> For details, see the commit log messages and the code. I have been
> running with these patches for a few months (and I kept rebasing them
> to drm-core-next as the public git progressed) and I used them in a
> system setup that does *many* copies of this kind (and does them
> frequently); I have not seen instabilities introduced by these
> patches. I also verified the correctness of the copy using the test=1
> parameter for each GPU that I had and the test passed.
>
> I would welcome some feedback and if you run the benchmarks with the
> new blit code, I would very much like to hear what kind of
> improvement you are seeing.
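
For anyone trying to picture the "rectangles whose edge ratio is between
1:1 and 1:2" part, here is a rough sketch of one way a block of pages
could be carved up. This is only an illustration and not the code from
the patches: the function name, the 4096-byte page, the 4-byte pixel and
the 8192-pixel maximum dimension are all assumptions made for the
example.

```python
PAGE_SIZE = 4096       # assumed GPU page size in bytes
BYTES_PER_PIXEL = 4    # assumed 32-bit pixel format
MAX_DIM = 8192         # assumed max surface dimension (must be a power of two)


def split_into_rects(num_pages, max_dim=MAX_DIM):
    """Return a list of (width, height, pages) rectangles covering num_pages.

    Hypothetical helper: each rectangle holds a power-of-two number of
    pages, so its pixel count is a power of two and can always be laid
    out as either a square (1:1) or a 2:1 rectangle.
    """
    pixels_per_page = PAGE_SIZE // BYTES_PER_PIXEL       # 1024 pixels
    max_pages = (max_dim * max_dim) // pixels_per_page   # largest square
    rects = []
    remaining = num_pages
    while remaining > 0:
        # Largest power-of-two page count that fits in what is left
        # and in the largest allowed surface.
        pages = 1 << (min(remaining, max_pages).bit_length() - 1)
        log2_pixels = (pages * pixels_per_page).bit_length() - 1
        # Split the exponent as evenly as possible between the two edges;
        # an odd exponent gives width = 2 * height.
        height = 1 << (log2_pixels // 2)
        width = 1 << (log2_pixels - log2_pixels // 2)
        rects.append((width, height, pages))
        remaining -= pages
    return rects


if __name__ == "__main__":
    for w, h, p in split_into_rects(3):
        print(f"{w}x{h} pixels, {p} page(s)")
```

With these assumptions a 3-page copy becomes one 64x32 rectangle (2
pages) followed by one 32x32 rectangle (1 page). Whether the real code
picks rectangles greedily like this, I can't say without reading the
patches.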