On 2019-07-22 11:39 a.m., Timur Kristóf wrote:
>>>
>>> 1. Why is the GTT->VRAM copy so much slower than the VRAM->GTT
>>> copy?
>>>
>>> 2. Why is the bus limited to 24 Gbit/sec? I would expect the
>>> Thunderbolt port to give me at least 32 Gbit/sec for PCIe traffic.
>>
>> That's unrealistic I'm afraid. As I said on IRC, from the GPU POV
>> there's an 8 GT/s x4 PCIe link, so ~29.8 Gbit/s (= 32 billion bit/s;
>> I missed this nuance on IRC) is the theoretical raw bandwidth.
>> However, in practice that's not achievable due to various
>> overhead[0], and I'm only seeing up to ~90% utilization of the
>> theoretical bandwidth with a "normal" x16 link as well. I wouldn't
>> expect higher utilization without seeing some evidence to suggest
>> it's possible.
>>
>>
>> [0] According to
>> https://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/
>> , PCIe 3.0 uses 1.54% of the raw bandwidth for its internal encoding.
>> Also keep in mind all CPU<->GPU communication has to go through the
>> PCIe link, e.g. for programming the transfers, in-band signalling
>> from the GPU to the PCIe port where the data is being transferred
>> to/from, ...
>
> Good point, I used 1024 and not 1000. My mistake.
>
> There is something else:
> In the same benchmark there is a "fill->GTT ,SDMA" row which has a
> 4035 MB/s number. If that traffic goes through the TB3 interface then
> we just found our 32 Gbit/sec.

The GPU is only connected to the host via PCIe, there's nowhere else it
could go through.

> Now the question is, if I understand this correctly and the SDMA can
> indeed do 32 Gbit/sec for "fill->GTT", then why can't it do the same
> with other kinds of transfers? Not sure if there is a good answer to
> that question though.
>
> Also I still don't fully understand why GTT->VRAM is slower than
> VRAM->GTT, when the bandwidth is clearly available.
While those are interesting questions at some level, I don't think they
will get us closer to solving your problem. It comes down to identifying
inefficient transfers across PCIe and optimizing them.

> Side note: with regards to that 1.5% figure, the TB3 tech brief[0]
> explicitly mentions this and says that it isn't carried over: "the
> underlying protocol uses some data to provide encoding overhead which
> is not carried over the Thunderbolt 3 link reducing the consumed
> bandwidth by roughly 20 percent (DisplayPort) or 1.5 percent (PCI
> Express Gen 3)"

That just means the internal TB3 link only carries the payload data from
the PCIe link, not the 1.5% of bits used for the PCIe encoding. TB3
cannot magically make the PCIe link itself work without the encoding.


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
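For reference, the unit arithmetic discussed in this thread can be sketched
as below. This is only an illustration, not anything taken from the driver
or the benchmark: the function names are made up, the 128b/130b line
encoding for PCIe 3.0 is an assumption (it is consistent with the 1.54%
figure cited above), and since the thread does not say whether the
benchmark's "MB/s" means 10^6 or 2^20 bytes, both readings are computed.

```python
# Sketch of the link-bandwidth arithmetic from the thread.
# All names are illustrative; nothing here comes from a driver or benchmark.

GIGA = 10**9   # decimal prefix (Gbit)
GIBI = 2**30   # binary prefix (Gibit)


def raw_link_bits_per_s(gt_per_s, lanes):
    """Raw line rate: one bit per transfer per lane."""
    return gt_per_s * GIGA * lanes


def payload_bits_per_s(raw, payload_bits=128, block_bits=130):
    """Rate left after line encoding (assumed 128b/130b for PCIe 3.0)."""
    return raw * payload_bits / block_bits


def mb_per_s_to_gbit_per_s(mb_per_s, bytes_per_mb):
    """Convert a benchmark MB/s figure to decimal Gbit/s."""
    return mb_per_s * bytes_per_mb * 8 / GIGA


raw = raw_link_bits_per_s(8, 4)  # 8 GT/s x4, as seen from the GPU
enc = payload_bits_per_s(raw)

print(f"raw link rate:     {raw / GIGA:.1f} Gbit/s = {raw / GIBI:.1f} Gibit/s")
print(f"encoding overhead: {100 * (1 - enc / raw):.2f} %")
print(f"after encoding:    {enc / GIGA:.2f} Gbit/s")

# The unit of the benchmark's "MB/s" is not stated, so show both readings.
for label, size in (("MB = 10^6 bytes", 10**6), ("MB = 2^20 bytes", 2**20)):
    gbit = mb_per_s_to_gbit_per_s(4035, size)
    print(f"4035 MB/s ({label}): {gbit:.2f} Gbit/s")
```

The first line of output reproduces the 32 vs. ~29.8 distinction above
(1000-based vs. 1024-based prefixes), and the second reproduces the 1.54%
encoding figure.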