On Tue, Aug 13, 2013 at 06:11:59PM -0700, Ben Widawsky wrote:
> On Tue, Aug 13, 2013 at 06:09:09PM -0700, Ben Widawsky wrote:
> > From: Ben Widawsky <ben@xxxxxxxxxxxx>
> >
> > In order to transition more of our code over to using a VMA instead of
> > an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
> > until now, we've only had a VMA when actually binding an object.
> >
> > The previous patch helped handle the distinction on bound vs. unbound.
> > This patch will help us catch leaks, and other issues before we actually
> > shuffle a bunch of stuff around.
> >
> > This attempts to convert all the execbuf code to speak in vmas. Since
> > the execbuf code is very self contained it was a nice isolated
> > conversion.
> >
> > The meat of the code is about turning eb_objects into eb_vma, and then
> > wiring up the rest of the code to use vmas instead of obj, vm pairs.
> >
> > Unfortunately, to do this, we must move the exec_list link from the obj
> > structure. This list is reused in the eviction code, so we must also
> > modify the eviction code to make this work.
> >
> > WARNING: This patch makes an already hotly profiled path slower. The cost is
> > unavoidable. In reply to this mail, I will attach the extra data.
> >
> > [snip]
>
> Here is the output from gem_exec_lut_handle both before and after this
> patch. The results honestly don't make sense to me, but I'll let Chris
> parse it before scratching my head harder.
>
> Before patch
> ============
> relocation: buffers=   1: old=   8060 + 165.3*reloc, lut=   7816 + 164.8*reloc (ns)
> relocation: buffers=   2: old=   6748 + 166.6*reloc, lut=   6952 + 165.4*reloc (ns)
> relocation: buffers=   4: old=   8140 + 165.9*reloc, lut=   8216 + 165.4*reloc (ns)
> relocation: buffers=   8: old=  10732 + 166.0*reloc, lut=  10615 + 165.2*reloc (ns)
> relocation: buffers=  16: old=  15099 + 167.8*reloc, lut=  15337 + 165.3*reloc (ns)
> relocation: buffers=  32: old=  26140 + 166.0*reloc, lut=  25488 + 165.5*reloc (ns)
> relocation: buffers=  64: old=  46300 + 170.5*reloc, lut=  44279 + 166.7*reloc (ns)
> relocation: buffers= 128: old=  84056 + 176.9*reloc, lut=  85379 + 166.3*reloc (ns)
> relocation: buffers= 256: old= 174398 + 167.9*reloc, lut= 167744 + 167.0*reloc (ns)
> relocation: buffers= 512: old= 349688 + 175.7*reloc, lut= 348590 + 170.8*reloc (ns)
> relocation: buffers=1024: old= 726265 + 191.2*reloc, lut= 719774 + 180.2*reloc (ns)
> relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)
> skip-relocs: buffers=   1: old=   4445 +  16.0*reloc, lut=   4433 +  15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   4585 +  16.0*reloc, lut=   4571 +  15.6*reloc (ns)
> skip-relocs: buffers=   4: old=   5667 +  16.0*reloc, lut=   5340 +  15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   6051 +  16.1*reloc, lut=   6026 +  15.6*reloc (ns)
> skip-relocs: buffers=  16: old=   7953 +  16.1*reloc, lut=   7914 +  15.6*reloc (ns)
> skip-relocs: buffers=  32: old=  11972 +  16.2*reloc, lut=  11875 +  15.7*reloc (ns)
> skip-relocs: buffers=  64: old=  19999 +  16.5*reloc, lut=  19832 +  15.7*reloc (ns)
> skip-relocs: buffers= 128: old=  37796 +  16.9*reloc, lut=  36539 +  15.9*reloc (ns)
> skip-relocs: buffers= 256: old=  71604 +  18.1*reloc, lut=  71313 +  16.5*reloc (ns)
> skip-relocs: buffers= 512: old= 152682 +  24.3*reloc, lut= 141379 +  27.9*reloc (ns)
> skip-relocs: buffers=1024: old= 314116 +  41.7*reloc, lut= 303019 +  20.1*reloc (ns)
> skip-relocs: buffers=2048: old= 619784 +  54.1*reloc, lut= 603931 +  20.0*reloc (ns)
> no-relocs: buffers=   1: old=   4194 +   5.1*reloc, lut=   4206 +   4.8*reloc (ns)
> no-relocs: buffers=   2: old=   4404 +   5.1*reloc, lut=   4381 +   4.8*reloc (ns)
> no-relocs: buffers=   4: old=   4926 +   5.1*reloc, lut=   4921 +   4.8*reloc (ns)
> no-relocs: buffers=   8: old=   5901 +   5.1*reloc, lut=   5822 +   4.9*reloc (ns)
> no-relocs: buffers=  16: old=   7840 +   5.1*reloc, lut=   7737 +   4.9*reloc (ns)
> no-relocs: buffers=  32: old=  11842 +   5.1*reloc, lut=  11681 +   4.9*reloc (ns)
> no-relocs: buffers=  64: old=  19741 +   5.1*reloc, lut=  19542 +   4.8*reloc (ns)
> no-relocs: buffers= 128: old=  36479 +   5.2*reloc, lut=  35958 +   4.9*reloc (ns)
> no-relocs: buffers= 256: old=  70171 +   5.4*reloc, lut=  69390 +   5.2*reloc (ns)
> no-relocs: buffers= 512: old= 147213 +   3.5*reloc, lut= 137953 +  13.0*reloc (ns)
> no-relocs: buffers=1024: old= 300165 +   4.8*reloc, lut= 293852 +   4.9*reloc (ns)
> no-relocs: buffers=2048: old= 597992 +   8.3*reloc, lut= 590185 +   2.1*reloc (ns)
>
>
> After patch
> ===========
> relocation: buffers=   1: old=   8075 +  81.4*reloc, lut=   7592 +  80.6*reloc (ns)
> relocation: buffers=   2: old=   5744 +  82.3*reloc, lut=   5837 +  81.1*reloc (ns)
> relocation: buffers=   4: old=   4875 +  82.7*reloc, lut=   4871 +  81.6*reloc (ns)
> relocation: buffers=   8: old=   5729 +  82.7*reloc, lut=   5698 +  81.5*reloc (ns)
> relocation: buffers=  16: old=   7952 +  83.0*reloc, lut=   7809 +  81.9*reloc (ns)
> relocation: buffers=  32: old=  11884 +  82.9*reloc, lut=  11702 +  81.6*reloc (ns)
> relocation: buffers=  64: old=  20388 +  83.4*reloc, lut=  19995 +  82.2*reloc (ns)
> relocation: buffers= 128: old=  38057 +  85.0*reloc, lut=  37675 +  83.4*reloc (ns)
> relocation: buffers= 256: old=  74912 +  87.0*reloc, lut=  74064 +  85.4*reloc (ns)
> relocation: buffers= 512: old= 161136 +  94.8*reloc, lut= 157046 +  87.5*reloc (ns)
> relocation: buffers=1024: old= 349443 + 107.0*reloc, lut= 342081 +  91.2*reloc (ns)
> relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 +  96.9*reloc (ns)
> skip-relocs: buffers=   1: old=   2966 +  16.6*reloc, lut=   2963 +  15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   3083 +  16.5*reloc, lut=   3056 +  15.5*reloc (ns)
> skip-relocs: buffers=   4: old=   3279 +  16.6*reloc, lut=   3242 +  15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   3692 +  16.7*reloc, lut=   3654 +  15.6*reloc (ns)
> skip-relocs: buffers=  16: old=   4522 +  16.7*reloc, lut=   4461 +  15.5*reloc (ns)
> skip-relocs: buffers=  32: old=   6254 +  16.7*reloc, lut=   6138 +  15.7*reloc (ns)
> skip-relocs: buffers=  64: old=  10098 +  16.8*reloc, lut=   9939 +  15.7*reloc (ns)
> skip-relocs: buffers= 128: old=  17983 +  17.6*reloc, lut=  17729 +  16.3*reloc (ns)
> skip-relocs: buffers= 256: old=  34388 +  18.8*reloc, lut=  33981 +  17.6*reloc (ns)
> skip-relocs: buffers= 512: old=  74211 +  25.2*reloc, lut=  72185 +  18.6*reloc (ns)
> skip-relocs: buffers=1024: old= 160514 +  34.1*reloc, lut= 157086 +  20.3*reloc (ns)
> skip-relocs: buffers=2048: old= 323954 +  51.5*reloc, lut= 315928 +  22.5*reloc (ns)
> no-relocs: buffers=   1: old=   2840 +   5.1*reloc, lut=   2834 +   4.8*reloc (ns)
> no-relocs: buffers=   2: old=   2938 +   5.1*reloc, lut=   2917 +   4.8*reloc (ns)
> no-relocs: buffers=   4: old=   3220 +   5.1*reloc, lut=   3201 +   4.8*reloc (ns)
> no-relocs: buffers=   8: old=   3614 +   5.1*reloc, lut=   3545 +   4.8*reloc (ns)
> no-relocs: buffers=  16: old=   4437 +   5.1*reloc, lut=   4368 +   4.8*reloc (ns)
> no-relocs: buffers=  32: old=   6105 +   5.1*reloc, lut=   6024 +   4.9*reloc (ns)
> no-relocs: buffers=  64: old=   9864 +   5.1*reloc, lut=   9652 +   4.9*reloc (ns)
> no-relocs: buffers= 128: old=  17388 +   5.1*reloc, lut=  17126 +   4.9*reloc (ns)
> no-relocs: buffers= 256: old=  33087 +   5.4*reloc, lut=  32668 +   5.3*reloc (ns)
> no-relocs: buffers= 512: old=  71476 +   5.0*reloc, lut=  69464 +   4.9*reloc (ns)
> no-relocs: buffers=1024: old= 154379 +   4.9*reloc, lut= 152796 +   4.3*reloc (ns)
> no-relocs: buffers=2048: old= 309435 +   5.0*reloc, lut= 301095 +   4.9*reloc (ns)

Hmm, either the patch really did make things faster or the results are
being subjected to cpufreq.
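
To put a number on "faster", one could compare the fitted intercepts and slopes before and after the patch. The following is only a rough sketch: the regular expression is inferred from the result lines quoted in this thread rather than from the gem_exec_lut_handle source, and the helper name parse_result_line is made up for illustration. The six sample rows are the buffers=2048 lines copied from the output above.

```python
import re

# Pattern inferred from the quoted result lines (an assumption, not taken
# from the test's source code).
LINE_RE = re.compile(
    r"(?P<mode>[\w-]+):\s*buffers=\s*(?P<buffers>\d+):\s*"
    r"old=\s*(?P<old_x0>\d+)\s*\+\s*(?P<old_slope>[\d.]+)\*reloc,\s*"
    r"lut=\s*(?P<lut_x0>\d+)\s*\+\s*(?P<lut_slope>[\d.]+)\*reloc"
)


def parse_result_line(line):
    """Hypothetical helper: turn one result line into a dict of numbers."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    return {
        "mode": m.group("mode"),
        "buffers": int(m.group("buffers")),
        "old_x0": int(m.group("old_x0")),        # intercept, ns
        "old_slope": float(m.group("old_slope")),  # ns per relocation
        "lut_x0": int(m.group("lut_x0")),
        "lut_slope": float(m.group("lut_slope")),
    }


# buffers=2048 rows copied from the before/after output in this thread.
before = [
    "relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)",
    "skip-relocs: buffers=2048: old= 619784 + 54.1*reloc, lut= 603931 + 20.0*reloc (ns)",
    "no-relocs: buffers=2048: old= 597992 + 8.3*reloc, lut= 590185 + 2.1*reloc (ns)",
]
after = [
    "relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 + 96.9*reloc (ns)",
    "skip-relocs: buffers=2048: old= 323954 + 51.5*reloc, lut= 315928 + 22.5*reloc (ns)",
    "no-relocs: buffers=2048: old= 309435 + 5.0*reloc, lut= 301095 + 4.9*reloc (ns)",
]

# after/before ratio of the "old" intercept, keyed by mode.
ratios = {}
for b_line, a_line in zip(before, after):
    b, a = parse_result_line(b_line), parse_result_line(a_line)
    ratios[b["mode"]] = a["old_x0"] / b["old_x0"]
    print(f"{b['mode']:>12}: intercept ratio after/before = {ratios[b['mode']]:.2f}")
```

For these rows the post-patch intercepts come out at roughly half of their pre-patch values in all three modes, which is the unexpected speed-up being discussed; the comparison by itself cannot tell a real improvement apart from a cpufreq effect.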

relocation:  do the full relocation processing
skip-relocs: all the relocation addresses are correct, so no rewrite
no-relocs:   no buffers moved, no relocation processing

"Buffers" is the number of buffers passed into execbuffer, and so we
measure the cost of trying to process a certain complexity of a batch.
Then we do a least-squares fit through a number of batches with varying
numbers of relocations to estimate the overhead of processing the
execbuffer array versus the cost of each additional relocation in the
batch.

The first number (the x_0 intercept, i.e. the fitted cost at zero
relocations, in nanoseconds) should be the measure of how long it takes
to grab all the buffers into our local structures - this is what I was
expecting to be worst hit by the vma patches.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
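
As an illustration of the fit described above, here is a minimal sketch of an ordinary least-squares line through (relocation count, elapsed time) samples. It is not the code used by gem_exec_lut_handle; the function name fit_line and the sample timings are made up for the example, and the timings are synthetic values generated from an assumed cost model rather than measurements.

```python
def fit_line(relocs, times_ns):
    """Least-squares fit of times_ns = intercept + slope * relocs.

    Returns (intercept, slope): the fixed per-execbuffer overhead in ns and
    the cost of each additional relocation in ns.
    """
    n = len(relocs)
    if n < 2 or n != len(times_ns):
        raise ValueError("need at least two (relocs, time) pairs of equal length")
    mean_x = sum(relocs) / n
    mean_y = sum(times_ns) / n
    sxx = sum((x - mean_x) ** 2 for x in relocs)
    if sxx == 0:
        raise ValueError("relocation counts must vary between batches")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(relocs, times_ns))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic timings (NOT measured data): an assumed model of 8000 ns fixed
# cost plus 165 ns per relocation, sampled at varying relocation counts.
relocs = [0, 1, 2, 4, 8, 16, 32, 64]
times_ns = [8000 + 165 * r for r in relocs]

intercept, slope = fit_line(relocs, times_ns)
print(f"old={intercept:7.0f} + {slope:5.1f}*reloc (ns)")
```

The intercept corresponds to the first number on each result line and the slope to the per-relocation cost, so a change that slows down buffer lookup would be expected to show up mainly in the intercept.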