Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs

Christian König <christian.koenig@xxxxxxx> · Thu, 28 May 2020 20:12:15 +0200



On 28.05.20 at 18:06, Marek Olšák wrote:
> On Thu, May 28, 2020 at 10:40 AM Christian König <christian.koenig@xxxxxxx> wrote:
>
>> On 28.05.20 at 12:06, Michel Dänzer wrote:
>>> On 2020-05-28 11:11 a.m., Christian König wrote:
>>>> Well we still need implicit sync [...]
>>> Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
>>> doesn't ensure later jobs fully see the effects of previous implicitly
>>> synced jobs", requiring userspace to do pessimistic flushing.
>>
>> Yes, exactly that.
>>
>> For the background: we also do this flushing for explicit syncs. And
>> when this was implemented 2-3 years ago, we first did the flushing for
>> implicit sync as well.
>>
>> That was immediately reverted and then implemented differently because
>> it caused severe performance problems in some use cases.
>>
>> I'm not sure of the root cause of these performance problems. My
>> assumption was always that we then insert too many pipeline syncs, but
>> Marek doesn't seem to think it could be that.
>>
>> On the one hand, I'm rather keen to remove the extra handling and just
>> always use the explicit handling for everything, because it simplifies
>> the kernel code quite a bit. On the other hand, I don't want to run into
>> this performance problem again.
>>
>> In addition to that, what the kernel does is a "full" pipeline sync,
>> i.e. we busy-wait for the full hardware pipeline to drain. That might be
>> overkill if you just want to do some flushing so that the next shader
>> sees the stuff written, but I'm not an expert on that.
>
> Do we busy-wait on the CPU or in WAIT_REG_MEM?
>
> WAIT_REG_MEM is what UMDs do and should be faster.
      
We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.

We use this for a couple of things, especially to make sure that the
hardware is idle before changing VMID-to-page-table associations.
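To make the mechanism a bit more concrete, here is a rough CPU-side model in Python. It is not real PM4 packet encoding. The idea it shows: an end-of-pipe (EOP) event writes a fence sequence number to memory only once the work submitted before it has drained, and a WAIT_REG_MEM-style wait polls that memory location and compares the masked value against a reference. The names `FenceMemory`, `emit_eop`, `retire_one` and `wait_reg_mem`, the compare-function table and the polling bound are all hypothetical, chosen for illustration; they are not amdgpu kernel interfaces.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical compare functions, modeled loosely on the kinds of comparisons
# a WAIT_REG_MEM-style wait can perform (assumption, not the real encoding).
COMPARE = {
    "always": lambda value, ref: True,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


@dataclass
class FenceMemory:
    """One 32-bit fence slot in memory plus the EOP writes still in flight."""
    value: int = 0
    pending: list = field(default_factory=list)  # sequence numbers not yet retired

    def emit_eop(self, seq: int) -> None:
        # Queue an EOP event; its fence write only lands after the
        # previously submitted work has drained from the pipeline.
        self.pending.append(seq)

    def retire_one(self) -> None:
        # Simulate the hardware finishing the oldest job and writing its
        # sequence number to the fence slot.
        if self.pending:
            self.value = self.pending.pop(0) & 0xFFFFFFFF


def wait_reg_mem(mem: FenceMemory, ref: int, func: str = "ge",
                 mask: int = 0xFFFFFFFF, max_polls: int = 1000) -> int:
    """Poll mem.value until COMPARE[func](mem.value & mask, ref) holds.

    Returns the number of failed polls before the condition held, or raises
    TimeoutError. Each failed poll lets the simulated hardware retire one job.
    """
    compare = COMPARE[func]
    for polls in range(max_polls):
        if compare(mem.value & mask, ref):
            return polls
        mem.retire_one()
    raise TimeoutError(f"fence never reached {ref:#x}")
```

For example, after emitting EOP events for sequence numbers 1, 2 and 3, `wait_reg_mem(mem, 3)` keeps polling until the third fence write has landed. The wait happens on the GPU side of the model rather than on the CPU, which is the distinction being asked about above.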

    
What about your idea of having an extra dword in the shared BOs
indicating that they are flushed?

    
As far as I understand it, an EOS or other event might be sufficient
for the caches as well. And you could insert the WAIT_REG_MEM
directly before the first draw using the texture, and not before the
whole IB.
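As a rough illustration of the two ideas above (an extra "flushed" dword in the shared BO, and waiting only before the first draw that actually uses the shared texture), here is a Python sketch that edits a command list represented as plain tuples. The `Cmd` type, the opcode names (`draw`, `flush_caches`, `write_flag`, `wait_flag_ge`) and the assumption that the flag dword holds a sequence number are all hypothetical; this is not how Mesa or the kernel actually encode IBs, and whether an EOS event alone makes the writes visible is still the open question here.

```python
from typing import List, NamedTuple, Tuple


class Cmd(NamedTuple):
    op: str                     # hypothetical opcode name, e.g. "draw"
    bos: Tuple[str, ...] = ()   # buffer objects the command reads or writes
    arg: int = 0                # opcode-specific argument (e.g. a sequence number)


def producer_tail(shared_bo: str, seq: int) -> List[Cmd]:
    """Commands a producer appends after its last write to shared_bo.

    First flush the caches (standing in for an EOS/EOP event with cache
    actions), then write seq into the BO's extra "flushed" dword.
    """
    return [Cmd("flush_caches", (shared_bo,)),
            Cmd("write_flag", (shared_bo,), seq)]


def insert_wait_before_first_use(ib: List[Cmd], shared_bo: str,
                                 seq: int) -> List[Cmd]:
    """Return a copy of ib with one wait placed right before the first draw
    that references shared_bo, instead of at the start of the whole IB.

    Draws that do not touch the shared BO can run without stalling.
    """
    out: List[Cmd] = []
    waited = False
    for cmd in ib:
        if not waited and cmd.op == "draw" and shared_bo in cmd.bos:
            # Poll the BO's flag dword until it is >= seq.
            out.append(Cmd("wait_flag_ge", (shared_bo,), seq))
            waited = True
        out.append(cmd)
    return out
```

With three draws where only the second and third sample `tex`, the first draw is not held back and only the second one waits on the flag. If no draw references the BO, the IB is returned unchanged. The trade-off is one extra poll per shared BO in exchange for not draining the whole pipeline before the IB starts.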

    
It could be that we can optimize this even further than what we do in
the kernel.

    
Christian.

    
> Marek
        
      
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx