Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events

Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> · Fri, 22 Dec 2017 10:16:17 +0000



    On 22/12/17 09:30, Sagar Arun Kamble wrote:

    
      On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:

      
        Some more findings I made while playing with this series & GPUTop.

          It turns out the 2 ms per second drift is due to timecounter.
          Adding the delta this way:

          
          https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607

          
          eliminates the drift.
      
      I see two important changes: 1. approximation of the start time
      during init_timecounter, and 2. overflow handling in delta
      accumulation.

      With these incorporated, I guess timecounter should also work in
      the same fashion.
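Assuming the "overflow handling in delta accumulation" point refers to a free-running 32-bit timestamp wrapping between two samples, the usual approach can be sketched in C: take each delta with unsigned 32-bit arithmetic, which stays correct across a single wrap, and accumulate it into a 64-bit value. `ts32_delta` and `struct ts_extender` are hypothetical names for illustration, not code from the patch.

```c
#include <stdint.h>

/* Elapsed ticks between two samples of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between
 * prev and now still yields the correct delta. */
static uint32_t ts32_delta(uint32_t prev, uint32_t now)
{
	return (uint32_t)(now - prev);
}

/* Hypothetical helper: extends the 32-bit counter to 64 bits by
 * accumulating wrap-safe deltas. */
struct ts_extender {
	uint64_t total; /* 64-bit extended timestamp */
	uint32_t last;  /* last raw 32-bit sample */
};

static void ts_extender_init(struct ts_extender *e, uint32_t first)
{
	e->total = first;
	e->last = first;
}

/* Only correct if samples arrive at least once per wrap period. */
static uint64_t ts_extender_update(struct ts_extender *e, uint32_t now)
{
	e->total += ts32_delta(e->last, now);
	e->last = now;
	return e->total;
}
```

With a hypothetical 12 MHz counter, 2^32 ticks wrap roughly every 358 s, so this only holds if a delta is taken at least that often.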

    
    I think the arithmetic in timecounter is inherently lossy and that's
    why we're seeing a drift. Could we be using it wrong?

    
    In the patch above, I think there is still a drift because of the
    potential loss of the fractional part at every delta we add.

    But it should only be a fraction of a nanosecond multiplied by the
    number of reports over a period of time.

    With a report every 1 us, that should still be much less than 1 ms
    of drift over 1 s.

    
    We can probably do better by always computing the clock from the
    entire delta since the starting point rather than by accumulating
    per-report deltas.
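A minimal C sketch of why converting the entire delta loses less than accumulating converted deltas (this is not the code from the patch). It assumes a hypothetical 12 MHz GPU timestamp converted with a kernel-style mult/shift pair, ns = (ticks * MULT) >> SHIFT, with MULT truncated; `ticks_to_ns`, `accumulated_ns` and `whole_delta_ns` are illustrative names. Each conversion drops the fractional nanoseconds, so accumulating loses up to 1 ns per report, while converting the whole delta loses it only once.

```c
#include <stdint.h>

/* Hypothetical parameters, for illustration only: a 12 MHz timestamp
 * (~83.33 ns per tick), converted as ns = (ticks * MULT) >> SHIFT. */
#define TS_FREQ_HZ 12000000ULL
#define SHIFT 10
/* MULT = floor(1e9 * 2^SHIFT / TS_FREQ_HZ) = 85333 */
#define MULT ((1000000000ULL << SHIFT) / TS_FREQ_HZ)

static uint64_t ticks_to_ns(uint64_t ticks)
{
	/* The low SHIFT bits (fractional nanoseconds) are discarded here.
	 * ticks * MULT must not overflow 64 bits. */
	return (ticks * MULT) >> SHIFT;
}

/* Convert each per-report delta, then sum:
 * the fractional part is lost once per report. */
static uint64_t accumulated_ns(uint64_t ticks_per_report, uint64_t n_reports)
{
	uint64_t ns = 0;

	for (uint64_t i = 0; i < n_reports; i++)
		ns += ticks_to_ns(ticks_per_report);
	return ns;
}

/* Sum the ticks first, then convert the entire delta once:
 * the fractional part is lost only once. */
static uint64_t whole_delta_ns(uint64_t ticks_per_report, uint64_t n_reports)
{
	return ticks_to_ns(ticks_per_report * n_reports);
}
```

With one report per microsecond (12 ticks) over one second (1,000,000 reports), `accumulated_ns(12, 1000000)` returns 999,000,000 while `whole_delta_ns(12, 1000000)` returns 999,996,093, so the two differ by almost 1 ms with these deliberately coarse constants. The remaining ~3.9 us error in the whole-delta result comes from truncating MULT; a larger SHIFT reduces it, at the cost of a smaller maximum delta before `ticks * MULT` overflows.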

    
          Timelines of perf i915 tracepoints & OA reports now make a lot
          more sense.

          
          There is still the issue that reading the CPU clock & the
          RCS timestamp is inherently not atomic, so there is a delta
          between the two readings.

          I think we should add a new i915 perf record type to express
          the delta that we measure this way:

          
          https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475

          
          That way userspace knows there might be a global offset between
          the two times and is able to present it.
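The measurement described above can be sketched in C: read the CPU clock (ns0), read the GPU timestamp, read the CPU clock again (ns1), and keep ns1 - ns0 as the bound to report with the correlation point. This is a userspace-style sketch under stated assumptions: `struct correlation_point` is a hypothetical payload for the proposed record type (not existing i915 uAPI), `read_gpu_ts_fn` stands in for the real RCS timestamp read, CLOCK_MONOTONIC is an assumed choice of CPU clock, and `fake_read_gpu_ts` is only a counter for trying the code out. Retrying and keeping the narrowest window is an addition of this sketch, not something from the patch.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical payload for the proposed record type (not real uAPI). */
struct correlation_point {
	uint64_t cpu_ns;    /* midpoint of the two CPU clock reads */
	uint64_t gpu_ts;    /* GPU timestamp read between them */
	uint64_t max_delta; /* ns1 - ns0: bound on the offset between clocks */
};

/* Stand-in for reading the RCS timestamp (assumption). */
typedef uint64_t (*read_gpu_ts_fn)(void *ctx);

/* Fake GPU clock for testing: a counter that advances on every read. */
static uint64_t fake_read_gpu_ts(void *ctx)
{
	uint64_t *counter = ctx;

	return ++*counter;
}

static uint64_t cpu_clock_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Bracket one GPU read between two CPU clock reads. */
static struct correlation_point sample_once(read_gpu_ts_fn read_gpu_ts, void *ctx)
{
	struct correlation_point p;
	uint64_t ns0 = cpu_clock_ns();

	p.gpu_ts = read_gpu_ts(ctx);
	uint64_t ns1 = cpu_clock_ns();

	p.cpu_ns = ns0 + (ns1 - ns0) / 2;
	p.max_delta = ns1 - ns0;
	return p;
}

/* Take several samples and keep the one with the narrowest window. */
static struct correlation_point sample_best(read_gpu_ts_fn read_gpu_ts, void *ctx,
					    unsigned int attempts)
{
	struct correlation_point best = sample_once(read_gpu_ts, ctx);

	for (unsigned int i = 1; i < attempts; i++) {
		struct correlation_point p = sample_once(read_gpu_ts, ctx);

		if (p.max_delta < best.max_delta)
			best = p;
	}
	return best;
}
```

Taking the midpoint halves the worst-case error on `cpu_ns`. `max_delta` still bounds how far apart the two clocks may really be, which is what userspace would need to present the offset.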

        
      Agreed. The delta ns1 - ns0 can be interpreted as the maximum drift.

      
          Measurements on my KBL system were on the order of tens of
          microseconds (~30 us).

          I guess we might be able to set up the correlation point better
          (masking interrupts?) to reduce the delta.

        
      We are already using spin_lock. Do you mean NMIs?

    
    I don't actually know much about this point.

    If spin_lock is the best we can do, then that's it :)

    
          Thanks,

          
          -

          Lionel

          
          On 07/12/17 00:57, Robert Bragg wrote:

        
              On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <robert@xxxxxxxxxxxxx> wrote:

                
                At least from what I wrote back then, it looks like I was
                seeing a drift of a few milliseconds per second on SKL. I
                vaguely recall it being much worse given the frequency
                constants we had for Haswell.

                        
                Sorry, I didn't actually re-read my own message properly
                before referencing it :) Apparently the 2 ms per second
                drift was for Haswell, so presumably it is not quite so bad
                for SKL.

                
                - Robert

                
          _______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

        