Re: [PATCH] drm/panthor: Add DEV_QUERY_TIMESTAMP_INFO dev query

On Wed, 28 Aug 2024 18:37:41 +0100
Mihail Atanassov <mihail.atanassov@xxxxxxx> wrote:

> On 28/08/2024 18:27, Boris Brezillon wrote:
> > On Wed, 28 Aug 2024 18:07:03 +0200
> > Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:
> >   
> >> On Wed, 28 Aug 2024 14:22:51 +0100
> >> Mihail Atanassov <mihail.atanassov@xxxxxxx> wrote:
> >>  
> >>> Hi Boris,
> >>>
> >>> On 28/08/2024 13:09, Boris Brezillon wrote:  
> >>>> Hi Mihail,
> >>>>
> >>>> On Thu, 8 Aug 2024 12:41:05 +0300
> >>>> Mihail Atanassov <mihail.atanassov@xxxxxxx> wrote:
> >>>>        
> >>>>>>
> >>>>>> +/**
> >>>>>> + * struct drm_panthor_timestamp_info - Timestamp information
> >>>>>> + *
> >>>>>> + * Structure grouping all queryable information relating to the
> >>>>>> + * GPU timestamp.
> >>>>>> + */
> >>>>>> +struct drm_panthor_timestamp_info {
> >>>>>> +	/** @timestamp_frequency: The frequency of the timestamp timer. */
> >>>>>> +	__u64 timestamp_frequency;
> >>>>>> +
> >>>>>> +	/** @current_timestamp: The current timestamp. */
> >>>>>> +	__u64 current_timestamp;
> >>>>>
> >>>>> As it stands, this query has nothing to do with the actual GPU so
> >>>>> doesn't really belong here.
> >>>>>
> >>>>> It'd be more valuable, and can maybe give better calibration results
> >>>>> than querying the system timestamp separately in userspace, if you
> >>>>> reported all of:
> >>>>>     * the system timer value
> >>>>>     * the system timer frequency
> >>>>>     * the GPU timer value
> >>>>>     * the GPU timer frequency (because it _could_ be different in some
> >>>>> systems)  
> >>>>
> >>>> Duh, I wish this wasn't the case and all SoC vendors went for the
> >>>> arch-timer which guarantees the consistency of the timestamp on the GPU
> >>>> and CPU. But let's say this is a case we need to support, wouldn't it
> >>>> be more useful to do the CPU/GPU calibration kernel side (basically at
> >>>> init/resume time) and then expose the formula describing the
> >>>> relationship between those 2 things:
> >>>>
> >>>> CPU_time = GPU_time * GPU_to_CPU_mul / GPU_to_CPU_div +
> >>>> 	   GPU_to_CPU_offset;
> >>>>        
> >>>
> >>> TIMESTAMP_OFFSET should indeed be set by the kernel (on resume). But I
> >>> don't think we need to post M/D+offset to userspace. The 2 Frequencies +
> >>> the scalar offset are the raw sources, and userspace can work back from
> >>> there.  
> >>
> >> Sure. No matter how you express the relationship, my point was, if the
> >> calibration is supposed to happen in the kernel at resume time,
> >> returning both the CPU/GPU time in DEV_QUERY_TIMESTAMP to make sure the
> >> sampling is close enough that they actually represent the same
> >> timestamp might not be needed, because you can easily convert from one
> >> domain to the other.  
> > 
> > I think it makes more sense after reading [1] :-). This being said, the
> > maxDeviation is there to account for any latency that might exist
> > between the individual domain samplings, so I'd be tempted to read the
> > CPU monotonic time through the regular syscalls rather than add it to
> > the DEV_QUERY_TIMESTAMP ioctl.
> >   
> 
> Wouldn't that defeat the point of getting low-latency consecutive reads
> of both time domains? If you leave it to a separate syscall, you're at
> the mercy of a lot of things, so the error is no longer just a fixed
> scalar time delta; you'll get much higher measurement variance.

I guess you're not calling clock_gettime() but instead doing the cycles ->
nanoseconds conversion manually for VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW;
it's unclear to me how you'd do that for VK_TIME_DOMAIN_CLOCK_MONOTONIC,
though.
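
For the MONOTONIC_RAW case I'd expect something along the lines of the
sketch below (just my assumption of how the manual conversion looks;
gpu_timestamp_to_ns() is a made-up helper, and its two inputs map to the
fields of the proposed drm_panthor_timestamp_info):

#include <stdint.h>

/* Sketch: convert a raw GPU timestamp to nanoseconds, given a counter
 * running at freq Hz. The division is split so the intermediate
 * multiplication doesn't overflow 64 bits for realistic frequencies. */
static uint64_t gpu_timestamp_to_ns(uint64_t ticks, uint64_t freq)
{
	uint64_t secs = ticks / freq;
	uint64_t rem = ticks % freq;

	return secs * 1000000000ull + (rem * 1000000000ull) / freq;
}

For CLOCK_MONOTONIC there's no fixed mul/div like that, since it gets
NTP-slewed (unlike CLOCK_MONOTONIC_RAW), hence the question.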

> Doing it in-kernel with no 
> sleeps in the middle gets you better confidence in your samples being 
> consistently correlated in time. If you have that in-kernel low latency 
> correlation pairwise for all time domains you're interested in (in this 
> case CPU & GPU timestamps, but you could get CPU & display IP 
> timestamps, etc), you can then correlate all of the clocks in userspace.
> 
> Fundamentally, though, if you don't report CPU timestamps in v1 of the
> ioctl, you can extend it later if it becomes clear that the maxDeviation
> isn't low enough when the samples are taken across a syscall.

Yeah, I think I'd prefer this option. Note that all other drivers in
mesa do it one syscall at a time (see vk_clock_gettime() users [1]).
Not saying this is necessarily the best option, but the ioctl()
overhead doesn't seem to cause issues there.

[1] https://elixir.bootlin.com/mesa/mesa-24.2.1/A/ident/vk_clock_gettime
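
For context, the pattern in those drivers is roughly the following
(paraphrased from memory rather than copied from any of them;
sample_gpu_timestamp() is a made-up stand-in for the driver-specific
query ioctl, and vk_clock_gettime() is the Mesa helper returning the
requested clock in nanoseconds):

#include <stdint.h>
#include <time.h>

/* Declarations for this sketch only. */
extern uint64_t vk_clock_gettime(clockid_t clock_id);  /* Mesa helper, returns ns */
extern uint64_t sample_gpu_timestamp(int fd);          /* placeholder ioctl wrapper */

static void sample_calibrated_timestamps(int fd, uint64_t *cpu_ts,
					 uint64_t *gpu_ts,
					 uint64_t *max_deviation)
{
	uint64_t begin = vk_clock_gettime(CLOCK_MONOTONIC);

	*gpu_ts = sample_gpu_timestamp(fd);	/* separate syscall */

	uint64_t end = vk_clock_gettime(CLOCK_MONOTONIC);

	*cpu_ts = begin;
	/* The deviation reported to the app has to cover at least the
	 * window between the two CPU samples (plus one tick of the
	 * coarsest clock involved). */
	*max_deviation = end - begin;
}

i.e. the cost of doing it one domain at a time is just a wider
begin/end window, which is exactly what maxDeviation is there to
absorb.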


