On Wed, 28 Aug 2024 18:37:41 +0100
Mihail Atanassov <mihail.atanassov@xxxxxxx> wrote:

> On 28/08/2024 18:27, Boris Brezillon wrote:
> > On Wed, 28 Aug 2024 18:07:03 +0200
> > Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:
> >
> >> On Wed, 28 Aug 2024 14:22:51 +0100
> >> Mihail Atanassov <mihail.atanassov@xxxxxxx> wrote:
> >>
> >>> Hi Boris,
> >>>
> >>> On 28/08/2024 13:09, Boris Brezillon wrote:
> >>>> Hi Mihail,
> >>>>
> >>>> On Thu, 8 Aug 2024 12:41:05 +0300
> >>>> Mihail Atanassov <mihail.atanassov@xxxxxxx> wrote:
> >>>>
> >>>>>>
> >>>>>> +/**
> >>>>>> + * struct drm_panthor_timestamp_info - Timestamp information
> >>>>>> + *
> >>>>>> + * Structure grouping all queryable information relating to the GPU timestamp.
> >>>>>> + */
> >>>>>> +struct drm_panthor_timestamp_info {
> >>>>>> +	/** @timestamp_frequency: The frequency of the timestamp timer. */
> >>>>>> +	__u64 timestamp_frequency;
> >>>>>> +
> >>>>>> +	/** @current_timestamp: The current timestamp. */
> >>>>>> +	__u64 current_timestamp;
> >>>>>
> >>>>> As it stands, this query has nothing to do with the actual GPU so
> >>>>> doesn't really belong here.
> >>>>>
> >>>>> It'd be more valuable, and can maybe give better calibration results
> >>>>> than querying the system timestamp separately in userspace, if you
> >>>>> reported all of:
> >>>>> * the system timer value
> >>>>> * the system timer frequency
> >>>>> * the GPU timer value
> >>>>> * the GPU timer frequency (because it _could_ be different in some
> >>>>> systems)
> >>>>
> >>>> Duh, I wish this wasn't the case and all SoC vendors went for the
> >>>> arch-timer which guarantees the consistency of the timestamp on the GPU
> >>>> and CPU. But let's say this is a case we need to support, wouldn't it
> >>>> be more useful to do the CPU/GPU calibration kernel side (basically at
> >>>> init/resume time) and then expose the formula describing the
> >>>> relationship between those 2 things:
> >>>>
> >>>> CPU_time = GPU_time * GPU_to_CPU_mul / GPU_to_CPU_div + GPU_to_CPU_offset;
> >>>>
> >>>
> >>> TIMESTAMP_OFFSET should indeed be set by the kernel (on resume). But I
> >>> don't think we need to post M/D+offset to userspace. The 2 frequencies +
> >>> the scalar offset are the raw sources, and userspace can work back from
> >>> there.
> >>
> >> Sure. No matter how you express the relationship, my point was, if the
> >> calibration is supposed to happen in the kernel at resume time,
> >> returning both the CPU/GPU time in DEV_QUERY_TIMESTAMP to make sure the
> >> sampling is close enough that they actually represent the same
> >> timestamp might not be needed, because you can easily convert from one
> >> domain to the other.
> >
> > I think it makes more sense after reading [1] :-). This being said, the
> > maxDeviation is here to account for any latency that might exist
> > between each domain sampling, so I'd be tempted to read the CPU
> > monotonic time through the regular syscalls rather than add it to the
> > DEV_QUERY_TIMESTAMP ioctl.
>
> Wouldn't that defeat the point of getting low-latency consecutive reads
> of both time domains? If you leave it to a separate syscall, you're at
> the mercy of a lot of things, so it's not just a scalar time delta,
> you'll get much higher measurement variance.

I guess you're not calling clock_gettime() and instead do the cycles ->
nanoseconds conversion manually for VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW,
but it's unclear how you do that for VK_TIME_DOMAIN_CLOCK_MONOTONIC.
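Just so we're talking about the same thing, here's roughly the kind of
userspace-side conversion I had in mind for the MONOTONIC_RAW case,
assuming the kernel only exposes the GPU timer frequency plus the
offset it calibrated at init/resume time. The helper and parameter
names below are made up for the sake of the example, it's just a
sketch:

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/*
 * Sketch: convert a raw GPU timestamp (in GPU timer cycles) to a
 * CLOCK_MONOTONIC_RAW-domain value in nanoseconds, using the GPU timer
 * frequency and the kernel-calibrated GPU-to-CPU offset. The 128-bit
 * intermediate avoids overflowing on large cycle counts.
 */
static uint64_t gpu_timestamp_to_ns(uint64_t gpu_cycles, uint64_t gpu_freq_hz,
				    int64_t gpu_to_cpu_offset_ns)
{
	uint64_t ns = (uint64_t)((unsigned __int128)gpu_cycles *
				 NSEC_PER_SEC / gpu_freq_hz);

	return ns + gpu_to_cpu_offset_ns;
}

That still leaves the VK_TIME_DOMAIN_CLOCK_MONOTONIC case open, hence
the question.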
> Doing it in-kernel with no
> sleeps in the middle gets you better confidence in your samples being
> consistently correlated in time. If you have that in-kernel low latency
> correlation pairwise for all time domains you're interested in (in this
> case CPU & GPU timestamps, but you could get CPU & display IP
> timestamps, etc), you can then correlate all of the clocks in userspace.
>
> Fundamentally, though, if you don't report CPU timestamps in v1 of the
> ioctl, you can extend later if it becomes clear that the maxDeviation is
> not low enough with the samples being across a syscall.

Yeah, I think I'd prefer this option. Note that all other drivers in
mesa do it one syscall at a time (see vk_clock_gettime() users [1]).
Not saying this is necessarily the best option, but the ioctl()
overhead doesn't seem to cause issues there.

[1] https://elixir.bootlin.com/mesa/mesa-24.2.1/A/ident/vk_clock_gettime
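For completeness, the one-syscall-at-a-time pattern I'm referring to
looks roughly like this (simplified sketch; query_gpu_timestamp() is a
made-up stand-in for whatever the DEV_QUERY_TIMESTAMP ioctl wrapper
ends up being, not an existing function):

#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ull

/* Made-up stand-in for the DEV_QUERY_TIMESTAMP ioctl wrapper. */
extern uint64_t query_gpu_timestamp(int drm_fd);

static uint64_t gettime_ns(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/*
 * Sample each domain with its own syscall and bound the deviation by
 * the CLOCK_MONOTONIC_RAW interval spanning all the samples.
 */
static void sample_calibrated_timestamps(int drm_fd, uint64_t *cpu_ts,
					 uint64_t *gpu_ts,
					 uint64_t *max_deviation)
{
	uint64_t begin = gettime_ns(CLOCK_MONOTONIC_RAW);

	*cpu_ts = gettime_ns(CLOCK_MONOTONIC);
	*gpu_ts = query_gpu_timestamp(drm_fd);

	*max_deviation = gettime_ns(CLOCK_MONOTONIC_RAW) - begin;
}

The deviation is simply bounded by the time it takes to collect all the
samples, which, judging by those other drivers, seems to be good enough
in practice.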