Re: [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 02/03/2022 17:55, John Harrison wrote:

I was assuming 2.5s tP is enough and basing all calculation on that. Heartbeat or timeslicing regardless. I thought we established neither of us knows how long is enough.

Are you now saying 2.5s is definitely not enough? How is that usable for a default out of the box desktop?
Show me your proof that 2.5s is enough.

7.5s is what we have been using internally for a very long time. It has approval from all relevant parties. If you wish to pick a new random number then please provide data to back it up along with buy in from all UMD teams and project management.

And upstream disabled preemption has acks from compute. "Internally" is as far away from out of the box desktop experiences we have been arguing about. In fact you have argued compute disables the hearbeat anyway.

Lets jump to the end of this reply please for actions.

And I don't have a problem with extending the last pulse. It is fundamentally correct to do regardless of the backend. I just raised the question of whether to extend all heartbeats to account for preemption (and scheduling delays). (What is the point of bumping their priority and re-scheduling if we didn't give enough time to the engine to react? So opposite of the question you raise.)
The point is that it we are giving enough time to react. Raising the priority of a pre-emption that has already been triggered will have no effect. So as long as the total time from when the pre-emption is triggered (prio becomes sufficiently high) to the point when the reset is decided is longer than the pre-emption timeout then it works. Given that, it is unnecessary to increase the intermediate periods. It has no advantage and has the disadvantage of making the total time unreasonably long.

So again, what is the point of making every period longer? What benefit does it *actually* give?

Less special casing and pointless prio bumps ahead of giving time to engine to even react. You wouldn't have to have the last pulse 2 * tP but normal tH + tP. So again, it is nicer for me to derive all heartbeat pulses from the same input parameters.

The whole "it is very long" argument is IMO moot because now proposed 7.5s preempt period is I suspect wholly impractical for desktop. Combined with the argument that real compute disables heartbeats anyway even extra so.

Fine. "tP(RCS) = 7500;" can I merge the patch now?
I could live with setting preempt timeout to 7.5s. The downside is slower time to restoring frozen desktops. Worst case today 5 * 2.5s, with changes 4 * 2.5s + 2 * 7.5s; so from 12.5s to 25s, doubling..

Actions:

1)
Get a number from compute/OpenCL people for what they say is minimum preempt timeout for default out of the box Linux desktop experience.

This does not mean them running some tests and can't be bothered to setup up the machine for the extreme use cases, but workloads average users can realistically be expected to run.

Say for instance some image manipulation software which is OpenCL accelerated or similar. How long unpreemptable sections are expected there. Or similar. I am not familiar what all OpenCL accelerated use cases there are on Linux.

And this number should be purely about minimum preempt timeout, not considering heartbeats. This is because preempt timeout may kick in sooner than stopped heartbeat if the user workload is low priority.

2)
Commit message should explain the effect on the worst case time until engine reset.

3)
OpenCL/compute should ack the change publicly as well since they acked the disabling of preemption.

4)
I really want overflows_type in the first patch.

My position is that with the above satisfied it is okay to merge.

Regards,

Tvrtko



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux