Re: [RFC PATCH xserver] modesetting: re-set the crtc's mode when link-status goes BAD

Eric Anholt <eric@xxxxxxxxxx> · Mon, 13 Feb 2017 13:05:17 -0800

Martin Peres <martin.peres@xxxxxxxxxxxxxxx> writes:

> On 06/02/17 17:50, Martin Peres wrote:
>> On 03/02/17 10:04, Daniel Vetter wrote:
>>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
>>>> On 01/02/17 22:05, Manasi Navare wrote:
>>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>>>>>> Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> writes:
>>>>>>
>>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@xxxxxxxxxx> wrote:
>>>>>>>> Martin Peres <martin.peres@xxxxxxxxxxxxxxx> writes:
>>>>>>>>
>>>>>>>>> Despite all the careful planing of the kernel, a link may become
>>>>>>>>> insufficient to handle the currently-set mode. At this point, the
>>>>>>>>> kernel should mark this particular configuration as being broken
>>>>>>>>> and potentially prune the mode before setting the offending
>>>>>>>>> connector's
>>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
>>>>>>>>> happen right after a modeset or later on.
>>>>>>>>>
>>>>>>>>> When available, we should use the link-status information to reset
>>>>>>>>> the wanted mode.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Martin Peres <martin.peres@xxxxxxxxxxxxxxx>
>>>>>>>> If I understand this right, there are two failure modes being
>>>>>>>> handled:
>>>>>>>>
>>>>>>>> 1) A mode that won't actually work because the link isn't good
>>>>>>>> enough.
>>>>>>>>
>>>>>>>> 2) A mode that should work, but link parameters were too
>>>>>>>> optimistic and
>>>>>>>> if we just ask the kernel to set the mode again it'll use more
>>>>>>>> conservative parameters that work.
>>>>>>>>
>>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
>>>>>>>> train again, and bring us back into this loop?  The only escape
>>>>>>>> that I
>>>>>>>> see would be some other userspace responding to the advertised
>>>>>>>> mode list
>>>>>>>> changing, and then asking X to modeset to something new.
>>>>>>>>
>>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
>>>>>>>> point, and only re-setting if our mode still exists?
>>>>>>> Disclaimer: I don't know anything about the internals of the
>>>>>>> modesetting
>>>>>>> driver.
>>>>>>>
>>>>>>> Perhaps we can identify the two cases now, but I'd put this more
>>>>>>> generally: if the link status has gone bad, it's an indicator to
>>>>>>> userspace that the circumstances may have changed, and userspace
>>>>>>> should
>>>>>>> query the kernel for the list of available modes again. It should no
>>>>>>> longer trust information obtained prior to getting the bad link
>>>>>>> status,
>>>>>>> including the current mode.
>>>>>>>
>>>>>>> But specifically, I think you're right, and AFAICT asking for the
>>>>>>> list
>>>>>>> of modes again is the only way for the userspace to distinguish
>>>>>>> between
>>>>>>> the two cases. I don't think there's a shortcut for deciding the
>>>>>>> current
>>>>>>> mode is still valid.
>>>>>> To avoid the busy-loop problem, I think I'd like this patch to
>>>>>> re-query
>>>>>> the kernel to ask about the current mode list, and only try to re-set
>>>>>> the mode if our mode is still there.
>>>>>>
>>>>>> If the mode isn't there, then it's up to the DE to take action in
>>>>>> response to the notification of new modes.  If you don't have a DE to
>>>>>> take appropriate action, you're kind of out of luck.
>>>>>>
>>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
>>>>>> about ABI was having to walk all the connectors on every uevent to see
>>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
>>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
>>>>>> the
>>>>>> kernel could go ahead with the current plan.
>>>>> Yes I agree. The kernel sets the link status BAD as soona s link
>>>>> training fails
>>>>> but does not prune the modes until a new modelist is requested by
>>>>> the userspace.
>>>>> So this patch should re query the mode list as soon as it sees the
>>>>> link status
>>>>> BAD in order for the kernel to validate the modes again based on new
>>>>> link
>>>>> paarmeters and send a new mode list back.
>>>> Seems like a bad behaviour from the kernel, isn't it? It should return
>>>> immediatly
>>>> if the mode is gonna be pruned :s
>>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
>>> validate requested modes against that. The mode list is purely for
>>> userspace's information. Which means updating it only when userspace asks
>>> for it is fine.
>>
>> Hmm, ok. Fair enough!
>>
>>> I also thought some more about the loop when we're stuck on BAD, and I
>>> think it shouldn't happen: When the link goes BAD we update the link
>>> parameter values to the new limits, and the kernel will reject any mode
>>> that won't fit anymore. Of course you might be unlucky, and the new link
>>> limits are also not working, but eventually you're stuck at the "you're
>>> link is broken, sry" stage, where the kernel rejects everything :-)
>>>
>>> So I think the busy-loop has strictly a limited amount of time until it
>>> runs out of steam.
>>
>> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
>> on IRC.
>>
>> My current proposal is based on the idea that the kernel should reject a
>> mode
>> it knows it cannot set. This is already largely tested in real life: Try
>> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
>> prepare(). For this proposal to work, we would need to put in the ABI
>> that a
>> driver that sets the link-status to BAD should also make sure that whatever
>> the userspace does, no infinite loop should be created (by changing the
>> maximum link characteristics before setting the link-status property).
>>
>> Re-probing the list of modes and checking if the mode is still in there is
>> inherently racy, as a new screen may be plugged between the moment the list
>> is sent to the userspace and the moment when we try setting the mode. Or
>> the
>> DE may also have changed the mode in the mean time and the kernel would
>> have reduced the limits even more. So, the userspace cannot be expected to
>> always do the right thing here :s.
>>
>> From this point of view, I really do not like the idea of re-probing, since
>> it increases the delay before the DE can handle the change and it does not
>> bring any guarantee of working properly.
>>
>> Did I miss anything? Any opinions?
>
> Any comments on this, Eric? Does it sound logical to you or did I miss 
> something?
>
> The kernel patches are stuck until this patch gets in. So far, you look 
> like the only person who would be willing to review this patch (Ajax 
> acked the patch, but that's the extent he was willing to go).

I was just trying to provide review to get the kernel unstuck.  The
kernel should not be blocked until the patch gets lands (this obviously
isn't the case with ioctls, which *don't* land in userspace until kernel
does), you just need userspace published and generally agreed that the
ABI works.
Attachment:
signature.asc

Description: PGP signature
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel