Am 29.05.19 um 18:29 schrieb Emil Velikov: > On 2019/05/29, Koenig, Christian wrote: >> Am 29.05.19 um 15:03 schrieb Emil Velikov: >>> On 2019/05/29, Dave Airlie wrote: >>>> On Wed, 29 May 2019 at 02:47, Emil Velikov <emil.l.velikov@xxxxxxxxx> wrote: >>>>> On 2019/05/28, Koenig, Christian wrote: >>>>>> Am 28.05.19 um 18:10 schrieb Emil Velikov: >>>>>>> On 2019/05/28, Daniel Vetter wrote: >>>>>>>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian >>>>>>>> <Christian.Koenig@xxxxxxx> wrote: >>>>>>>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter: >>>>>>>>>> [SNIP] >>>>>>>>>>> Might be a good idea looking into reverting it partially, so that at >>>>>>>>>>> least command submission and buffer allocation is still blocked. >>>>>>>>>> I thought the issue is a lot more than vainfo, it's pretty much every >>>>>>>>>> hacked up compositor under the sun getting this wrong one way or >>>>>>>>>> another. Thinking about this some more, I also have no idea how you'd >>>>>>>>>> want to deprecate rendering on primary nodes in general. Apparently >>>>>>>>>> that breaks -modesetting already, and probably lots more compositors. >>>>>>>>>> And it looks like we're finally achieve the goal kms set out to 10 >>>>>>>>>> years ago, and new compositors are sprouting up all the time. I guess >>>>>>>>>> we could just break them all (on new hardware) and tell them to all >>>>>>>>>> suck it up. But I don't think that's a great option. And just >>>>>>>>>> deprecating this on amdgpu is going to be even harder, since then >>>>>>>>>> everywhere else it'll keep working, and it's just amdgpu.ko that looks >>>>>>>>>> broken. >>>>>>>>>> >>>>>>>>>> Aside: I'm not supporting Emil's idea here because it fixes any issues >>>>>>>>>> Intel has - Intel doesn't care. I support it because reality sucks, >>>>>>>>>> people get this render vs. primary vs. multi-gpu prime wrong all the >>>>>>>>>> time (that's also why we have hardcoded display+gpu pairs in mesa for >>>>>>>>>> the various soc combinations out there), and this looks like a >>>>>>>>>> pragmatic solution. It'd be nice if every compositor and everything >>>>>>>>>> else would perfectly support multi gpu and only use render nodes for >>>>>>>>>> rendering, and only primary nodes for display. But reality is that >>>>>>>>>> people hack on stuff until gears on screen and then move on to more >>>>>>>>>> interesting things (to them). So I don't think we'll ever win this :-/ >>>>>>>>> Yeah, but this is a classic case of working around user space issues by >>>>>>>>> making kernel changes instead of fixing user space. >>>>>>>>> >>>>>>>>> Having privileged (output control) and unprivileged (rendering control) >>>>>>>>> functionality behind the same node is a mistake we have made a long time >>>>>>>>> ago and render nodes finally seemed to be a way to fix that. >>>>>>>>> >>>>>>>>> I mean why are compositors using the primary node in the first place? >>>>>>>>> Because they want to have access to privileged resources I think and in >>>>>>>>> this case it is perfectly ok to do so. >>>>>>>>> >>>>>>>>> Now extending unprivileged access to the primary node actually sounds >>>>>>>>> like a step into the wrong direction to me. >>>>>>>>> >>>>>>>>> I rather think that we should go down the route of completely dropping >>>>>>>>> command submission and buffer allocation through the primary node for >>>>>>>>> non master clients. And then as next step at some point drop support for >>>>>>>>> authentication/flink. >>>>>>>>> >>>>>>>>> I mean we have done this with UMS as well and I don't see much other way >>>>>>>>> to move forward and get rid of those ancient interface in the long term. >>>>>>>> Well kms had some really good benefits that drove quick adoption, like >>>>>>>> "suspend/resume actually has a chance of working" or "comes with >>>>>>>> buffer management so you can run multiple gears". >>>>>>>> >>>>>>>> The render node thing is a lot more niche use case (prime, better priv >>>>>>>> separation), plus "it's cleaner design". And the "cleaner design" part >>>>>>>> is something that empirically doesn't seem to matter :-/ Just two >>>>>>>> examples: >>>>>>>> - KHR_display/leases just iterated display resources on the fd needed >>>>>>>> for rendering (and iirc there was even a patch to expose that for >>>>>>>> render nodes too so it works with DRI3), because implementing >>>>>>>> protocols is too hard. Barely managed to stop that one before it >>>>>>>> happened. >>>>>>>> - Various video players use the vblank ioctl on directly to schedule >>>>>>>> frames, without telling the compositor. I discovered that when I >>>>>>>> wanted to limite the vblank ioctl to master clients only. Again, >>>>>>>> apparently too hard to use the existing extensions, or fix the bugs in >>>>>>>> there, or whatever. One userspace got fixed last year, but it'll >>>>>>>> probably get copypasted around forever :-/ >>>>>>>> >>>>>>>> So I don't think we'll ever manage to roll a clean split out, and best >>>>>>>> we can do is give in and just hand userspace what it wants. As much as >>>>>>>> that's misguided and unclean and all that. Maybe it'll result in a >>>>>>>> least fewer stuff getting run as root to hack around this, because >>>>>>>> fixing properly seems not to be on the table. >>>>>>>> >>>>>>>> The beauty of kms is that we've achieved the mission, everyone's >>>>>>>> writing their own thing. Which is also terrible, and I don't think >>>>>>>> it'll get better. >>>>>>> With the risk of coming rude I will repeat my earlier comment: >>>>>>> >>>>>>> The problem is _neither_ Intel nor libva specific. >>>>>>> >>>>>>> >>>>>>> >>>>>>> That said, let's step back for a moment and consider: >>>>>>> >>>>>>> - the "block everything but KMS via the primary node" idea is great but >>>>>>> orthogonal >>>>>>> >>>>>>> - the series does address issues that are vendor-agnostic >>>>>>> >>>>>>> - by default this series does _not_ cause any regression be that for >>>>>>> new or old userspace >>>>>>> >>>>>>> - there are two trivial solutions, if the AMD team has concerns about >>>>>>> closed-source/private stack depending on the old behaviour >>>>>>> If they want I can even write the patches ;-) >>>>>>> >>>>>>> >>>>>>> That said, the notable comments received so far are: >>>>>>> - rework patch 13/13 to remove the DRM_AUTH from prime fd to/from >>>>>>> handle. I'm OK but this will change the return code - from EACCES to >>>>>>> ENOSYS >>>>>>> >>>>>>> - vmwgfx will need a check on the reference ioctl(s) - IIRC Thomas is >>>>>>> planning to drop nearly all DRM_AUTH instances in their driver. >>>>>>> >>>>>>> >>>>>>> Christian, as mentioned before - this series does _not_ add >>>>>>> functionality to render nodes. It effectively paves a way towards >>>>>>> removing DRM_AUTH. >>>>>> But it adds functionality to the primary node. >>>>>> >>>>> Behaviour is adjusted - functionality was there since day 1. >>>>> >>>>>>> I understand the series may feel a bit dirty. Yet I would gladly address >>>>>>> any technical concerns you have. >>>>>> Well putting compatibility issues aside my concern is that this is >>>>>> simply a bad design decision which we can't revert later on. >>>>>> >>>>> As sad above - any concerns (theoretical or actual regressions) can be >>>>> trivially fixed _without_ reverting any of this. >>>>> >>>>> I am more than happy to step up and address any regressions in timely >>>>> manner. >>>>> >>>>> >>>>> As a reminder without this series, some of your customers are forced to >>>>> run their applications as root. >>>> I'm torn here on whether this is worth it. Have we got more use cases >>>> to justify it? >>>> >>> Should have mentioned: three DRM drivers (not counting i915) have >>> dropped DRM_AUTH, assumingly for the same reasons I'm bringing here. >>> >>> Apart from the libva, kmscube + gst and mesa, I'm expecting other >>> projects to make the same mistake. Since the former three define the >>> norm of using DRM. >>> >>> The "fix" for all of these being "run as root" :-\ >>> >>>> I'm wary of opening this up just because we can. >>>> >>> What can I do to alleviate that worry? I have spent over a week auditing >>> code and designed so that we can reinstate the authentication only where >>> needed. >> Well I don't think the worry here is about regressions, > Glad to hear. > >> but rather about >> a design decision we will never be able to revert. >> > Can you think of any reason/issue why we would want to revert this? I > will gladly spend some thing exploring how to address it. Well, to finally get rid of the primary node for non display hardware. And in general to have a clean separation between display and rendering. >> So the question we have to ask is rather if it's a good design decision >> to resurrect the primary node with all its related compability burdens >> to work around an issue which is essentially an userspace coding error. >> > Can see you're not happy on the topic - I'm not too excited either. The > truth to the matter is - DRM drivers have dropped DRM_AUTH regardless of > my work. Then we should probably consider stopping doing this and enforce that the primary node is not used that widely any more. Regards, Christian. > > It's very unfortunate, if AMDGPU stands out. Perhaps after some time and > unhappy users you'll reconsider. > > I believe that Linus has pointed out a number of times that kernel > developers should care about our users. Even when it's an userspace > error. > > > HTH > Emil _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx