On Tue, Mar 23, 2021 at 08:32:50PM +0300, Dmitry Osipenko wrote: > 23.03.2021 19:44, Thierry Reding пишет: > > On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote: > >> 23.03.2021 15:30, Thierry Reding пишет: > >>> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote: > >>>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote: > >>>>> 13.01.2021 21:56, Mikko Perttunen пишет: > >>>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote: > >>>>>>> 11.01.2021 16:00, Mikko Perttunen пишет: > >>>>>>>> +struct drm_tegra_submit_buf { > >>>>>>>> + /** > >>>>>>>> + * @mapping_id: [in] > >>>>>>>> + * > >>>>>>>> + * Identifier of the mapping to use in the submission. > >>>>>>>> + */ > >>>>>>>> + __u32 mapping_id; > >>>>>>> > >>>>>>> I'm now in process of trying out the UAPI using grate drivers and this > >>>>>>> becomes the first obstacle. > >>>>>>> > >>>>>>> Looks like this is not going to work well for older Tegra SoCs, in > >>>>>>> particular for T20, which has a small GART. > >>>>>>> > >>>>>>> Given that the usefulness of the partial mapping feature is very > >>>>>>> questionable until it will be proven with a real userspace, we should > >>>>>>> start with a dynamic mappings that are done at a time of job submission. > >>>>>>> > >>>>>>> DRM already should have everything necessary for creating and managing > >>>>>>> caches of mappings, grate kernel driver has been using drm_mm_scan for a > >>>>>>> long time now for that. > >>>>>>> > >>>>>>> It should be fine to support the static mapping feature, but it should > >>>>>>> be done separately with the drm_mm integration, IMO. > >>>>>>> > >>>>>>> What do think? > >>>>>>> > >>>>>> > >>>>>> Can you elaborate on the requirements to be able to use GART? Are there > >>>>>> any other reasons this would not work on older chips? > >>>>> > >>>>> We have all DRM devices in a single address space on T30+, hence having > >>>>> duplicated mappings for each device should be a bit wasteful. > >>>> > >>>> I guess this should be pretty easy to change to only keep one mapping per > >>>> GEM object. > >>> > >>> The important point here is the semantics: this IOCTL establishes a > >>> mapping for a given GEM object on a given channel. If the underlying > >>> implementation is such that the mapping doesn't fit into the GART, then > >>> that's an implementation detail that the driver needs to take care of. > >>> Similarly, if multiple devices share a single address space, that's > >>> something the driver already knows and can take advantage of by simply > >>> reusing an existing mapping if one already exists. In both cases the > >>> semantics would be correctly implemented and that's really all that > >>> matters. > >>> > >>> Overall this interface seems sound from a high-level point of view and > >>> allows these mappings to be properly created even for the cases we have > >>> where each channel may have a separate address space. It may not be the > >>> optimal interface for all use-cases or any one individual case, but the > >>> very nature of these interfaces is to abstract away certain differences > >>> in order to provide a unified interface to a common programming model. > >>> So there will always be certain tradeoffs. > >> > >> For now this IOCTL isn't useful from a userspace perspective of older > >> SoCs and I'll need to add a lot of code that won't do anything useful > >> just to conform to the specific needs of the newer SoCs. Trying to unify > >> everything into a single API doesn't sound like a good idea at this > >> point and I already suggested to Mikko to try out variant with a > >> separated per-SoC code paths in the next version, then the mappings > >> could be handled separately by the T186+ paths. > > > > I'm not sure I understand what you're saying. Obviously the underlying > > implementation of this might have to differ depending on SoC generation. > > But it sounds like you're suggesting having different UAPIs depending on > > SoC generation. > > I suggested that this IOCTL shouldn't be mandatory for older SoCs, which > we should to have anyways for preserving the older UAPI. Yes, this makes > UAPI different and I want to see how it will look like in the next > version since the current variant was sub-optimal. What exactly is sub-optimal about the current variant? And what would an alternative look like? Like what we have in the old ABI where we pass in GEM handles directly during submissions? I can see how this new variant would be a bit more work than the alternative, but even on older SoCs, wouldn't the explicit mapping be much better for performance than having to constantly remap GEM objects for every job? In general I don't think it's useful to have separate UAPIs for what's basically the same hardware. I mean from a high-level point of view what we need to do for each job remains exactly the same whether the job is executed on Tegra20 or Tegra194. We have to map a bunch of buffers so that they can be accessed by hardware and then we have a command stream that references the mappings and does something to the memory that they represent. The only thing that's different between the SoC generations is how these mappings are created. The difference between the old UABI and this is that we create mappings upfront, and I'm not sure I understand how that could be suboptimal. If anything it should increase the efficiency of job submissions by reducing per-submit overhead. It should be fairly easy to compare this in terms of performance with implicit mappings by running tests against the old UABI and the new one. If there's a significant impact we may need to take a closer look. Yes, this will require a bit of work in userspace to adapt to these changes, but those are a one-time cost, so I don't think it's wise to ignore potential performance improvements because we don't want to update userspace. In either case, I don't think we're quite done yet. There's still a bit of work left to do on the userspace side to get a couple of use-cases up with this new UABI and it's not entirely clear yet what the results will be. However, we have to move forward somehow or this will end up being yet another attempt that didn't go anywhere. We were in a similar place a few years back and I vividly remember how frustrating that was for me personally to spend all of that time working through this stuff and then seeing it all go to waste. So can we please keep going for a little longer while there's still momentum? Thierry
Attachment:
signature.asc
Description: PGP signature