Re: Reviving Nyan support

Thierry Reding <thierry.reding@xxxxxxxxx> · Thu, 5 Dec 2024 12:16:57 +0100

On Wed, Dec 04, 2024 at 08:30:45PM +0100, Michał Pecio wrote:
> Hi,
> 
> > > The kernel came up and userspace got to the login prompt, but then
> > > some issues appeared:  
> > 
> > Okay, that's pretty good given that we haven't had testers for a few
> > years now.
> 
> I had some hopes because I saw that NVIDIA still tests new kernel
> releases on T124 Jetson, although I don't know how much is tested.
> The kernel seems to be in pretty good shape, save for those issues.
> 
> I found that graphics are a bigger problem: X is dog slow, and I get
> black glxgears window and some 150fps, which doesn't look great.
> Do you know if it's supposed to be like that or if it's a regression
> or some screwup on my side? I have enabled Nouveau in the kernel and
> installed X11 Nouveau driver and xorg.log shows that it loads.

That's kind of expected. At one point this was working mostly reliably
but given that there's been very limited testing, it's not a surprise
that this is somewhat defunct.

One thing to know, though, is that the Nouveau X11 driver won't do you
any good. The way that it works on Tegra is that the GPU is used only to
render to offscreen surfaces and the Tegra DRM/KMS driver will then scan
those buffers out to your display (via X's modesetting driver). In order
for this to work you need to enable the Nouveau and Tegra drivers in
Mesa and then they should be able to work together if you've got an up-
to-date X server. You might also need a bit of luck.

> Are there any other options besides Nouveau? Perhaps some newer L4T
> release which would work with mainline host1x driver and Kepler? I
> suppose anything that works on Jetson will work on Nyan too.

Unfortunately L4T doesn't work with any of this. Both the display and
GPU drivers are completely different things in L4T vs. upstream Linux. I
suppose you could try and port those drivers to a more recent kernel,
but I can't recommend it.

> Currently this machine runs Ubuntu 14.04 L4T and 3.10 CrOS kernel,
> so practically anything would be an improvement :)

The plan was to get upstream mostly to feature parity with L4T so that
it could serve as a long-term option for people after official software
support was dropped. We managed to do a bunch of things, but there are
certain aspects that aren't quite as polished as I wanted. Hardware
accelerated multimedia is among those. You can probably get some of it
working using the V4L2 Tegra VDE driver. GPU acceleration is another,
somewhat unfinished bit. Again, an up-to-date Mesa and X server should
get you there most of the way, but it's not been tested for a while, so
may have regressed.

> >> SPI
> >
> > I'm not sure what the right way is to fix this. The values in DT are
> > clearly required to be nanoseconds, so either the driver needs to
> > learn about those or the core would need to convert somehow. The core
> > doesn't know about what the driver supports, so it can't do a really
> > good job. Maybe a good compromise would be to have the core expose a
> > helper that can convert to clock cycles (the reverse is already done
> > in spi_delay_to_ns()), which drivers can then use if they only support
> > clock cycles.
> 
> I see that a few other drivers which bother to implement this callback
> convert ns to clocks, so options would be to copy-paste their code or
> put that stuff in SPI core. I have never looked at SPI before...

Yeah, that sounds about right.
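
For illustration, here is a minimal sketch of the kind of conversion such a helper would have to do, written as plain user-space C so it can be tried in isolation. The function name ns_to_clock_cycles() and its parameters are hypothetical and are not an existing SPI core API. Rounding up is an assumption, chosen on the theory that a delay slightly longer than requested is safer than one that is too short.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not an existing kernel function): convert a delay
 * given in nanoseconds into SPI clock cycles at the given transfer speed.
 *
 * cycles = ceil(delay_ns * speed_hz / 1e9)
 *
 * The multiplication is done in 64 bits so that the product of two 32-bit
 * values cannot overflow. The result is rounded up so the programmed delay
 * is never shorter than what was requested. A speed of 0 Hz yields 0.
 */
static uint32_t ns_to_clock_cycles(uint32_t delay_ns, uint32_t speed_hz)
{
	uint64_t scaled;

	if (speed_hz == 0)
		return 0;

	scaled = (uint64_t)delay_ns * speed_hz;

	return (uint32_t)((scaled + NSEC_PER_SEC - 1) / NSEC_PER_SEC);
}
```

With a 25 MHz clock, a 1000 ns delay comes out as 25 cycles, and anything shorter than one clock period (40 ns) still rounds up to a single cycle. A real in-kernel version would presumably take the delay and transfer structures rather than bare integers, and a driver would still need to clamp the result to whatever its hardware field can hold.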

> > I think LPAE cannot be enabled by default because it would break on
> > Tegra20 and Tegra30 which both don't support LPAE.
> 
> Fair enough, it's not a huge deal.

Yeah, it's a bit unfortunate, but you'll probably want some custom
kernel image anyway for this particular use-case. Things are much more
standardized on 64-bit ARM nowadays, but 32-bit ARM had some wild west,
shoot-from-the-hip vibes. =)

> > > 3. Some more warnings about bypassed regulators and missing touchpad
> > > supply (but the touchpad is enabled and works, per evtest at
> > > least).  
> > 
> > Not sure how much can be done about this. Unless you can find the
> > schematics we'd probably have to do this on a best effort basis.
> 
> I actually have the schematic from some shady laptop repair website.
> IIRC the touchpad runs from some major 3.3V rail which is always on,
> so I didn't bother fixing this yet.

Yeah, if it's an always-on regulator you can usually ignore those
warnings. It's always good to describe them and that'll get rid of the
warnings, but it shouldn't be necessary.

> I also learned something new, that platform drivers can ask for their
> probe to be deferred, which was responsible for some other warnings.
> At this point I'm not sure if anything serious remains, but regulators
> are another subsystem I know practically nothing about.

What kinds of warnings were related to deferred probe? Normally the
related messages are debug level, so unless you've enabled those (which
is probably a good idea for what you're doing) you shouldn't be seeing
those.

> > My first step when debugging suspend/resume issues is usually to pass
> > no_console_suspend on the kernel command-line. That's really only
> > useful for debugging consoles and it probably doesn't work well if
> > you've only got the framebuffer console.
> 
> I made zero progress on this, and frankly didn't even try. Serial ports
> are only accessible by soldering to the board. I suppose I could try a
> USB dongle, but it will go dark as soon as xhci is suspended.

You could always try to see if you can prevent XHCI from being
suspended. Not sure if there's a standard way to do it, but worst case
you could try commenting out the code that does it, see if that gets you
anywhere. It'll probably still break at some point when interrupts get
disabled and such. Or it may break earlier since the USB subsystem is
probably not designed to stay up until that late.

Again, it might be better to check with a developer-friendly device what
the status is with suspend/resume on Tegra124 in general. I think the
tests that we run periodically would've flagged any generic suspend and
resume issue, so it might be something specific to Nyan (possibly
display?). Have you tried poking the device in different ways after the
resume? Does it react at all? Does a network ping perhaps work? It could
also be that the system wake isn't properly hooked up or something. One
thing worth trying would be to use the RTC to wake the system up from
suspend. I think that's what we use in the daily testing. rtcwake is the
tool that you want to look into for that.

> > > 5. USB is power-cycled on boot, which is a bit annoying as I'm
> > > booting from a USB connected disk. IIRC CrOS kernel 3.10 wasn't
> > > doing it. Any suggestions where to look?  
> > 
> > Is this really the power going away and coming back up? In that case
> > it might be a regulator that's being temporarily disabled during boot
> > and then brought back up.
> 
> Yep, this exactly. I have fixed it already.

Excellent!

Thierry