Re: [PATCH v7 07/16] rust: add `io::{Io, IoRaw}` base types

Danilo Krummrich <dakr@xxxxxxxxxx> · Thu, 27 Feb 2025 11:01:55 +0100

On Thu, Feb 27, 2025 at 11:25:55AM +1100, Alistair Popple wrote:
> On Tue, Feb 25, 2025 at 12:04:35PM +0100, Danilo Krummrich wrote:
> > On Tue, Feb 25, 2025 at 04:50:05PM +1100, Alistair Popple wrote:
> 
> > I think build_assert() is not widely used yet and, until the situation improves,
> > we could also keep a list of common pitfalls if that helps?
> 
> I've asked a few times, but are there any plans/ideas on how to improve the
> situation?

I just proposed a few ones. If the limitation can be resolved I don't know.

There are two different cases.

	(1) build_assert() is evaluated in const context to false
	(2) the compiler can't guarantee that build_assert() is evaluated to
	    true at compile time

For (1) you get a proper backtrace by the compiler. For (2) there's currently
only the option to make the linker fail, which doesn't produce the most useful
output.

If we wouldn't do (2) we'd cause a kernel panic on runtime, which can be
enforced with CONFIG_RUST_BUILD_ASSERT_ALLOW=y.

> I'm kind of suprised we're building things on top of a fairly broken
> feature without an idea of how we might make that feature work. I'd love to
> help, but being new to R4L no immediately useful ideas come to mind.

The feature is not broken at all, it works perfectly fine. It's just that for
(2) it has an ergonomic limitation.

> > > Unless the code absolutely cannot compile without them I think it would be
> > > better to turn them into runtime errors that can at least hint at what might
> > > have gone wrong. For example I think a run-time check would have been much more
> > > appropriate and easy to debug here, rather than having to bisect my changes.
> > 
> > No, especially for I/O the whole purpose of the non-try APIs is to ensure that
> > boundary checks happen at compile time.
> 
> To be honest I don't really understand the utility here because the compile-time
> check can't be a definitive check. You're always going to have to fallback to
> a run-time check because at least for PCI (and likely others) you can't know
> for at compile time if the IO region is big enough or matches the compile-time
> constraint.

That's not true, let me explain.

When you write a driver, you absolutely have to know the register layout. This
means that you also know what the minimum PCI bar size has to be for your driver
to work. If it would be smaller than what your driver expects, it can't function
anyways. In Rust we make use of this fact.

When you map  a PCI bar through `pdev.iomap_region_sized` you pass in a const
generic (`SIZE`) representing the *expected* PCI bar size. This can indeed fail
on run-time, but that's fine, as mentioned, if the bar is smaller than what your
driver expect, it's useless anyways.

If the call succeeds, it means that the actual PCI bar size is greater or equal
to `SIZE`. Since `SIZE` is known at compile time all subsequent I/O operations
can be boundary checked against `SIZE` at compile time, which additionally makes
the call infallible. This works for most I/O operations drivers do.

However, sometimes we need to do I/O ops at a PCI bar offset that is only known
at run-time. In this case you can use the `try_*` variants, such as
`try_read32()`. Those do boundary checks against the actual size of the PCI bar,
which is only known at run-time and hence they're fallible.

> 
> So this seems more like a quiz for developers to check if they really do want
> to access the given offset. It's not really doing any useful compile-time bounds
> check that is preventing something bad from happening, becasue that has to
> happen at run-time. Especially as the whole BAR is mapped anyway.

See the explanation above.

> 
> Hence why I think an obvious run-time error instead of an obtuse and difficult
> to figure out build error would be better. But maybe I'm missing some usecase
> here that makes this more useful.

No, failing the boundary check at compile time (if possible) is always better
than failing it at run-time for obvious reasons.

> 
> > > I was hoping I could suggest CONFIG_RUST_BUILD_ASSERT_ALLOW be made default yes,
> > > but testing with that also didn't yeild great results - it creates a backtrace
> > > but that doesn't seem to point anywhere terribly close to where the bad access
> > > was, I'm guessing maybe due to inlining and other optimisations - or is
> > > decode_stacktrace.sh not the right tool for this job?
> > 
> > I was about to suggest CONFIG_RUST_BUILD_ASSERT_ALLOW=y to you, since this will
> > make the kernel panic when hitting a build_assert().
> > 
> > I gave this a quick try with [1] in qemu and it lead to the following hint,
> > right before the oops:
> > 
> > [    0.957932] rust_kernel: panicked at /home/danilo/projects/linux/nova/nova-next/rust/kernel/io.rs:216:9:
> > 
> > Seeing this immediately tells me that I'm trying to do out of bound I/O accesses
> > in my driver, which indeed doesn't tell me the exact line (in case things are
> > inlined too much to gather it from the backtrace of the oops), but it should be
> > good enough, no?
> 
> *smacks forehead*
> 
> Yes. So to answer this question:
> 
> > or is decode_stacktrace.sh not the right tool for this job?
> 
> No, it isn't. Just reading the kernel logs properly would have been a better
> option! I guess coming from C I'm just too used to jumping straight to the stack
> trace in the case of BUG_ON(), etc. Thanks for point that out.

Happy I could help.