Re: [PATCH v7 07/16] rust: add `io::{Io, IoRaw}` base types

Alistair Popple <apopple@xxxxxxxxxx> · Fri, 28 Feb 2025 16:29:04 +1100

On Thu, Feb 27, 2025 at 11:01:55AM +0100, Danilo Krummrich wrote:
> On Thu, Feb 27, 2025 at 11:25:55AM +1100, Alistair Popple wrote:
> > On Tue, Feb 25, 2025 at 12:04:35PM +0100, Danilo Krummrich wrote:
> > > On Tue, Feb 25, 2025 at 04:50:05PM +1100, Alistair Popple wrote:
> > 
> > > I think build_assert() is not widely used yet and, until the situation improves,
> > > we could also keep a list of common pitfalls if that helps?
> > 
> > I've asked a few times, but are there any plans/ideas on how to improve the
> > situation?
> 
> I just proposed a few ones. If the limitation can be resolved I don't know.

Thanks. The limitation is what I was asking about. The work around for (2)
works, but is not terribly discoverable. I will see if I can come up with
something better to help with discoverability that at least.

> There are two different cases.
> 
> 	(1) build_assert() is evaluated in const context to false
> 	(2) the compiler can't guarantee that build_assert() is evaluated to
> 	    true at compile time
> 
> For (1) you get a proper backtrace by the compiler. For (2) there's currently
> only the option to make the linker fail, which doesn't produce the most useful
> output.
> 
> If we wouldn't do (2) we'd cause a kernel panic on runtime, which can be
> enforced with CONFIG_RUST_BUILD_ASSERT_ALLOW=y.
> 
> > I'm kind of suprised we're building things on top of a fairly broken
> > feature without an idea of how we might make that feature work. I'd love to
> > help, but being new to R4L no immediately useful ideas come to mind.
> 
> The feature is not broken at all, it works perfectly fine. It's just that for
> (2) it has an ergonomic limitation.

I'm not sure I agree it works perfectly fine. Developer ergonomics are
an important aspect of any build environment, and I'd argue the ergonomic
limitation for (2) means it is at least somewhat broken and needs fixing.

> > > > Unless the code absolutely cannot compile without them I think it would be
> > > > better to turn them into runtime errors that can at least hint at what might
> > > > have gone wrong. For example I think a run-time check would have been much more
> > > > appropriate and easy to debug here, rather than having to bisect my changes.
> > > 
> > > No, especially for I/O the whole purpose of the non-try APIs is to ensure that
> > > boundary checks happen at compile time.
> > 
> > To be honest I don't really understand the utility here because the compile-time
> > check can't be a definitive check. You're always going to have to fallback to
> > a run-time check because at least for PCI (and likely others) you can't know
> > for at compile time if the IO region is big enough or matches the compile-time
> > constraint.
> 
> That's not true, let me explain.
> 
> When you write a driver, you absolutely have to know the register layout. This
> means that you also know what the minimum PCI bar size has to be for your driver
> to work. If it would be smaller than what your driver expects, it can't function
> anyways. In Rust we make use of this fact.
> 
> When you map  a PCI bar through `pdev.iomap_region_sized` you pass in a const
> generic (`SIZE`) representing the *expected* PCI bar size. This can indeed fail
> on run-time, but that's fine, as mentioned, if the bar is smaller than what your
> driver expect, it's useless anyways.
> 
> If the call succeeds, it means that the actual PCI bar size is greater or equal
> to `SIZE`. Since `SIZE` is known at compile time all subsequent I/O operations
> can be boundary checked against `SIZE` at compile time, which additionally makes
> the call infallible. This works for most I/O operations drivers do.

Argh! That's the piece I was missing - that this makes the IO call infallible
and thus removes the need to write run-time error handling code. Sadly of course
that's not actually true, because I/O operations can always fail for reasons
other than what can be checked at compile time (eg. in particular PCI devices
can fall off the bus and return all 0xF's). But I guess existing drivers don't
really handle those cases either.

Anyway thanks for your time and detailed explainations, I really just started
this thread as I think reducing friction for existing kernel developers to start
looking at Rust in the kernel is important. So I wanted to highlight that the
build_assert as linker error is really confusing when coming from C, and that
it's an area that I think needs to improve to make Rust more successful in the
kernel. Sadly I don't have any immediately ideas but if I do I will post them.

> However, sometimes we need to do I/O ops at a PCI bar offset that is only known
> at run-time. In this case you can use the `try_*` variants, such as
> `try_read32()`. Those do boundary checks against the actual size of the PCI bar,
> which is only known at run-time and hence they're fallible.
> 
> > 
> > So this seems more like a quiz for developers to check if they really do want
> > to access the given offset. It's not really doing any useful compile-time bounds
> > check that is preventing something bad from happening, becasue that has to
> > happen at run-time. Especially as the whole BAR is mapped anyway.
> 
> See the explanation above.
>
> > 
> > Hence why I think an obvious run-time error instead of an obtuse and difficult
> > to figure out build error would be better. But maybe I'm missing some usecase
> > here that makes this more useful.
> 
> No, failing the boundary check at compile time (if possible) is always better
> than failing it at run-time for obvious reasons.
>
> > 
> > > > I was hoping I could suggest CONFIG_RUST_BUILD_ASSERT_ALLOW be made default yes,
> > > > but testing with that also didn't yeild great results - it creates a backtrace
> > > > but that doesn't seem to point anywhere terribly close to where the bad access
> > > > was, I'm guessing maybe due to inlining and other optimisations - or is
> > > > decode_stacktrace.sh not the right tool for this job?
> > > 
> > > I was about to suggest CONFIG_RUST_BUILD_ASSERT_ALLOW=y to you, since this will
> > > make the kernel panic when hitting a build_assert().
> > > 
> > > I gave this a quick try with [1] in qemu and it lead to the following hint,
> > > right before the oops:
> > > 
> > > [    0.957932] rust_kernel: panicked at /home/danilo/projects/linux/nova/nova-next/rust/kernel/io.rs:216:9:
> > > 
> > > Seeing this immediately tells me that I'm trying to do out of bound I/O accesses
> > > in my driver, which indeed doesn't tell me the exact line (in case things are
> > > inlined too much to gather it from the backtrace of the oops), but it should be
> > > good enough, no?
> > 
> > *smacks forehead*
> > 
> > Yes. So to answer this question:
> > 
> > > or is decode_stacktrace.sh not the right tool for this job?
> > 
> > No, it isn't. Just reading the kernel logs properly would have been a better
> > option! I guess coming from C I'm just too used to jumping straight to the stack
> > trace in the case of BUG_ON(), etc. Thanks for point that out.
> 
> Happy I could help.