On Wed, Dec 6, 2023 at 2:26 PM Conor Dooley <conor@xxxxxxxxxx> wrote: > > On Wed, Dec 06, 2023 at 04:52:11AM -0800, Palmer Dabbelt wrote: > > On Thu, 10 Aug 2023 02:07:10 PDT (-0700), Conor Dooley wrote: > > > On Wed, Aug 09, 2023 at 02:01:07AM -0700, Atish Kumar Patra wrote: > > > > On Tue, Aug 8, 2023 at 6:39 AM Conor Dooley <conor@xxxxxxxxxx> wrote: > > > > > > > > > > On Tue, Aug 08, 2023 at 12:54:11AM -0700, Atish Kumar Patra wrote: > > > > > > On Wed, Aug 2, 2023 at 4:14 AM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote: > > > > > > > > > > > > > > Add an erratum for versions [v0.8 to v1.3) of OpenSBI which fail to add > > > > > > > the "no-map" property to the reserved memory nodes for the regions it > > > > > > > has protected using PMPs. > > > > > > > > > > > > > > Our existing fix sweeping hibernation under the carpet by marking it > > > > > > > NONPORTABLE is insufficient as there are other ways to generate > > > > > > > accesses to these reserved memory regions, as Petr discovered [1] > > > > > > > while testing crash kernels & kdump. > > > > > > > > > > > > > > Intercede during the boot process when the afflicted versions of OpenSBI > > > > > > > are present & set the "no-map" property in all "mmode_resv" nodes before > > > > > > > the kernel does its reserved memory region initialisation. > > > > > > > > > > > > > > > > > > > We have different mechanisms of DT being passed to the kernel. > > > > > > > > > > > > 1. A prior stage(e.g U-Boot SPL) to M-mode runtime firmware (e.g. > > > > > > OpenSBI, rustSBI) passes the DT to M-mode runtime firmware and it > > > > > > passes to the next stage. > > > > > > In this case, M-mode runtime firmware gets a chance to update the > > > > > > no-map property in DT that the kernel can parse. > > > > > > > > > > > > 2. User loads the DT from the boot loader (e.g EDK2, U-Boot proper). > > > > > > Any DT patching done by the M-mode firmware is useless. If these DTBs > > > > > > don't have the no-map > > > > > > property, hibernation or EFI booting will have issues as well. > > > > > > > > > > > > > > > > > We are trying to solve only one part of problem #1 in this patch. > > > > > > > > > > Correct. > > > > > > > > > > If someone's second stage is also providing an incorrect devicetree > > > > > then, yeah, this approach would fall apart - but it's the firmware > > > > > provided devicetree being incorrect that I am trying to account for > > > > > here. If a person incorrectly constructed one, I am not really sure what > > > > > we can do for them, they incorrect described their hardware /shrug > > > > > My patch should of course help in some of the scenarios you mention above > > > > > if the name of the reserved memory region from OpenSBI is propagated by > > > > > the second-stage bootloader, but that is just an extension of case 1, > > > > > not case 2. > > > > > > > > > > > I > > > > > > don't think any other M-mode runtime firmware patches DT with no-map > > > > > > property as well. > > > > > > Please let me know if I am wrong about that. The problem is not > > > > > > restricted to [v0.8 to v1.3) of OpenSBI. > > > > > > > > > > It comes down to Alex's question - do we want to fix this kind of > > > > > firmware issue in the kernel? Ultimately this is a policy decision that > > > > > "somebody" has to make. Maybe the list of firmwares that need this > > > > > > > > IMO, we shouldn't as this is a slippery slope. Kernel can't fix every > > > > firmware bug by having erratas. > > > > I agree with your point below about firmware in shipping products. I > > > > am not aware of any official products shipping anything other than > > > > OpenSBI either. > > > > > > > However, I have seen users using other firmwares in their dev > > > > environment. > > > > > > If someone's already changed their boards firmware, I have less sympathy > > > for them, as they should be able to make further changes. Punters buying > > > SBCs to install Fedora or Debian w/o having to consider their firmware > > > are who I am more interested in helping. > > > > > > > IMHO, this approach sets a bad precedent for the future especially > > > > when it only solves one part of the problem. > > > > > > Yeah, I'm certainly wary of setting an unwise precedent here. > > > Inevitably we will need to have firmware-related errata and it'd be good > > > to have a policy for what is (or more importantly what isn't > > > acceptable). Certainly we have said that known-broken version of OpenSBI > > > that T-Head puts in their SDK is not supported by the mainline kernel. > > > On the latter part, I'm perfectly happy to expand the erratum to cover > > > all affected firmwares, but I wasn't even sure if my fix worked > > > properly, hence the request for testing from those who encountered the > > > problem. > > > > > > > We shouldn't hide firmware bugs in the kernel when an upgraded > > > > firmware is already available. > > > > > > Just to note, availability of an updated firmware upstream does not > > > necessarily mean that corresponding update is possible for affected > > > hardware. > > > > Yep. I think we're been in a very hobbist-centric world in RISC-V land, but > > in general trying to get people to update firmware is hard. Part of the > > whole "kernel updates don't break users" thing is what's underneath the > > kernel, it's not just a uABI thing. > > Yeah, there's certainly an attitude that I think needs to go away, that > updating firmware etc is something we can expect to be carried out on a > universal basis. Or that fixing things in the upstream version of > OpenSBI means it'll actually propagate down to system integrators. > > > > > > > This bug is well documented in various threads and fixed in the latest > > > > version of OpenSBI. > > > > I am assuming other firmwares will follow it as well. > > > > > > > > Anybody facing hibernation or efi related booting issues should just > > > > upgrade to the latest version of firmware (e.g OpenSBI v1.3) > > > > Latest version of Qemu will support(if not happened already) the > > > > latest version of OpenSBI. > > > > > > > > This issue will only manifest in kernels 6.4 or higher. Any user > > > > facing these with the latest kernel can also upgrade the firmware. > > > > Do you see any issue with that ? > > > > > > I don't think it is fair to compare the ease of upgrading the kernel > > > to that required to upgrade a boards firmware, with the latter being > > > far, far more inconvenient on pretty much all of the boards that I have. > > > > IMO we're in the same spot as every other port here, and generally they work > > around firmware bugs when they've rolled out into production somewhere that > > firmware updates aren't likely to happen quickly. I'm not sure if there's > > any sort of exact rules written down anywhere, but IMO if the bug is going > > to impact users then we should deal with it. > > > > That applies for hardware bugs, but also firmware bugs (at a certain point > > we won't be able to tell the difference). We're sort of doing this with the > > misaligned access handling, for example. > > > > > I'm perfectly happy to drop this series though, if people generally are > > > of the opinion that this sort of firmware workaround is ill-advised. > > > We are unaffected by it, so I certainly have no pressure to have > > > something working here. It's my desire not to be user-hostile that > > > motivated this patch. > > > > IIUC you guys and Reneas are the only ones who have hardware that might be > > in a spot where users aren't able to update the firmware (ie, it's out in > > production somewhere). > > I dunno if we can really keep thinking like that though. In terms of > people who have devicetrees in the kernel and stuff available in western > catalog distribution, sure. > I don't think we can assume that that covers all users though, certainly > the syntacore folks pop up every now and then, and I sure hope that > Andes etc have larger customer bases than the in-kernel users would > suggest. > > > So I'm adding Geert, though he probably saw this > > months ago... > > Prabhakar might be a good call on that front. I'm not sure if the > Renesas stuff works on affected versions of OpenSBI though, guess it > depends on the sequencing of the support for the non-coherent stuff and > when this bug was fixed. > ATM, I dont think there are any users who are using the upstream kernel + OpenSBI (apart from me and Geert!). Currently the customers are using the BSP releases. Cheers, Prabhakar > > On that note: It's been ~4 months and it look like nobody's tested anything > > (and the comments aren't really things that would preculde testing). > > Yeah, nobody seems to really have given a crap. I was hoping the > StarFive guys that actually added the support for this would be > interested in it, but alas they were not. > I don't really care all that much - the platform I support is not > affected by the problem and I just don't enable the option elsewhere. > > > So > > maybe we just pick that second patch up into for-next and see what happens? > > IIUC that will result in broken systems for users who haven't updated their > > firmware. > > > > I agree that's a user-hostile way to do things, which is generally a bad way > > to go, but if it's really true that there's no users then we're safe. > > Probably also worth calling it out on sw-dev just to be safe. > > And if there are users, the fix is actually relatively straight-forward, > just apply patch #1. > _______________________________________________ > linux-riscv mailing list > linux-riscv@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-riscv