On Fri, 13 Sep 2024 14:04:06 PDT (-0700), Charlie Jenkins wrote: > On Fri, Sep 13, 2024 at 08:41:34AM +0100, Lorenzo Stoakes wrote: >> On Wed, Sep 11, 2024 at 11:18:12PM GMT, Charlie Jenkins wrote: >> > On Wed, Sep 11, 2024 at 07:21:27PM +0100, Catalin Marinas wrote: >> > > On Tue, Sep 10, 2024 at 05:45:07PM -0700, Charlie Jenkins wrote: >> > > > On Tue, Sep 10, 2024 at 03:08:14PM -0400, Liam R. Howlett wrote: >> > > > > * Catalin Marinas <catalin.marinas@xxxxxxx> [240906 07:44]: >> > > > > > On Fri, Sep 06, 2024 at 09:55:42AM +0000, Arnd Bergmann wrote: >> > > > > > > On Fri, Sep 6, 2024, at 09:14, Guo Ren wrote: >> > > > > > > > On Fri, Sep 6, 2024 at 3:18â?¯PM Arnd Bergmann <arnd@xxxxxxxx> wrote: >> > > > > > > >> It's also unclear to me how we want this flag to interact with >> > > > > > > >> the existing logic in arch_get_mmap_end(), which attempts to >> > > > > > > >> limit the default mapping to a 47-bit address space already. >> > > > > > > > >> > > > > > > > To optimize RISC-V progress, I recommend: >> > > > > > > > >> > > > > > > > Step 1: Approve the patch. >> > > > > > > > Step 2: Update Go and OpenJDK's RISC-V backend to utilize it. >> > > > > > > > Step 3: Wait approximately several iterations for Go & OpenJDK >> > > > > > > > Step 4: Remove the 47-bit constraint in arch_get_mmap_end() >> > > >> > > Point 4 is an ABI change. What guarantees that there isn't still >> > > software out there that relies on the old behaviour? >> > >> > Yeah I don't think it would be desirable to remove the 47 bit >> > constraint in architectures that already have it. >> > >> > > >> > > > > > > I really want to first see a plausible explanation about why >> > > > > > > RISC-V can't just implement this using a 47-bit DEFAULT_MAP_WINDOW >> > > > > > > like all the other major architectures (x86, arm64, powerpc64), >> > > > > > >> > > > > > FWIW arm64 actually limits DEFAULT_MAP_WINDOW to 48-bit in the default >> > > > > > configuration. We end up with a 47-bit with 16K pages but for a >> > > > > > different reason that has to do with LPA2 support (I doubt we need this >> > > > > > for the user mapping but we need to untangle some of the macros there; >> > > > > > that's for a separate discussion). >> > > > > > >> > > > > > That said, we haven't encountered any user space problems with a 48-bit >> > > > > > DEFAULT_MAP_WINDOW. So I also think RISC-V should follow a similar >> > > > > > approach (47 or 48 bit default limit). Better to have some ABI >> > > > > > consistency between architectures. One can still ask for addresses above >> > > > > > this default limit via mmap(). >> > > > > >> > > > > I think that is best as well. >> > > > > >> > > > > Can we please just do what x86 and arm64 does? >> > > > >> > > > I responded to Arnd in the other thread, but I am still not convinced >> > > > that the solution that x86 and arm64 have selected is the best solution. >> > > > The solution of defaulting to 47 bits does allow applications the >> > > > ability to get addresses that are below 47 bits. However, due to >> > > > differences across architectures it doesn't seem possible to have all >> > > > architectures default to the same value. Additionally, this flag will be >> > > > able to help users avoid potential bugs where a hint address is passed >> > > > that causes upper bits of a VA to be used. >> > > >> > > The reason we added this limit on arm64 is that we noticed programs >> > > using the top 8 bits of a 64-bit pointer for additional information. >> > > IIRC, it wasn't even openJDK but some JavaScript JIT. We could have >> > > taught those programs of a new flag but since we couldn't tell how many >> > > are out there, it was the safest to default to a smaller limit and opt >> > > in to the higher one. Such opt-in is via mmap() but if you prefer a >> > > prctl() flag, that's fine by me as well (though I think this should be >> > > opt-in to higher addresses rather than opt-out of the higher addresses). >> > >> > The mmap() flag was used in previous versions but was decided against >> > because this feature is more useful if it is process-wide. A >> > personality() flag was chosen instead of a prctl() flag because there >> > existed other flags in personality() that were similar. I am tempted to >> > use prctl() however because then we could have an additional arg to >> > select the exact number of bits that should be reserved (rather than >> > being fixed at 47 bits). >> >> I am very much not in favour of a prctl(), it would require us to add state >> limiting the address space and the timing of it becomes critical. Then we >> have the same issue we do with the other proposals as to - what happens if >> this is too low? >> >> What is 'too low' varies by architecture, and for 32-bit architectures >> could get quite... problematic. >> >> And again, wha is the RoI here - we introducing maintenance burden and edge >> cases vs. the x86 solution in order to... accommodate things that need more >> than 128 TiB of address space? A problem that does not appear to exist in >> reality? >> >> I suggested the personality approach as the least impactful compromise way >> of this series working, but I think after what Arnd has said (and please >> forgive me if I've missed further discussion have been dipping in and out >> of this!) - adapting risc v to the approach we take elsewhere seems the >> most sensible solution to me. There's one wrinkle here: RISC-V started out with 39-bit VAs by default, and we've had at least one report of userspace breaking when moving to 48-bit addresses. That was just address sanitizer, so maybe nobody cares, but we're still pretty early in the transition to 48-bit systems (most of the HW is still 39-bit) so it's not clear if that's going to be the only bug. So we're sort of in our own world of backwards compatibility here. 39-bit vs 48-bit is just an arbitrary number, but "38 bits are enough for userspace" doesn't seem as sane a "47 bits are enough for userspace". Maybe the right answer here is to just say the 38-bit userspace is broken and that it's a Linux-ism that 64-bit sytems have 47-bit user addresses by default. >> This remains something we can revisit in future if this turns out to be >> egregious. >> > > I appreciate Arnd's comments, but I do not think that making 47-bit the > default is the best solution for riscv. On riscv, support for 48-bit > address spaces was merged in 5.17 and support for 57-bit address spaces > was merged in 5.18 without changing the default addresses provided by > mmap(). It could be argued that this was a mistake, however since at the > time there didn't exist hardware with larger address spaces it wasn't an > issue. The applications that existed at the time that relied on the > smaller address spaces have not been able to move to larger address > spaces. Making a 47-bit user-space address space default solves the > problem, but that is not arch agnostic, and can't be since of the > varying differences in page table sizes across architectures, which is > the other part of the problem I am trying to solve. > >> > >> > Opting-in to the higher address space is reasonable. However, it is not >> > my preference, because the purpose of this flag is to ensure that >> > allocations do not exceed 47-bits, so it is a clearer ABI to have the >> > applications that want this guarantee to be the ones setting the flag, >> > rather than the applications that want the higher bits setting the flag. >> >> Perfect is the enemy of the good :) and an idealised solution may not end >> up being something everybody can agree on. > > Yes you are totally right! Although this is not my ideal solution, it > sufficiently accomplishes the goal so I think it is reasonable to > implement this as a personality flag. > >> >> > >> > - Charlie >> > >> > > >> > > -- >> > > Catalin >> > >> > >> >