Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> 於 2024年2月5日 週一 下午5:32寫道: > > Hi Nylon, > > On Fri, Jan 19, 2024 at 10:27 AM Nylon Chen <nylon.chen@xxxxxxxxxx> wrote: > > > > Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> 於 2024年1月18日 週四 下午9:01寫道: > > > > > > Hi Nylon, > > Hi Alexandre, thanks for your feedback, > > > > > > On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@xxxxxxxxxx> wrote: > > > > > > > > > On 3/23/23 15:55, Anup Patel wrote: > > > > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote: > > > > > >> Hi Anup, > > > > > >> > > > > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@xxxxxxxxxxxxxxxx> wrote: > > > > > >>> Hi Alex, > > > > > >>> > > > > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote: > > > > > >>>> This patchset intends to improve tlb utilization by using hugepages for > > > > > >>>> the linear mapping. > > > > > >>>> > > > > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must > > > > > >>>> take care of isolating the kernel text and rodata so that they are not > > > > > >>>> mapped with a PUD mapping which would then assign wrong permissions to > > > > > >>>> the whole region: it is achieved by introducing a new memblock API. > > > > > >>>> > > > > > >>>> Another patch makes use of this new API in arm64 which used some sort of > > > > > >>>> hack to solve this issue: it was built/boot tested successfully. > > > > > >>>> > > > > > >>>> base-commit-tag: v6.3-rc1 > > > > > >>>> > > > > > >>>> v8: > > > > > >>>> - Fix rv32, as reported by Anup > > > > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike > > > > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew > > > > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not > > > > > >>>> being pretty at all, will wait for comments from arm64 reviewers, but > > > > > >>>> this patch can easily be dropped if they do not want it. > > > > > >>>> > > > > > >>>> v7: > > > > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which > > > > > >>>> allows us to split the memblock mappings and then avoid to map the > > > > > >>>> the PUD which contains the kernel as read only > > > > > >>>> - Add a patch to arm64 to use this newly introduced API > > > > > >>>> > > > > > >>>> v6: > > > > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long > > > > > >>>> > > > > > >>>> v5: > > > > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks > > > > > >>>> Conor > > > > > >>>> - Add RB from Andrew > > > > > >>>> > > > > > >>>> v4: > > > > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor > > > > > >>>> - Add Acked-by Rob > > > > > >>>> > > > > > >>>> v3: > > > > > >>>> - Change the comment about initrd_start VA conversion so that it fits > > > > > >>>> ARM64 and RISCV64 (and others in the future if needed), as suggested > > > > > >>>> by Rob > > > > > >>>> > > > > > >>>> v2: > > > > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that > > > > > >>>> early in the boot process, as asked by Rob > > > > > >>>> > > > > > >>>> Alexandre Ghiti (4): > > > > > >>>> riscv: Get rid of riscv_pfn_base variable > > > > > >>>> mm: Introduce memblock_isolate_memory > > > > > >>>> arm64: Make use of memblock_isolate_memory for the linear mapping > > > > > >>>> riscv: Use PUD/P4D/PGD pages for the linear mapping > > > > > >>> Kernel boot fine on RV64 but there is a failure which is still not > > > > > >>> addressed. You can see this failure as following message in > > > > > >>> kernel boot log: > > > > > >>> 0.000000] Failed to add a System RAM resource at 80200000 > > > > > >> Hmmm I don't get that in any of my test configs, would you mind > > > > > >> sharing yours and your qemu command line? > > > > > > Try alexghiti_test branch at > > > > > > https://github.com/avpatel/linux.git > > > > > > > > > > > > I am building the kernel using defconfig and my rootfs is > > > > > > based on busybox. > > > > > > > > > > > > My QEMU command is: > > > > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios > > > > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel > > > > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw > > > > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4 > > > > > > > > > > > > > > > So splitting memblock.memory is the culprit, it "confuses" the resources > > > > > addition and I can only find hacky ways to fix that... > > > > Hi Alexandre, > > > > > > > > We encountered the same error as Anup. After adding your patch > > > > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the > > > > error again. > > > > > > > > What I have observed so far is > > > > > > > > - before your patch > > > > When merging consecutive memblocks, if the memblock types are different, > > > > they will be merged into reserved > > > > - after your patch > > > > When consecutive memblocks are merged, if the memblock types are > > > > different, they will be merged into memory. > > > > > > > > Such a result will cause the memory location of OpenSBI to be changed > > > > from reserved to memory. Will this have any side effects? > > > > > > I guess it will end up in the memory pool and pages from openSBI > > > region will be allocated, so we should see very quickly bad stuff > > > happening (either PMP violation or M-mode ecall never > > > returning/trapping/etc). > > > > > > But I don't observe the same thing, I always see the openSBI region > > > being reserved: > > > > > > reserved[0x0] [0x0000000080000000-0x000000008007ffff], > > > 0x0000000000080000 bytes flags: 0x0 > > > > > > Can you elaborate a bit more about "When consecutive memblocks are > > > merged, if the memblock types are different, they will be merged into > > > memory"? Where/when does this merge happen? Can you give me a config > > > file and a kernel revision so that I can take a look? > > Ok, If you want to reproduce the same results you just need to modify OpenSBI > > > > [ lib/sbi/sbi_domain.c ] > > +#define TEST_SIZE 0x200000 > > > > - (scratch->fw_size - scratch->fw_rw_offset), > > + (TEST_SIZE - scratch->fw_rw_offset), > > > > In addition, you can insert checks in the kernel merged function > > [ mm/memblock.c ] > > static void __init_memblock memblock_merge_regions(struct memblock_type *type) > > while (i < type->cnt - 1) { > > ... > > /* move forward from next + 1, index of which is i + 2 */ > > memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next)); > > type->cnt--; > > } > > + pr_info("Merged memblock_type: cnt = %lu, max = %lu, > > total_size = 0x%llx\n",type->cnt, type->max, type->total_size); > > + for (i = 0; i < type->cnt; i++) { > > + const char *region_type = > > memblock_is_memory(type->regions[i].base) ? "memory" : "reserve"; > > + pr_info("Region %d: base = 0x%llx, size = 0x%llx, type > > = %s\n", i, type->regions[i].base, type->regions[i].size, > > region_type); > > + } > > } > > This is kernel boot log > > - before your patch > > ... > > [ 0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000 > > [ 0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501 > > [ 0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve > > ... > > > > - after your patch > > ... > > [ 0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000 > > [ 0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e > > [ 0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory > Hi Alex, thanks for your feedback. > So the openSBI region is marked as memory, and not reserved because > this region is now described as nomap, and memblock_mark_nomap() does > not move this region into the reserved memblock list, but keep it in > the memory list with the nomap flag > (https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L479). > But as stated in the description of memblock_mark_nomap() > (https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L969), > the pages associated with the region will be marked as PageReserved > and the region will not be covered in the linear mapping. I traced it via GDB, and indeed, it enters early_init_dt_reserve_memory() and calls memblock_reserve to reserve this block of memory. [before your patch] [ 0.000000] OF: fdt: check nomap Reserving memory: base = 0x80000000, size = 0x200000 [ 0.000000] --- Reserved memory: Base address: 80000000, Size: 200000--- [ 0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1e28501 [ 0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type = reserve [ 0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type = memory .... [ 0.000000] OF: fdt: Reserved memory: reserved region for node 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB [ 0.000000] OF: reserved mem: 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable mmode_resv0@80000000 [after your patch] [ 0.000000] OF: fdt: check nomap Reserving memory: base = 0x80000000, size = 0x200000 [ 0.000000] --- Reserved memory: Base address: 80000000, Size: 200000--- [ 0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1e25501 [ 0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type = memory [ 0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type = memory ... [ 0.000000] OF: fdt: Reserved memory: reserved region for node 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB [ 0.000000] OF: reserved mem: 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable mmode_resv0@80000000 At the moment, it can be confirmed that there is no need to worry about this block of memory being used. But I still have a question I'd like to ask, which is why this location is flagged as 'reserve' instead of 'memory' in the memblock Thanks, Nylon > > So to me, this is normal and we are safe. Let me know if I made a mistake. > > And sorry for the long delay, that slipped my mind! > > Thanks, > > Alex > > > ... > > [ 0.000000] Failed to add a system RAM resource at 80200000 > > ... > > > > > > Thanks, > > > > > > Alex > > > > > > > > > > > > > So given that the arm64 patch with the new API is not pretty and that > > > > > the simplest solution is to re-merge the memblock regions afterwards > > > > > (which is done by memblock_clear_nomap), I'll drop the new API and the > > > > > arm64 patch to use the nomap API like arm64: I'll take advantage of that > > > > > to clean setup_vm_final which I have wanted to do for a long time. > > > > > > > > > > @Mike Thanks for you reviews! > > > > > > > > > > @Anup Thanks for all your bug reports on this patchset, I have to > > > > > improve my test flow (it is in the work :)). > > > > > > > > > > > > > > > > Regards, > > > > > > Anup > > > > > > > > > > > >> Thanks > > > > > >> > > > > > >>> Regards, > > > > > >>> Anup > > > > > >>> > > > > > >>>> arch/arm64/mm/mmu.c | 25 +++++++++++------ > > > > > >>>> arch/riscv/include/asm/page.h | 19 +++++++++++-- > > > > > >>>> arch/riscv/mm/init.c | 53 ++++++++++++++++++++++++++++------- > > > > > >>>> arch/riscv/mm/physaddr.c | 16 +++++++++++ > > > > > >>>> drivers/of/fdt.c | 11 ++++---- > > > > > >>>> include/linux/memblock.h | 1 + > > > > > >>>> mm/memblock.c | 20 +++++++++++++ > > > > > >>>> 7 files changed, 119 insertions(+), 26 deletions(-) > > > > > >>>> > > > > > >>>> -- > > > > > >>>> 2.37.2 > > > > > >>>> > > > > > > _______________________________________________ > > > > > > linux-riscv mailing list > > > > > > linux-riscv@xxxxxxxxxxxxxxxxxxx > > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv > > > > > > > > > > _______________________________________________ > > > > > linux-riscv mailing list > > > > > linux-riscv@xxxxxxxxxxxxxxxxxxx > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv