Andreas Dilger <adilger@xxxxxxxxx> writes:

> On Apr 13, 2024, at 8:15 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Sat, Apr 13, 2024 at 07:46:03PM -0600, Andreas Dilger wrote:
>>
>>> As to whether the 0xfffff000 address itself is valid for riscv32 is
>>> outside my realm, but given that RAM is cheap it doesn't seem unlikely
>>> to have 4GB+ of RAM and want to use it all. The riscv32 might consider
>>> reserving this page address from allocation to avoid similar issues in
>>> other parts of the code, as is done with the NULL/0 page address.
>>
>> Not a chance. *Any* page mapped there is a serious bug on any 32bit
>> box. Recall what ERR_PTR() is...
>>
>> On any architecture the virtual addresses in range (unsigned long)-512..
>> (unsigned long)-1 must never resolve to valid kernel objects.
>> In other words, any kind of wraparound here is asking for an oops on
>> attempts to access the elements of buffer - kernel dereference of
>> (char *)0xfffff000 on a 32bit box is already a bug.
>>
>> It might be getting an invalid pointer, but arithmetical overflows
>> are irrelevant.
>
> The original bug report stated that search_buf = 0xfffff000 on entry,
> and I'd quoted that at the start of my email:
>
> On Apr 12, 2024, at 8:57 AM, Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>> What I see in ext4_search_dir() is that search_buf is 0xfffff000, and at
>> some point the address wraps to zero, and boom. I doubt that 0xfffff000
>> is a sane address.
>
> Now that you mention ERR_PTR() it definitely makes sense that this last
> page HAS to be excluded.
>
> So some other bug is passing the bad pointer to this code before this
> error, or the arch is not correctly excluding this page from allocation.

Yeah, something is off for sure.

(FWIW, I manage to hit this for Linus' master as well.)

I added a print (close to trace_mm_filemap_add_to_page_cache()), and for
this BT:

  [<c01e8b34>] __filemap_add_folio+0x322/0x508
  [<c01e8d6e>] filemap_add_folio+0x54/0xce
  [<c01ea076>] __filemap_get_folio+0x156/0x2aa
  [<c02df346>] __getblk_slow+0xcc/0x302
  [<c02df5f2>] bdev_getblk+0x76/0x7a
  [<c03519da>] ext4_getblk+0xbc/0x2c4
  [<c0351cc2>] ext4_bread_batch+0x56/0x186
  [<c036bcaa>] __ext4_find_entry+0x156/0x578
  [<c036c152>] ext4_lookup+0x86/0x1f4
  [<c02a3252>] __lookup_slow+0x8e/0x142
  [<c02a6d70>] walk_component+0x104/0x174
  [<c02a793c>] path_lookupat+0x78/0x182
  [<c02a8c7c>] filename_lookup+0x96/0x158
  [<c02a8d76>] kern_path+0x38/0x56
  [<c0c1cb7a>] init_mount+0x5c/0xac
  [<c0c2ba4c>] devtmpfs_mount+0x44/0x7a
  [<c0c01cce>] prepare_namespace+0x226/0x27c
  [<c0c011c6>] kernel_init_freeable+0x286/0x2a8
  [<c0b97ab8>] kernel_init+0x2a/0x156
  [<c0ba22ca>] ret_from_fork+0xe/0x20

I get a folio where folio_address(folio) == 0xfffff000 (which is
broken).

Need to go into the weeds here...


Björn
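
For anyone following along, here is a rough userspace sketch of why a page at 0xfffff000 is a problem on a 32-bit box. It models the ERR_PTR() encoding and the wrap to zero from the original report using uint32_t as a stand-in for a 32-bit unsigned long. The sim_* names are made up for illustration and are not kernel API; the sketch assumes MAX_ERRNO is 4095 (as in include/linux/err.h) and 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of a 32-bit kernel address space (hypothetical names). */
#define SIM_MAX_ERRNO 4095u   /* assumed: same value as the kernel's MAX_ERRNO */
#define SIM_PAGE_SIZE 4096u   /* assumed: 4 KiB pages */

/* Model of ERR_PTR(): a negative errno stored as a pointer-sized value. */
static uint32_t sim_err_ptr(int32_t err)
{
    assert(err < 0 && err >= -(int32_t)SIM_MAX_ERRNO);
    return (uint32_t)err;
}

/* Model of IS_ERR(): true for the top SIM_MAX_ERRNO addresses. */
static bool sim_is_err(uint32_t addr)
{
    return addr >= (uint32_t)-SIM_MAX_ERRNO;
}

/* True if any byte of the page starting at page_addr is an ERR_PTR value. */
static bool sim_page_overlaps_err_range(uint32_t page_addr)
{
    uint32_t last = page_addr + (SIM_PAGE_SIZE - 1u);

    /* last < page_addr means the page itself wrapped past the top. */
    return last < page_addr || sim_is_err(last);
}

/* One-past-the-end address of a buffer, with 32-bit wraparound. */
static uint32_t sim_buf_end(uint32_t buf, uint32_t size)
{
    return buf + size;   /* unsigned arithmetic wraps modulo 2^32 */
}
```

Under these assumptions, sim_page_overlaps_err_range(0xfffff000) is true, since nearly every address in that page doubles as an encoded errno, and sim_buf_end(0xfffff000, 4096) is 0, which matches the "wraps to zero" symptom seen in ext4_search_dir().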