On Sun, Oct 20, 2024 at 09:53:06PM +0200, Vlastimil Babka wrote:
> On 10/20/24 20:53, Kent Overstreet wrote:
> > On Sun, Oct 20, 2024 at 11:46:11AM -0700, Linus Torvalds wrote:
> >> On Sun, 20 Oct 2024 at 10:04, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> >> >
> >> > But given that vmalloc() already supports > INT_MAX requests, and memory
> >> > sizes keep growing so 2GB is getting pretty small - I think it's time,
> >> > this is going to come up in other places sooner or later.
> >>
> >> No.
> >>
> >> If you need 2GB+ memory for filesystem operations, you fix your code.
> >
> > This is for journal replay, where we've got a big array of keys and we
> > need to sort them.
> >
> > The keys have to fit in memory (and had to fit in memory previously, for
> > them to be dirty in the journal);
>
> What if the disk is moved to a smaller system, should the fs still mount
> there? (I don't mean such a small system that it can't vmalloc() 2GB
> specifically, but in principle...)

You'll have to do journal replay on the bigger system. Once you've done
that, it'll work just fine on the smaller system.

(Now, trying to work with a 75TB filesystem on a small machine is going
to be really painful if you ever need to fsck. That's just an inherently
hard problem, but we've got fsck scalability/performance improvements in
the works).

But journal replay does inherently require the whole contents of the
journal to fit in memory - we have to do the sort + dedup so that we can
overlay the contents of the journal over the btree until journal replay
is finished so that we can get a consistent view of the filesystem,
which we need so that we can run the allocator, and go read-write, which
we need in order to do journal replay.

Fun bootstrap problems.
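
For anyone following along, here is a rough userspace sketch of the
"sort + dedup, then overlay" idea described above. It is not the bcachefs
code: JournalKey, sort_and_dedup() and overlay_lookup() are hypothetical
names made up for illustration, a plain dict stands in for the btree, and
keys are reduced to an integer position plus a journal sequence number.
What it does show is why the whole key array has to sit in memory at
once: every lookup during replay has to consult it before falling back to
the btree.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional


class JournalKey(NamedTuple):
    pos: int              # hypothetical key position (real keys are richer)
    seq: int              # journal sequence number; higher means newer
    value: Optional[str]  # None stands in for a deletion


def sort_and_dedup(keys):
    """Sort by (pos, seq), then keep only the newest entry per position."""
    out = []
    for k in sorted(keys, key=lambda k: (k.pos, k.seq)):
        if out and out[-1].pos == k.pos:
            out[-1] = k   # newer entry for the same position wins
        else:
            out.append(k)
    return out


def overlay_lookup(journal_keys, btree, pos):
    """Look up pos, preferring the in-memory journal keys over the btree."""
    # The position list is rebuilt per call only to keep the sketch short.
    positions = [k.pos for k in journal_keys]
    i = bisect_left(positions, pos)
    if i < len(journal_keys) and journal_keys[i].pos == pos:
        return journal_keys[i].value
    return btree.get(pos)


# Assumed toy data: four journal entries, two of them for position 5.
journal = [
    JournalKey(5, 1, "a"),
    JournalKey(2, 2, "b"),
    JournalKey(5, 3, "c"),
    JournalKey(7, 4, None),
]
btree = {2: "old", 5: "old", 7: "old", 9: "untouched"}
keys = sort_and_dedup(journal)
```

Until replay has written those keys back into the btree, a reader that
went straight to the btree would see the stale "old" values, which is the
inconsistency the overlay is there to hide.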