On Thu, 21 Nov 2019, zhengbin (A) wrote:
> On 2019/11/20 23:45, Matthew Wilcox wrote:
> > On Wed, Nov 20, 2019 at 10:23:18PM +0800, zhengbin wrote:
> >> I have tried to change last_ino type to unsigned long, while this was
> >> rejected, see details on https://patchwork.kernel.org/patch/11023915.
> >
> > Did you end up trying sbitmap?
>
> Maybe sbitmap is not a good solution: max_inodes of tmpfs is controlled by the mount option nr_inodes,
> which can be changed by remount (bigger or smaller), and the comment on sbitmap_resize() says:
>
>  * Doesn't reallocate anything. It's up to the caller to ensure that the new
>  * depth doesn't exceed the depth that the sb was initialized with.
>
> Even if we modify it to support growing, the following problems remain:
>
> 1. tmpfs is a RAM filesystem, and we need to allocate sbitmap memory for sbinfo->max_inodes (which may be huge).
>
> 2. If remount changes max_inodes, we have to deal with it, and this may take a long time
> (bigger: we need to free the old sbitmap memory, allocate new memory and copy the old sbitmap into the new one;
> smaller: how do we deal with it? i.e. we use sb->map[inode number/8] to find the sbitmap word, so we would need to change existing
> inode numbers, which may be used by userspace applications.)
>
> > What I think is fundamentally wrong with this patch is that you've found a
> > problem in get_next_ino() and decided to use a different scheme for this
> > one filesystem, leaving every other filesystem which uses get_next_ino()
> > facing the same problem.
> >
> > That could be acceptable if you explained why tmpfs is fundamentally
> > different from all the other filesystems that use get_next_ino(), but
> > you haven't (and I don't think there is such a difference, e.g. pipes,
> > autofs and ipc mqueue could all have the same problem).
>
> tmpfs is the same as all the other filesystems that use get_next_ino(), but we need to solve this problem one by one.
> If tmpfs is OK, we can modify the other filesystems too. Besides, I do not recommend that all filesystems share the same
> global variable, for performance reasons.
>
> > There are some other problems I noticed, but they're not worth bringing
> > up until this fundamental design choice is justified.
>
> Agree, thanks.

Just a rushed FYI without looking at your patch or comments.

Internally (in Google) we do rely on good tmpfs inode numbers more
than on those of other get_next_ino() filesystems, and carry a patch
to mm/shmem.c for it to use 64-bit inode numbers (and a separate inode
number space for each superblock) - essentially,

	ino = sbinfo->next_ino++;
	/* Avoid 0 in the low 32 bits: might appear deleted */
	if (unlikely((unsigned int)ino == 0))
		ino = sbinfo->next_ino++;

Which I think would be faster, and need less memory, than IDA.
But whether that is of general interest, or of interest to you,
depends upon how prevalent 32-bit executables built without
_FILE_OFFSET_BITS=64 still are these days.

Hugh
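The per-superblock counter described above can be sketched in userspace C roughly as follows. This is a minimal sketch under stated assumptions, not the actual Google patch: the struct name `shmem_sb_sketch`, the function `sketch_next_ino()` and the `uint64_t` counter type are hypothetical stand-ins for sbinfo and its next_ino field, `unlikely()` is re-created with the GCC/Clang `__builtin_expect` builtin, and locking is deliberately left out (a real filesystem would have to serialize concurrent inode creation on the same superblock).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for tmpfs's per-superblock info (sbinfo). */
struct shmem_sb_sketch {
	uint64_t next_ino;	/* 64-bit counter, one per superblock */
};

/* Userspace stand-in for the kernel's unlikely() branch hint (GCC/Clang). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Return the next inode number for this superblock.  Values whose low
 * 32 bits are zero are skipped, so a consumer that only sees the low
 * 32 bits never gets inode number 0.  No locking: the caller is assumed
 * to serialize calls for a given superblock.
 */
static uint64_t sketch_next_ino(struct shmem_sb_sketch *sbinfo)
{
	uint64_t ino = sbinfo->next_ino++;

	if (unlikely((uint32_t)ino == 0))
		ino = sbinfo->next_ino++;

	/* The value following one with zero low bits has low bits == 1. */
	assert((uint32_t)ino != 0);
	return ino;
}
```

Starting from next_ino = 0, the first number handed out is 1; a counter sitting at 0xffffffff yields 0xffffffff and then 0x100000001, skipping 0x100000000. Because each superblock has its own counter, two tmpfs mounts can hand out the same number, which is fine since a file is identified by the (st_dev, st_ino) pair. Once numbers exceed 32 bits, a 32-bit program built without _FILE_OFFSET_BITS=64 has only a 32-bit st_ino, and stat() on such a file can fail with EOVERFLOW, which is the compatibility question raised above.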