On Mar 20, 2017, at 6:22 AM, Благодаренко Артём <artem.blagodarenko@xxxxxxxxx> wrote:
>
> Hello,
>
> This topic was mentioned in "Add largedir feature", but needs to be discussed
> in a separate thread.
> Increasing the maximum inode count is useful with and without largedir. There
> is at least one user who needs 64-bit inode numbers - Lustre FS.
>
> As mentioned, the MDS has 0-size files to store some information about Lustre
> FS files. Current MDS disk sizes allow storing a large number of such files,
> but EXT4 limits this number to ~4 billion.
>
> Lustre FS has features like DNE to distribute the MDS over many targets
> (disks), but the disks are not used effectively. It would be great to have the
> ability to store more than ~4 billion inodes on one EXT4 file system.

I guess the major potential problem with more than 4B inodes in a single
filesystem (and also the large 300TB+ filesystems you are using) is that
running e2fsck could take a very long time. Conversely, using DNE to spread
the metadata across multiple filesystems/servers allows e2fsck to run in
parallel and limits any failures to a smaller subset of the filesystem.

That doesn't mean I'm totally against this feature, since disks larger than
8TB are becoming common.

> I know there is the dirdata feature that allows storing the high 32 bits of
> the inode number in the ext4 dirent. As far as I know, dirdata was not merged
> yet because of the lack of users. Quote of Andreas from "Add largedir feature":
>
>> Mostly because there hasn't been any interest for it whenever I proposed
>> merging it in the past. If there is some renewed interest in merging it
>> I could look into it …
>
> It looks like Lustre FS requires this feature now.
>
> There is another approach to solving this problem. It is obvious, but
> requires a change to the on-disk format.
> Theodore's quote from "Add largedir feature":
>
>> I can imagine a new feature flag which defines the use a 64-bit inode
>> number, but that's more for people who are creating a file system that
>> takes advantage of 64-bit block numbers, and they are intending on
>> using all of that space to store small (< 4k or < 8k) files.
>
> This is exactly the Lustre FS MDS case: many small inodes. If it is possible
> to add a new feature flag, this is probably the best solution: simple,
> obvious, fast.
>
> Please help with these questions:
> 1. Do we need 64-bit inode numbers now? (My opinion - we need them)
> 2. Which of the two solutions above should we choose? Another solution?

It wasn't clear from Ted's comments whether he was proposing the feature flag
to store 64-bit inode numbers directly into a new ext4_dir_entry64, or to use
the dir_data to hold the high 32 bits.

My preference would be to store the high bits of the inode number into
dir_data. The reasons are:
- this won't use more space for 64-bit inodes than ext4_dir_entry64
- dirents for 32-bit inode numbers will be smaller
- significantly more 32-bit dirents can fit into a leaf block (roughly 10-25%
  more)
- it is backwards compatible with existing directories and can transparently
  store 64-bit inode numbers into 32-bit directories without a full update
- it avoids duplicate code paths for ext4_dir_entry vs ext4_dir_entry64
- it would be possible to store only the high 16 bits (2^48 inodes in total),
  which may be enough for ext4, since ext4_extent can only address 2^48 blocks
  (2^60 bytes with 4KB blocks) and there isn't much value in having more
  inodes than blocks

Cheers, Andreas
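
To put rough numbers on the dirent-size argument above, here is a minimal
sketch of the arithmetic. It assumes a 4KB leaf block with no tail or checksum
overhead, the usual 8-byte ext4_dir_entry_2 header with records rounded up to
4 bytes, and a hypothetical ext4_dir_entry64 that only widens the inode field
to 8 bytes (12-byte header). That layout is an assumption for illustration, not
a proposed on-disk format, and the helper names are made up.

```python
# Rough dirent-size arithmetic: a sketch under stated assumptions, not ext4 code.

BLOCK_SIZE = 4096  # assumed 4KB leaf block, ignoring any tail/checksum overhead
HDR_32 = 8         # ext4_dir_entry_2: inode(4) + rec_len(2) + name_len(1) + file_type(1)
HDR_64 = 12        # hypothetical ext4_dir_entry64: same fields, 8-byte inode (assumption)

MAX_EXTENT_BLOCKS = 1 << 48  # ext4_extent holds a 48-bit physical block number


def rec_len(name_len, header):
    """Minimum record length: header plus name, rounded up to a multiple of 4."""
    return (header + name_len + 3) & ~3


def entries_per_block(name_len, header, block_size=BLOCK_SIZE):
    """How many minimum-size records with this name length fit in one block."""
    return block_size // rec_len(name_len, header)


def gain_percent(name_len):
    """Extra 32-bit dirents per block relative to the hypothetical 64-bit layout."""
    n32 = entries_per_block(name_len, HDR_32)
    n64 = entries_per_block(name_len, HDR_64)
    return 100.0 * (n32 - n64) / n64


if __name__ == "__main__":
    for name_len in (8, 16, 32):
        print(name_len,
              entries_per_block(name_len, HDR_32),
              entries_per_block(name_len, HDR_64),
              round(gain_percent(name_len), 1))
    # Addressable bytes with 4KB blocks, as a power of two.
    print((MAX_EXTENT_BLOCKS * BLOCK_SIZE).bit_length() - 1)
```

Under these assumptions, 8-, 16- and 32-byte names give 256 vs 204, 170 vs 146
and 102 vs 93 entries per block, i.e. gains of about 25%, 16% and 10%, which is
in line with the 10-25% range mentioned above. The last line checks that 2^48
blocks of 4KB come to 2^60 bytes.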