On Thu, May 17, 2018 at 01:55:53PM +0300, Artem Blagodarenko wrote:
> So I want to discuss some alternatives:
>
> 1) We could compile and link two lib versions: one with a 32-bit
> ext2_ino_t and another with a 64-bit one. Pass some macro that says
> how many bits “ext2_ino_t” is now. Users can link either library
> (the functions have the same names, but different prototypes). I
> believe some extra cleanup is needed. There are some local
> variables and function parameters which have the type's bitness
> hardcoded. But this is probably less work than making both
> interface versions.
>
> 2) We could use LD version scripts:
> https://www.gnu.org/software/gnulib/manual/html_node/LD-Version-Scripts.html
> This approach looks elegant, but still needs more work to be done.

Using LD version scripts doesn't really help much in terms of the
long-term maintainability of the source code, since you still have
to maintain two different function names in the library.

The only other solution is to bite the bullet and just accept that
we have to do a major version number bump in the shared library.
This pushes the pain to the distributions, since they now have to
rebuild all of the packages that depend on libext2fs. There aren't
_that_ many packages, but it does mean a certain amount of
attention.

I think the other thing we really need to have a conversation about
is the cost/benefit ratio of 64-bit inode numbers in the first
place. It is going to be a huge amount of work, and it's going to
have a pretty large impact on the ext4 ecosystem. And I am worried
about what it does to the long-term maintainability of the code ---
especially since so very few people will likely use the feature.

Against that, I'm not sure I understand what the benefits are. It
seems to be mostly for Lustre, but I really don't understand why
Lustre can't more efficiently handle a large number of targets (file
systems). Using a single file system per disk makes it much easier
to balance disk utilization. It also speeds up file system recovery
after a crash, since e2fsck can run much more efficiently in
parallel across each disk attached to a server. It also matches up
the failure domain caused by corrupted file system metadata with the
failure domain associated with HDD failure.

It might be interesting to look at this slide deck [1] (starting at
slide 16, "Part 2: Colossus and Efficiency Storage") for an
examination of the benefits of being able to balance I/O operations
across all of the disks in a cluster.

[1] http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Keynote.pdf

So can we have a conversation about whether it *really* makes sense
to try to put a huge number of Lustre objects into a single large
ext4 file system? Because quite frankly, I don't really see it. It
certainly goes against the design paradigms we've used at Google for
designing cluster file systems.

Regards,

						- Ted
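P.S.  Just to make sure we're all talking about the same thing, here
is a rough sketch of what option 1 boils down to. This is for
illustration only, not a patch: the macro name EXT2FS_64BIT_INO is
made up, and would be passed via -D when building the 64-bit flavor
of the library:

    /* Hypothetical sketch of option 1: one source tree, two builds. */
    #include <ext2fs/ext2_types.h>    /* __u32, __u64 */

    #ifdef EXT2FS_64BIT_INO
    typedef __u64 ext2_ino_t;         /* the 64-bit library flavor */
    #else
    typedef __u32 ext2_ino_t;         /* the traditional 32-bit flavor */
    #endif

    /*
     * Every exported prototype that mentions ext2_ino_t now changes
     * silently between the two builds while keeping the same symbol
     * name.  So a program must be linked against the flavor it was
     * compiled for, and, as Artem notes, any local variable or
     * parameter with a hardcoded width has to be cleaned up first.
     */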
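For comparison, the symbol-versioning route looks roughly like the
sketch below. Note that libext2fs does not actually version its
symbols this way today; the version node names (EXT2FS_1.44,
EXT2FS_1.45) and the _v32/_v64 helper names are invented, and the
function bodies are stubs:

    /* Hypothetical sketch: both ABIs live in one library forever. */
    #include <ext2fs/ext2fs.h>    /* errcode_t, ext2_filsys, ... */

    typedef __u32 ext2_ino32_t;   /* the historic 32-bit inode number */
    typedef __u64 ext2_ino64_t;   /* the proposed 64-bit inode number */

    /* New, 64-bit-clean implementation (body elided here). */
    errcode_t ext2fs_read_inode_v64(ext2_filsys fs, ext2_ino64_t ino,
                                    struct ext2_inode *inode)
    {
            /* ... would do the real work ... */
            return 0;
    }

    /* Old 32-bit ABI, kept alive as a thin wrapper. */
    errcode_t ext2fs_read_inode_v32(ext2_filsys fs, ext2_ino32_t ino,
                                    struct ext2_inode *inode)
    {
            return ext2fs_read_inode_v64(fs, ino, inode);
    }

    /*
     * Bind each implementation to a versioned name for the same
     * symbol; "@@" marks the default that newly linked programs
     * pick up.  This also requires a version script defining the
     * two nodes, passed to the linker via -Wl,--version-script.
     */
    __asm__(".symver ext2fs_read_inode_v32,ext2fs_read_inode@EXT2FS_1.44");
    __asm__(".symver ext2fs_read_inode_v64,ext2fs_read_inode@@EXT2FS_1.45");

Either way, both entry points stay in the source tree forever, which
is exactly the long-term maintenance cost I'm worried about above.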