On Thu, May 17, 2018 at 01:55:53PM +0300, Artem Blagodarenko wrote:
> So I want to discuss some alternatives:
>
> 1) We could compile and link two lib versions: one with a 32-bit
> ext2_ino_t and another with a 64-bit one. Pass some macro that says
> how many bits “ext2_ino_t” is now. Users can link either library
> (the functions have the same names, but different prototypes). I
> believe some extra cleanup is needed. There are some local
> variables and function parameters which have the type's bitness
> hardcoded. But this is probably less work than making both
> interface versions.
>
> 2) We could use LD version scripts:
> https://www.gnu.org/software/gnulib/manual/html_node/LD-Version-Scripts.html
> This approach looks elegant, but still needs more work to be done.

Using LD version scripts doesn't really help much in terms of the
long-term maintainability of the source code, since you still have
to maintain two different function names in the library.

The only other solution is to bite the bullet and just accept that
we have to do a major version number bump in the shared library.
This pushes the pain to the distributions, since they now have to
rebuild all of the packages that depend on libext2fs. There aren't
_that_ many packages, but it does mean a certain amount of
attention.

I think the other thing we really need to have a conversation about
is the cost/benefit ratio of 64-bit inode numbers in the first
place. It is going to be a huge amount of work, and it's going to
have a pretty large impact on the ext4 ecosystem. And I am worried
about what it does to the long-term maintainability of the code ---
especially since so very few people will likely use the feature.

Against that, I'm not sure I understand what the benefits are. It
seems to be mostly for Lustre, but I really don't understand why
Lustre can't more efficiently handle a large number of targets (file
systems). Using a single file system per disk makes it much easier
to balance disk utilization. It also speeds up file system recovery
after a crash, since e2fsck can run much more efficiently in
parallel across each disk attached to a server. It also matches up
the failure domain caused by corrupted file system metadata with the
failure domain associated with HDD failure.

It might be interesting to look at this slide deck [1] (starting at
slide 16, "Part 2: Colossus and Efficiency Storage") for an
examination of the benefits of being able to balance I/O operations
across all of the disks in a cluster.

[1] http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Keynote.pdf

So can we have a conversation about whether it *really* makes sense
to try to put a huge number of Lustre objects into a single large
ext4 file system? Because quite frankly, I don't really see it. It
certainly goes against the design paradigms we've used at Google for
designing cluster file systems.

Regards,

						- Ted
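P.S.  Just to make sure we're all talking about the same thing, here
is a rough sketch of what option 1 boils down to. This is for
illustration only, not a patch: the macro name EXT2FS_64BIT_INO is
made up, and would be passed via -D when building the 64-bit flavor
of the library:

    /* Hypothetical sketch of option 1: one source tree, two builds. */
    #include <ext2fs/ext2_types.h>    /* __u32, __u64 */

    #ifdef EXT2FS_64BIT_INO
    typedef __u64 ext2_ino_t;         /* the 64-bit library flavor */
    #else
    typedef __u32 ext2_ino_t;         /* the traditional 32-bit flavor */
    #endif

    /*
     * Every exported prototype that mentions ext2_ino_t now changes
     * silently between the two builds while keeping the same symbol
     * name.  So a program must be linked against the flavor it was
     * compiled for, and, as Artem notes, any local variable or
     * parameter with a hardcoded width has to be cleaned up first.
     */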
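For comparison, the symbol-versioning route looks roughly like the
sketch below. Note that libext2fs does not actually version its
symbols this way today; the version node names (EXT2FS_1.44,
EXT2FS_1.45) and the _v32/_v64 helper names are invented, and the
function bodies are stubs:

    /* Hypothetical sketch: both ABIs live in one library forever. */
    #include <ext2fs/ext2fs.h>    /* errcode_t, ext2_filsys, ... */

    typedef __u32 ext2_ino32_t;   /* the historic 32-bit inode number */
    typedef __u64 ext2_ino64_t;   /* the proposed 64-bit inode number */

    /* New, 64-bit-clean implementation (body elided here). */
    errcode_t ext2fs_read_inode_v64(ext2_filsys fs, ext2_ino64_t ino,
                                    struct ext2_inode *inode)
    {
            /* ... would do the real work ... */
            return 0;
    }

    /* Old 32-bit ABI, kept alive as a thin wrapper. */
    errcode_t ext2fs_read_inode_v32(ext2_filsys fs, ext2_ino32_t ino,
                                    struct ext2_inode *inode)
    {
            return ext2fs_read_inode_v64(fs, ino, inode);
    }

    /*
     * Bind each implementation to a versioned name for the same
     * symbol; "@@" marks the default that newly linked programs
     * pick up.  This also requires a version script defining the
     * two nodes, passed to the linker via -Wl,--version-script.
     */
    __asm__(".symver ext2fs_read_inode_v32,ext2fs_read_inode@EXT2FS_1.44");
    __asm__(".symver ext2fs_read_inode_v64,ext2fs_read_inode@@EXT2FS_1.45");

Either way, both entry points stay in the source tree forever, which
is exactly the long-term maintenance cost I'm worried about above.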