On 2/3/25 4:22 PM, Amir Goldstein wrote:
On Sun, Feb 2, 2025 at 10:40 PM Ric Wheeler <ricwheeler@xxxxxxxxx> wrote:
I have always been super interested in how far we can push the
scalability limits of file systems. For the workloads we need to
support, we need to scale up to absolutely ridiculously large
numbers of files (a few billion files doesn't meet the needs of the
largest customers we support).
Hi Ric,
Since LSFMM is not about presentations, it would be better if the proposed
topic addressed specific technical questions that developers could
discuss.
Totally agree - from the ancient history of LSF (before MM or BPF!) we
also pushed for discussions over talks.
If a topic cannot generate a discussion on the list, it is not very
likely that it will generate a discussion on-prem.
Where does scaling with the number of files affect existing
filesystems? What are the limitations that you need to overcome?
Local file systems like xfs running on "scale up" giant systems (think
of the old super sized HP Superdomes and the like) would likely
handle this well.
In a lot of ways, ngnfs aims to replicate that scalability for "scale
out" (hate buzz words!) systems that are more affordable. In effect, you
can size your system by just adding more servers with their local NVMe
devices and build up performance and capacity incrementally.
Shared disk file systems like scoutfs (also GPL'ed but not upstream)
scale pretty well in file count, but they have coarse grained locking
that causes performance bumps, and they add the complexity of needing
RAID heads or SAN systems.
Zach Brown is leading a new project on ngnfs (his FOSDEM talk this year
gave a good background on this -
https://www.fosdem.org/2025/schedule/speaker/zach_brown/). We are
looking at taking advantage of modern low latency NVMe devices and
today's networks to implement a distributed file system that provides
the concurrency that high object counts need, while still having the
bandwidth needed to support the backend archival systems we feed.
I heard this talk and it was very interesting.
Here's a direct link to the slides for people who may be too lazy to
follow 3 clicks:
https://www.fosdem.org/2025/events/attachments/fosdem-2025-5471-ngnfs-a-distributed-file-system-using-block-granular-consistency/slides/236150/zach-brow_aqVkVuI.pdf
I was both very impressed by the cache coherent rename example
and very puzzled - I do not know of any filesystem where rename can be
synchronized on a single block I/O, and looking up ancestors is usually
done on in-memory dentries, so I may not have understood the example.
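
To illustrate the "in-memory dentries" point: a cross-directory rename of
a directory has to verify that the source is not an ancestor of the
destination before it can proceed, and on a local filesystem that check
just walks parent pointers that are already cached in memory, with no I/O.
A minimal sketch of that kind of check follows (the names are made up for
illustration; this is not the actual VFS or ngnfs code):

/* Illustrative only: a toy in-memory directory entry with a parent pointer. */
struct toy_dentry {
	struct toy_dentry *parent;	/* NULL at the filesystem root */
};

/*
 * Walk the cached parent chain to see whether @ancestor appears above
 * @child.  No I/O is issued: every step follows an in-memory pointer,
 * which is why this check is cheap on a purely local filesystem but
 * becomes interesting once the directory tree is shared across nodes.
 */
static int toy_is_ancestor(const struct toy_dentry *ancestor,
			   const struct toy_dentry *child)
{
	const struct toy_dentry *d;

	for (d = child->parent; d != NULL; d = d->parent) {
		if (d == ancestor)
			return 1;
	}
	return 0;
}

/*
 * Loop-prevention check that a cross-directory rename of a directory
 * needs: refuse to move @src underneath itself or one of its descendants.
 */
static int toy_rename_loop_check(const struct toy_dentry *src,
				 const struct toy_dentry *dst_dir)
{
	if (src == dst_dir || toy_is_ancestor(src, dst_dir))
		return -1;	/* would detach the subtree into a loop */
	return 0;
}
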
ngnfs as a topic would go into the coherence design (and code) that
underpins the increased concurrency it aims to deliver.
It is clear that the project is in its early days compared to most of the
proposed content, but it can be useful to spend some of the time on new
ideas.
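
To give the list something concrete to poke at ahead of time: I do not
know the ngnfs coherence protocol beyond what the slides show, but
block-granular consistency presumably means tracking a caching mode per
block, roughly along MESI-like lines, rather than taking per-file or
per-directory locks. A toy sketch of that idea (the states and names are
invented for illustration, not taken from the ngnfs code):

#include <stdint.h>

/*
 * Toy per-block cache state, loosely MESI-like.  The state names and the
 * struct are invented for this sketch.
 */
enum toy_block_mode {
	TOY_BLOCK_INVALID,	/* no cached copy on this node */
	TOY_BLOCK_SHARED,	/* read-only copy, other nodes may read too */
	TOY_BLOCK_EXCLUSIVE,	/* sole writable copy in the cluster */
};

struct toy_block {
	uint64_t blkno;
	enum toy_block_mode mode;
};

/*
 * Before a node modifies a block it must hold it exclusively.  The point
 * of making the consistency granularity a block, rather than a whole file
 * or directory lock, is that two nodes updating different blocks of the
 * same directory never invalidate each other's cached state.
 */
static int toy_block_acquire_for_write(struct toy_block *blk)
{
	if (blk->mode == TOY_BLOCK_EXCLUSIVE)
		return 0;

	/*
	 * A real implementation would send a network request here asking
	 * other holders to drop or write back their copies before the
	 * local copy is marked exclusive.
	 */
	blk->mode = TOY_BLOCK_EXCLUSIVE;
	return 0;
}
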
This sounds like an interesting topic to discuss.
I would love it if you or Zach could share more details on the list so that more
people could participate in the discussion leading to LSFMM.
Also, I think it is important to mention, as you told me, that the server
implementation of ngnfs is GPL, and to provide some pointers, because IMO
this is very important when requesting community feedback on a new
filesystem.
Thanks,
Amir.
All of ngnfs is GPL'ed - no non-open source client or similar.
Regards,
Ric