[LSF/MM/BPF TOPIC] Lustre filesystem upstreaming

Lustre is a high-performance parallel filesystem, available
under GPLv2, used for HPC and AI/ML compute clusters. Lustre
is currently used by 65% of the Top 500 systems in HPC,
including 9 of the top 10 [7]. Outside of HPC, Lustre is used
by many of the largest AI/ML clusters in the world, and is
commercially supported by numerous vendors and cloud service
providers [1].

After 21 years and an ill-fated stint in staging, Lustre is still
maintained as an out-of-tree module [6]. The previous upstreaming
effort suffered from a lack of developer focus and user adoption,
which eventually led to Lustre being removed from staging
altogether [2].

The work to improve Lustre has continued regardless. In the
intervening years, the code improvements that previously
prevented a return to mainline have progressed steadily. At
least 25% of the patches accepted for Lustre 2.16 were related
to the upstreaming effort [3], and all of the remaining work
is in flight [4][5][8].

Our eventual goal is to get both the Lustre client and server
(on ext4/ldiskfs), along with at least TCP/IP networking, to
an acceptable quality before submitting to mainline. The
remaining network support would follow soon afterwards.
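For context, a client under that configuration mounts over
LNet's TCP network type; a minimal sketch looks roughly like
the following (the MGS NID 192.168.1.10@tcp and the filesystem
name "lfs" are hypothetical):

    # Mount a Lustre client over TCP/IP (LNet "tcp" network
    # type), pointing at the management server (MGS) for the
    # filesystem named "lfs".
    mount -t lustre 192.168.1.10@tcp:/lfs /mnt/lustre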

I propose to discuss:

- As we alter our development model [8] to support upstream development,
  what is a sufficient demonstration of commitment that our model works?
- Should the client and server be submitted together? Or split?
- Expectations for a new filesystem to be accepted to mainline
- How to manage inclusion of a large code base (the client alone is
  200kLoC) without increasing the burden on fs/net maintainers

Lustre has received a plethora of feedback in the past. While
much of that feedback has since been addressed, the kernel is
a moving target: several filesystems have been merged (or
removed) since Lustre left staging. We aim to avoid the
mistakes of the past and hope to address as many concerns as
possible before submitting for inclusion.

Thanks!

Timothy Day (Amazon Web Services - AWS)
James Simmons (Oak Ridge National Labs - ORNL)

[1] Wikipedia: https://en.wikipedia.org/wiki/Lustre_(file_system)#Commercial_technical_support
[2] Kicked out of staging: https://lwn.net/Articles/756565/
[3] This is a heuristic, based on the combined commit counts of
    ORNL, Aeon, SUSE, and AWS, which have been primarily working
    on upstreaming issues: https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
[4] LUG24 Upstreaming Update: https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day1/LUG_2024_Talk_02-Native_Linux_client_status.pdf
[5] Lustre Jira Upstream Progress: https://jira.whamcloud.com/browse/LU-12511
[6] Out-of-tree codebase: https://git.whamcloud.com/?p=fs/lustre-release.git;a=tree
[7] Top-500 usage graph: https://8d118135-f68b-475d-9b6d-ef84c0db1e71.usrfiles.com/ugd/8d1181_bb8f9405d77a4e2bad53531aa94e8868.pdf
[8] Project Wiki: https://wiki.lustre.org/Upstream_contributing
