Lustre is a high-performance parallel filesystem for HPC and AI/ML compute clusters, available under GPLv2. Lustre is currently used by 65% of the Top 500 systems in HPC (9 of the Top 10) [7]. Outside of HPC, Lustre is used by many of the largest AI/ML clusters in the world, and is commercially supported by numerous vendors and cloud service providers [1]. After 21 years and an ill-fated stint in staging, Lustre is still maintained as an out-of-tree module [6].

The previous upstreaming effort suffered from a lack of developer focus and user adoption, which eventually led to Lustre being removed from staging altogether [2]. However, the work to improve Lustre has continued regardless. In the intervening years, the code improvements that were previously needed for a return to mainline have been steadily progressing. At least 25% of the patches accepted for Lustre 2.16 were related to the upstreaming effort [3], and all of the remaining work is in flight [4][5][8]. Our eventual goal is to get both the Lustre client and the server (on ext4/ldiskfs), along with at least TCP/IP networking, to an acceptable quality before submitting to mainline. The remaining networking support would follow soon afterwards.

I propose to discuss:

- As we alter our development model [8] to support upstream development, what is a sufficient demonstration of commitment that our model works?
- Should the client and server be submitted together? Or split?
- Expectations for a new filesystem to be accepted to mainline
- How to manage inclusion of a large code base (the client alone is 200 kLoC) without increasing the burden on fs/net maintainers

Lustre has received a plethora of feedback in the past. While much of that has been addressed since, the kernel is a moving target. Several filesystems have been merged (or removed) since Lustre left staging. We're aiming to avoid the mistakes of the past and hope to address as many concerns as possible before submitting for inclusion.

Thanks!

Timothy Day (Amazon Web Services - AWS)
James Simmons (Oak Ridge National Laboratory - ORNL)

[1] Wikipedia: https://en.wikipedia.org/wiki/Lustre_(file_system)#Commercial_technical_support
[2] Kicked out of staging: https://lwn.net/Articles/756565/
[3] This is a heuristic, based on the combined commit counts of ORNL, Aeon, SUSE, and AWS, which have been primarily working on upstreaming issues: https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
[4] LUG24 Upstreaming Update: https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day1/LUG_2024_Talk_02-Native_Linux_client_status.pdf
[5] Lustre Jira Upstream Progress: https://jira.whamcloud.com/browse/LU-12511
[6] Out-of-tree codebase: https://git.whamcloud.com/?p=fs/lustre-release.git;a=tree
[7] Graph: https://8d118135-f68b-475d-9b6d-ef84c0db1e71.usrfiles.com/ugd/8d1181_bb8f9405d77a4e2bad53531aa94e8868.pdf
[8] Project Wiki: https://wiki.lustre.org/Upstream_contributing