On Fri, Jun 30, 2023 at 4:17 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Fri, Jun 30, 2023 at 04:05:36PM +0300, Amir Goldstein wrote:
> > On Fri, Jun 30, 2023 at 3:30 PM Ignat Korchagin <ignat@xxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > [add the xfs lts maintainers]
> > > > > >
> > > > > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > > > > Hi Dave and Derrick,
> > > > > > > >
> > > > > > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > > > > > running on kernel 6.1.25. The corruptions were
> > > > > > > > detected by rocksdb block checksum. The workload seems to share some
> > > > > > > > similarities
> > > > > > > > with the multi-threaded write workload described in
> > > > > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@xxxxxxxxxxxxxxxxxxx/
> > > > > > > >
> > > > > > > > Can we backport the patch series to stable since it seemed to fix data
> > > > > > > > corruptions ?
> > > > > > > >
> > > > > > > For clarity, are you asking for permission or advice about doing this
> > > > > > > yourself, or are you asking somebody else to do the backport for you?
> > > > > >
> > > > > > Nobody's officially committed to backporting and testing patches for
> > > > > > 6.1; are you (Cloudflare) volunteering?
> > > > >
> > > > > Yes, we have applied them on top of 6.1.36, will be gradually
> > > > > releasing to our servers and will report back if we see the issues go
> > > > > away
> > > > >
> > > >
> > > > Getting feedback back from Cloudflare production servers is awesome
> > > > but it's not enough.
> > > >
> > > > The standard for getting xfs LTS backports approved is:
> > > > 1. Test the backports against regressions with several rounds of fstests
> > > > check -g auto on selected xfs configurations [1]
> > > > 2. Post the backport series to xfs list and get an ACK from upstream
> > > > xfs maintainers
> > > >
> > > > We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
> > > > We do not yet have a volunteer to do that work for 6.1.y.
> > > >
> > > > The question is whether you (or your team) are volunteering to
> > > > do that work for 6.1.y xfs backports to help share the load?

Circling back on this. So far it seems that the patchset in question
does fix the issues of rocksdb corruption as we haven't seen them for
some time on our test group. We're happy to dedicate some efforts now
to get them officially backported to 6.1 according to the process. We
did try basic things with kdevops and would like to learn more. Fred
(cc-ed here) is happy to drive the effort and be the primary contact
on this. Could you, please, guide us/him on the process?

> > > We are not a big team and apart from other internal project work our
> > > efforts are focused on fixing this issue in production, because it
> > > affects many teams and workloads. If we confirm that these patches fix
> > > the issue in production, we will definitely consider dedicating some
> > > work to ensure they are officially backported. But if not - we would
> > > be required to search for a fix first before we can commit to any
> > > work.
> > >
> > > So, IOW - can we come back to you a bit later on this after we get the
> > > feedback from production?
> > >
> >
> > Of course.
> > The volunteering question for 6.1.y is independent.
> >
> > When you decide that you have a series of backports
> > that proves to fix a real bug in production,
> > a way to test the series will be worked out.
> >
> /me notes that xfs/558 and xfs/559 (in fstests) are the functional tests
> for these patches that you're backporting; it would be useful to have a
> third party (i.e. not just the reporter and the author) confirm that the
> two fstests pass when real workloads are fixed.
>
> --D
>
> > Thanks,
> > Amir.

Ignat