On Wed, Oct 23, 2024 at 07:41:32AM -0400, Sasha Levin wrote:
> On Wed, Oct 23, 2024 at 09:42:59PM +1100, Michael Ellerman wrote:
> > Hi Sasha,
> >
> > This is awesome.
> >
> > Sasha Levin <sashal@xxxxxxxxxx> writes:
> > > On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
> > > > > other information that would be useful?
> > > >
> > > > As a maintainer I probably would've found this to be annoying, but
> > > > with all my other outside observer / participant hats on, I think
> > > > it's very good to have a bot to expose maintainers not following
> > > > the process.
> > >
> > > This was my thinking too. Maybe it makes sense for the bot to shut
> > > up if things look good (i.e. >N days in stable, everything on the
> > > mailing list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."?
> >
> > I think it has to reply with something, otherwise folks will wonder
> > if the bot has broken or missed their pull request.
> >
> > But if all commits were in linux-next and posted to a list, then the
> > only content is the "Days in linux-next" histogram, which is not that
> > long and is useful information IMHO.
> >
> > It would be nice if you could trim the tail of the histogram below
> > the last populated row, that would make it more concise.
>
> Makes sense, I'll do that.
>
> > For fixes pulls it is sometimes legitimate for commits not to have
> > been in linux-next. But I think it's still good for the bot to
> > highlight those; ideally fixes that miss linux-next are either very
> > urgent or minor.
>
> Right, and Linus said he's okay with those. This is not a "shame" list
> but rather a "look a little closer" list.

Ok, that makes me feel better about this. I've got stuff that I hold
back for weeks (or months), and others that I'm fine with sending the
next day, once it's passed my CI.

I'm going to try to be better about calling out which patches carry
risk (and why that risk is justified, else I wouldn't be sending them),
and which patches look more involved but I have good reason to be
confident about - that can get quite subtle. That's good for everyone
down the line in terms of knowing what to expect.

I wonder if some of this is also motivated by people concerned about
things in bcachefs moving too fast and running the risk of regressions?
That's a justifiable concern, and priorities might be worth talking
about a bit.

I'm not currently seeing anything that makes me too concerned about
regressions: users in general aren't complaining about them (on the
previous pull request a user chimed in that he had been seeing less
stability over the past few kernel releases, but he tried switching
back to btrfs and the issues were still there), and my test dashboard
is steadily improving.

I do still have fairly critical (i.e. downtime causing) user reported
issues coming in that are taking most of my time, although they're
getting off into the weeds - one I've been working on the past few days
was reported by a user with a ~20 drive filesystem where we're
overflowing the maximum number of pointers in an extent, due to keeping
too many cached copies around, and his filesystem goes emergency
read-only. And there still seems to be something not-quite-right with
snapshots and unlinked inodes, possibly a repair issue.
The test dashboard still has a long way to go before it's anywhere near
as clean as I want (and it needs to be, so that I can easily spot
regressions), but the number of test failures has been steadily
dropping, the results are getting more consistent, and none of the test
failures there are scary ones that need to be jumped on.

https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing

(We were at 150-160 failures per full run a few weeks ago, now 100-110.
The full runs also run fstests in 8 different configurations, so there
are lots of duplicates.)

There have been a lot of new syzbot reports recently, and some of those
do look more concerning. I don't think this indicates regressions -
this looks like syzbot getting better at finding interesting codepaths
with its code-coverage-guided testing, and for the concerning ones I
don't think the code changed in the right timeframe for them to be
regressions. I think several of the recent syzbot reports are all due
to the same bug, where it looks like we're not going read-only
correctly and interior btree updates are still happening - I suspect
that's been there for ages and an assert I recently added is making it
more visible.

I am still heavily in triage mode, and filesystem repair/recovery bugs
take top priority. In general, my priorities are
- critical user reported bugs first
- failures from my automated test suite second (unless they're
  regressions)
- syzbot last, because there's been basically zero overlap between
  syzbot bugs and user-affecting bugs so far, and because that's been
  an easy place for new people to jump in and help.