On Mon, Aug 5, 2013 at 11:33 PM, Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> wrote:
> On Sun, 4 Aug 2013 22:17:47 +0200
> Daniel Vetter <daniel@xxxxxxxx> wrote:
>> Imo the "unpredictable upstream" vs. "high quality kernel support in
>> upstream" is a false dichotomy. Afaics the "unpredictability" is
>> _because_ I am not willing to compromise on decent quality. I still
>> claim that upstreaming is a fairly predictable thing (within some
>> bounds of how well some tasks can be estimated up-front without doing
>> some research or prototyping), and the blocker here is our mediocre
>> project tracking.
>
> Well, I definitely disagree here. With our current (and recent past)
> processes, we've generally ended up with lots of hw support landing
> well after parts start shipping, and the quality hasn't been high (in
> terms of user-reported bugs) despite all the delay. So while our code
> might look pretty, the fact is that it's late, and has hard-to-debug
> low-level bugs (RC6, semaphores, etc.).
>
> <rant>
> It's fairly easy to add support for hardware well after it ships, and
> in a substandard way (e.g. hard power features disabled because we
> can't figure them out because the hw debug folks have moved on). If we
> want to keep doing that, fine, but I'd really like us to do better and
> catch the hard bugs *before* hw ships, and make sure it's solid and
> complete *before* users get it. But maybe that's just me. Maybe
> treating our driver like any other RE or "best effort" Linux driver is
> the right way to go. If so, fine, let's just not change anything.
> </rant>

The only thing I read here, both in the paragraph above and in the rant, is that we suck. I agree. My opinion is that this is because we started late, had too few resources and didn't seriously estimate how much work is actually involved in enabling something for real.

The only reason I could distill from the above two paragraphs, among the ranting, for why we are so late is "so while our code might look pretty, it's late and buggy". That's imo a fairly shallow stab at perceived bikesheds, but not a useful angle for improving our process.

Now I agree that I uphold a fairly high quality standard for upstreaming, but not an unreasonable one:

- drm/i915 transformed from the undisputed shittiest driver in the kernel to one that mostly just works, while picking up development pace. So I don't think I'm fully wrong in insisting on this level of quality.

- We ship the driver essentially continuously, which means we can implement features only in small refactoring steps. That clearly involves more work than just stitching something together for a product.

I welcome discussing whether I impose too high standards, but that discussion needs to be backed by examples and solid reasons. "It's just too hard" without more context isn't one, since yes, the work we pull off here actually is hard.

Also note that Chris and I still bear the brunt of fixing the random fallout all over (it's getting better). So if any proposed change involves me blowing through even more time tracking down issues, I'm strongly not in favour. The same holds for Chris' often-heard comment that a patch needs an improved commit message or a comment somewhere. Yes, it's annoying that you need to resend a patch (this often bugs me myself) just to paint the bikeshed a bit differently.
But imo Chris is pretty much spot-on throughout with his requests, and a high-quality git history has, in my experience at least, been extremely valuable for tracking down the really ugly issues and the legalese around all the established precedence.

>> My approach here has been to be a royal jerk about test coverage for
>> new features and blocking stuff if a regression isn't tackled in time.
>> People scream all around, but it seems to work and we're imo getting
>> to a "fairly decent regression handling" point. I also try to push for
>> enabling features across platforms (if the hw should work the same
>> way) in the name of increased test coverage. That one seems to be less
>> effective (e.g. fbc for hsw only ...).
>
> But code that isn't upstream *WON'T BE TESTED* reasonably. So if
> you're waiting for all tests to be written before going upstream, all
> you're doing is delaying the bug reports that will inevitably come in,
> both from new test programs and from general usage. On top of that, if
> someone is trying to refactor at the same time, things just become a
> mess with all sorts of regressions introduced that weren't an issue
> with the original patchset...

QA on my trees and the igt test coverage I demand for new features are there to catch regressions once something is merged. We've managed to break code less than a day after it was merged on multiple occasions, so this is very real and just part of the quality standard I impose. Furthermore I don't want a new feature to regress the overall stability of our driver. And since that stability is increasing rather decently, I ask for more testcases that exercise corner cases to make sure they're all covered. This is very much orthogonal to doing review, and just one more puzzle piece to ensure we don't go back to the neat old days of shipping half-baked crap.

Note that nowadays QA catches a lot of the regressions even before the patches land in Dave's tree (there's the occasional brown-paper-bag event, though, but in each such case I analyse the failure mode and work to prevent it in the future). And imo that's squarely due to much improved test coverage and the rigid test coverage requirements for new features I impose. The overall improved QA process flow, with much quicker regression turnaround times, also greatly helps here.
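For concreteness, a minimal sketch of what such an igt testcase skeleton looks like, with one subtest for the common path and one for a corner case (illustrative only: the helper names, e.g. drm_open_driver() and DRIVER_INTEL, follow intel-gpu-tools conventions and may differ between igt versions):

    /*
     * Illustrative igt subtest skeleton. Helper names are assumed from
     * intel-gpu-tools conventions and may vary between igt versions.
     */
    #include <unistd.h>
    #include "igt.h"

    igt_main
    {
            int fd = -1;

            igt_fixture {
                    /* Open the i915 DRM node; igt skips if it's absent. */
                    fd = drm_open_driver(DRIVER_INTEL);
            }

            igt_subtest("basic") {
                    /* Exercise the simple, common path of the feature. */
                    igt_assert(fd >= 0);
            }

            igt_subtest("corner-case") {
                    /* Explicitly hit boundary conditions and bad input. */
                    igt_assert(fd >= 0);
            }

            igt_fixture {
                    close(fd);
            }
    }

The point is that every new feature lands with subtests not just for the happy path but also for the corner cases, so regressions show up in QA runs rather than in user bug reports.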
Now I agree (and I think I've mentioned this a bunch of times in this thread already) that this leads to pain for developers. I see two main issues, both of which are (slowly) improving:

- Testing patch series for regressions before merging. QA has just set up the developer patch test system, and despite it still being rather limited, Ben seems to be fairly happy with where it's going. So I think we're on track to improve this and avoid the need for developers to have a private lab like the one Chris and I essentially have.

- Rebase hell due to ongoing other work. Thus far I've only tried to help here by rechecking/delaying refactoring patches while big features are pending. I think we need to try new approaches here, and imo better planning should help. E.g. the initial modeset refactor was way too big, a monolithic chunk that I've just wrestled in by extorting r-b tags from you. In contrast the pipe config rework was about equally big, but at any given time only about 30-50 patches were outstanding (in extreme cases), and multiple people contributed different parts of the overall beast.

Of course that means that occasionally, for really big stuff, we need to plan to write a first proof of concept as a landmark for where we need to go, which will pretty much be thrown away completely.

One meta-comment on top of the actual discussion: I really appreciate critique, and I've grown a good maintainer skin to deal with even really harsh critique. But I prefer less ranting and more concrete examples of where I've botched the job (there are plenty to pick from imo), plus concrete suggestions for how to improve our overall process. I think these process woes are painful for everyone, and due to our fast growth we're constantly pushing into new levels of ugly, but imo the way forward is small (sometimes positively tiny) but continuous adjustments and improvements. I think we both agree on where we'd like to be, but at least for me, in the day-to-day fight in the trenches, the rosy picture 200 miles away doesn't really help. Maybe I'm too delusional and sarcastic that way ;-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch