On 30 July 2015 at 15:18, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Jul 29, 2015 at 6:39 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>>
>> It's here: https://goo.gl/photos/xHjn2Z97JQEw6k2C9
>
> You didn't catch enough of the code line to decode the code, but it's
> early enough in drm_crtc_index() (just five bytes in) that it's almost
> certainly the very first dereference, so it's almost guaranteed to be
> that
>
>     crtc->dev
>
> access as part of list_for_each_entry(), with crtc being NULL. And
> yes, "->dev" is the very first field, so the offset is zero too (while
> the "->mode_config" list access would not be at offset zero).
>
> And it looks like it is called from drm_atomic_helper_check_modeset():
> the reason it has a question mark in the backtrace is because the
> fault happens before the stack frame has even been set up.
>
> There are multiple calls to "drm_crtc_index()" from that function, I
> can't tell which one it is. Looking at the code generation I get, I
> think it's because update_connector_routing() gets inlined, and that
> one does several calls. Most of them look like this:
>
>         if (connector->state->crtc) {
>                 idx = drm_crtc_index(connector->state->crtc);
>
> ie they check that the crtc is non-NULL, but that last one does not:
>
>         connector_state->best_encoder = new_encoder;
>         idx = drm_crtc_index(connector_state->crtc);
>
>         crtc_state = state->crtc_states[idx];
>         crtc_state->mode_changed = true;
>
> and I suspect the fix might be something like the attached. Totally
> untested. Ted?
>
> This whole "atomic modeset" series has been one royal fuck-up, guys.
> We've had too many of these kinds of crap issues.

It hasn't been that bad, on a scale of 1 to MD eats my raid array, I'd
say we are barely at a 5.

There have been a lot of small and seemingly easily fixed teething
problems. Essentially rewriting the DRM API to provide a new userspace
API and internal interface, porting some drivers partly to the new
interface, while trying to maintain the old ABI/API on top seamlessly,
was always going to be an impossible task. It was never going to
magically all just work in -next and land in your tree fully formed,
smelling of lavender and elderberries.

This is a massive undertaking, and doing it over a few kernels was the
only possible way it could ever land. I think the biggest problem we've
had is that the QA team at Intel got reorganised or something right when
they really needed to be doing testing on this stuff, so what was
sitting in -next never got as much testing as it had previously, and you
can see that in the types of cases that are getting through.

I think the other thing we can learn is that when Android forks the
kernel we should just say this shit is too hard, let Google go and
create a new API and a complete set of graphics drivers and deal with it
in 10 years, because that was seriously the only other option.

So yes, it's a pity other kernel developers are seeing our fallout, but
I've experienced lots of other kernel developers' fallout over the
years, and generally the idea is to get this stuff fixed to a reasonable
state before you release a final kernel.

Note I'm not personally involved in the development of atomic
modesetting at all. I'm running the kernels with it where and when I
can, and I trust the developers who work on it are doing as much as they
can to make it work.

That said, hopefully Daniel can find a bag of fucks to debug and write a
proper patch, instead of rage quitting the universe, and just
git reset --hard v4.0 drivers/gpu/drm/i915.

Dave.
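
For readers following the quoted analysis, here is a minimal userspace
sketch of the pattern under discussion: an index lookup whose first
action is to read crtc->dev (at offset zero, so a NULL crtc faults at
address 0), and a caller that checks for NULL before doing the lookup.
This is not the attached patch, which is not shown in this message. The
struct layouts and the names crtc_index() and mark_mode_changed() are
simplified, hypothetical stand-ins, not the real DRM definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DRM structures. */
struct crtc;

struct device {
	struct crtc *crtcs[4];	/* stands in for the mode_config CRTC list */
	int ncrtcs;
};

struct crtc {
	struct device *dev;	/* first field, so it sits at offset 0 */
};

struct crtc_state {
	bool mode_changed;
};

struct connector_state {
	struct crtc *crtc;	/* may be NULL, e.g. connector not routed */
};

/* The mail's point: "->dev" is the very first field, offset zero. */
static_assert(offsetof(struct crtc, dev) == 0, "dev must be first");

/*
 * Stand-in for drm_crtc_index(): walks the device's CRTCs to find the
 * position of 'crtc'. The very first thing it does is read crtc->dev,
 * so calling it with crtc == NULL dereferences address 0.
 */
int crtc_index(const struct crtc *crtc)
{
	const struct device *dev = crtc->dev;	/* first dereference */

	for (int i = 0; i < dev->ncrtcs; i++)
		if (dev->crtcs[i] == crtc)
			return i;
	return -1;
}

/*
 * Guarded caller: only looks up the index when a CRTC is attached.
 * Returns 1 if a CRTC state was marked, 0 if there was nothing to mark.
 */
int mark_mode_changed(struct crtc_state **crtc_states,
		      const struct connector_state *cs)
{
	if (!cs->crtc)
		return 0;	/* the check the unguarded call site lacks */

	int idx = crtc_index(cs->crtc);
	if (idx < 0)
		return 0;

	crtc_states[idx]->mode_changed = true;
	return 1;
}
```

With a connector_state whose crtc is NULL, mark_mode_changed() returns 0
without touching crtc_index(); calling crtc_index() directly in that
case would be the offset-zero fault described in the quoted text.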