On Tue, 2022-07-19 at 13:49 -0400, Eric Farman wrote:
> On Tue, 2022-07-19 at 09:26 -0600, Alex Williamson wrote:
> > On Tue, 19 Jul 2022 16:49:28 +0200
> > Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > > On Mon, Jul 18, 2022 at 10:01:40PM -0400, Eric Farman wrote:
> > > > I'll get the problem with struct subchannel [1] sorted out in
> > > > the next couple of days. This series breaks vfio-ccw in its
> > > > current form (see reply to patch 14), but even with that
> > > > addressed the placement of all these other mdev structs needs
> > > > to be handled differently.
> > >
> > > Alex, any preference if I should just fix the number instances
> > > checking with either an incremental patch or a resend, or wait
> > > for this ccw rework?
> >
> > Since it's the last patch, let's at least just respin that patch
> > rather than break and fix. I'd like to make sure Eric is ok to
> > shift around structures as a follow-up or make a proposal how this
> > series should change though.
>
> I'd hoped to have that proposal today, but I don't have much
> confidence in it yet as this series (with the fix on the last patch)
> is still crashing my system. Will get something out as soon as I'm
> able.

The solution I envision thus far does two things:

- Move the struct mdev_parent and its friends out of struct
  subchannel, and into struct vfio_ccw_private. This struct is
  allocated just prior to the call to mdev_register_device/_parent,
  and released with the mdev_unregister. It's also a device-specific
  struct linked from the device-agnostic subchannel.
- Add a kref to struct vfio_ccw_private. The mdev_parent currently
  has one, which is now unnecessary since it's embedded in another
  struct, but it leaves vfio_ccw_private rather racy.

I suspect the second item (or something similar) is needed anyway,
because Alex' tree + this series crashes frequently in (usually)
mdev_remove.
I haven't found an explanation for how we get into this state, but
admittedly didn't spend a lot of time on those crashes, since the
proposed changes to struct subchannel are a non-starter. The other
crashes were always in something that's almost certainly a victim of
something else, like kmalloc-related stuff in net/skbuff.

With the above, the crashes out of the vfio-ccw stack disappear, and
things work a bit better. But those random kmalloc-related crashes
persist. I guess I'll pick those up tomorrow.

Eric
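
To make the two items above concrete, here is a rough userspace model
of the idea, not the actual patch. All names in it (struct ref,
struct parent_model, struct private_model, private_get/private_put,
sketch_lifecycle) are hypothetical stand-ins: in the kernel the
counter would be a struct kref driven by kref_init(), kref_get() and
kref_put(), and the embedded member would be the real struct
mdev_parent. Which code path takes the extra reference is also an
assumption made for illustration.

```c
/* Userspace model only; the kernel version would use <linux/kref.h>. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct kref. */
struct ref {
	atomic_int count;
};

/* Stand-in for struct mdev_parent; the real fields are omitted. */
struct parent_model {
	int registered;
};

/* Hypothetical layout of the device-specific private struct. */
struct private_model {
	struct ref ref;              /* item 2: lifetime of the whole struct */
	struct parent_model parent;  /* item 1: embedded, no refcount of its own */
};

static int release_calls;            /* counts how often the release path ran */

/* Release callback: recover the container from the embedded ref and free it. */
static void private_release(struct ref *ref)
{
	struct private_model *priv = (struct private_model *)
		((char *)ref - offsetof(struct private_model, ref));

	release_calls++;
	free(priv);
}

/* Allocate the private struct holding one initial reference. */
static struct private_model *private_alloc(void)
{
	struct private_model *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return NULL;
	atomic_init(&priv->ref.count, 1);
	return priv;
}

static void private_get(struct private_model *priv)
{
	atomic_fetch_add(&priv->ref.count, 1);
}

/* Drop one reference; the last put runs the release callback. */
static void private_put(struct private_model *priv)
{
	if (atomic_fetch_sub(&priv->ref.count, 1) == 1)
		private_release(&priv->ref);
}

/* Walk one lifecycle and return how many times release ran. */
int sketch_lifecycle(void)
{
	struct private_model *priv = private_alloc();

	if (!priv)
		return -1;
	release_calls = 0;

	priv->parent.registered = 1;  /* models registering the parent */
	private_get(priv);            /* assumed: a second user takes a reference */

	priv->parent.registered = 0;  /* models unregistering the parent */
	private_put(priv);            /* driver drops its initial reference */
	assert(release_calls == 0);   /* still alive: the second user holds a ref */

	private_put(priv);            /* last user is done, struct is freed here */
	return release_calls;
}
```

The point of the model is only the ordering: with the count living in
the outer struct, dropping the driver's reference at unregister time
no longer frees memory that another path may still be using.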