Re: [RFC PATCH 0/9] drm/i915/spi: discrete graphics internal spi

"Winkler, Tomas" <tomas.winkler@xxxxxxxxx> · Sun, 21 Feb 2021 07:10:57 +0000

> > On Tue, Feb 16, 2021 at 7:26 PM Tomas Winkler
> > <tomas.winkler@xxxxxxxxx>
> > wrote:
> > > Because the graphics card may undergo reset at any time and basically
> > > hot unplug all its child devices, this series also provides a fix to
> > > the mtd framework to make the reset graceful.
> >
> > Well, just because MTD does not work as you expect, it is not broken.
> > :-)
> I'm not saying it's broken by design; it just didn't fit this use case.
> >
> > In your case i915_spi_remove() blindly removes the MTD; this is not
> > allowed.
> > You may remove the MTD only if there are no more users.
> 
> I'm not sure it's a good idea to stall the removal on user space.
> This is just asking for a deadlock, as user space is not getting what it
> needs and may stall. I think it's better that user space fails gracefully;
> the hw is not accessible at that stage anyway.
> >
> > The current model in MTD is that the driver is in charge of all life
> > cycle management.
> > Using ->_get_device() and ->_put_device() a driver can implement
> > refcounting and deny new users if the MTD is about to disappear.
> 
> Please note that the use case you are describing is still valid. I haven't
> removed the _get_device() and _put_device() handlers, so you can still stall
> the removal of the mtd. If it doesn't work that way, it's a bug.
> 
> >
> > In the upcoming MUSE driver that mechanism is used too.
> > MUSE allows implementing an MTD in userspace. So the FUSE server can
> > disappear at
> > *any* time. Just like in your case. Even worse, it can be hostile.
> > In MUSE the MTD life time is tied to the FUSE connection object,
> > muse_mtd_get_device()
> > increments the FUSE connection refcount, and muse_mtd_put_device()
> > decrements it.
> > That means if the FUSE server disappears all of a sudden but the MTD
> > still has users, the MTD will stay. But in this state no new
> > references are allowed and all MTD operations of existing users will fail
> > with -ENOTCONN (via FUSE).
> > As soon as the last user is gone (can be userspace via /dev/mtd* or an
> > in-kernel user such as UBIFS), the MTD will be removed.
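
The lifetime model described above can be illustrated with a small user-space
sketch. It is only a model under stated assumptions: `struct fake_mtd` and all
`fake_*` functions are hypothetical names invented for illustration, and the
-ENODEV return for refused new users is an assumed choice; none of this is
code from MUSE or the MTD core.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; this is NOT the kernel's struct mtd_info. */
struct fake_mtd {
    int users;      /* references held by current openers */
    bool dying;     /* backing device (or FUSE server) has gone away */
    bool removed;   /* set once the MTD has actually been torn down */
};

/* Models ->_get_device(): refuse new users once the device is dying. */
static int fake_get_device(struct fake_mtd *m)
{
    if (m->dying)
        return -ENODEV;     /* assumed error code for "no new users" */
    m->users++;
    return 0;
}

/* Models ->_put_device(): the last put on a dying device removes it. */
static void fake_put_device(struct fake_mtd *m)
{
    if (--m->users == 0 && m->dying)
        m->removed = true;
}

/* Models an MTD operation: it fails at once when the backend is gone. */
static int fake_read(struct fake_mtd *m)
{
    return m->dying ? -ENOTCONN : 0;
}

/* Models the backend disappearing: never blocks, only marks the state. */
static void fake_remove(struct fake_mtd *m)
{
    m->dying = true;
    if (m->users == 0)
        m->removed = true;
}
```

In this sketch the backend going away never waits on users; the cost is that
the object itself outlives the backend until the last reference is dropped.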
> 
> But in our case the whole i915 is taken hostage; it cannot reset because of
> misbehaving user space.
> 
> > For the full details, please see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/tree/fs/fuse/muse.c?h=muse_v3#n1034
> >
> > Is it *really* not possible to do it that way in your case?
> 
> Maybe it's possible, but I don't think it's good to stall i915 removal. Also,
> it's very easy to crash the kernel.
> I've posted a snippet to the mailing list that tried to do that, and the
> kernel still crashed. Could you take a look at it?
> 
> > On the other hand, your last patch moves some part of the life cycle
> > management into MTD core.
> > The MTD will stay as long as it has users.
> > But that's only one part. The driver is still in charge to make sure
> > that all operations fail immediately and that no new users arrive.
> 
> I think in that case I would need to validate every HW access to make sure
> it's still valid.
> 
> > If we want to do it all in MTD core we'd have to do it like SCSI disks.
> > That means having device states such as SDEV_RUNNING, SDEV_CANCEL,
> > SDEV_OFFLINE, ....
> > That way the MTD could be shut down gracefully: first no new users are
> > allowed, then ongoing operations will be cancelled, next all operations
> > will fail with -EIO or such, then the device is removed from
> > sysfs, and finally, once the last user is gone, the MTD can be removed.
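
The staged shutdown described above can be sketched as a tiny state machine.
This is a user-space illustration only: the `FAKE_MTD_*` states and `fake_*`
helpers are hypothetical names loosely patterned on the SDEV_* naming, the
specific error codes are assumptions, and nothing here is an existing MTD
core API.

```c
#include <errno.h>

/* Hypothetical states, loosely patterned on the SDEV_* names. */
enum fake_mtd_state {
    FAKE_MTD_RUNNING,   /* normal operation */
    FAKE_MTD_CANCEL,    /* no new users; in-flight operations are cancelled */
    FAKE_MTD_OFFLINE,   /* every operation fails with -EIO */
    FAKE_MTD_DEL,       /* gone from sysfs; waiting for the last user */
    FAKE_MTD_GONE       /* last user left; the MTD can be freed */
};

struct fake_mtd_dev {
    enum fake_mtd_state state;
    int users;
};

/* New users are accepted only while the device is running. */
static int fake_open(struct fake_mtd_dev *d)
{
    if (d->state != FAKE_MTD_RUNNING)
        return -ENODEV;
    d->users++;
    return 0;
}

/* I/O result depends only on the current state. */
static int fake_io(struct fake_mtd_dev *d)
{
    switch (d->state) {
    case FAKE_MTD_RUNNING: return 0;
    case FAKE_MTD_CANCEL:  return -ECANCELED;
    default:               return -EIO;
    }
}

/* Advance the shutdown by one step; DEL -> GONE needs zero users. */
static enum fake_mtd_state fake_advance(struct fake_mtd_dev *d)
{
    switch (d->state) {
    case FAKE_MTD_RUNNING: d->state = FAKE_MTD_CANCEL;  break;
    case FAKE_MTD_CANCEL:  d->state = FAKE_MTD_OFFLINE; break;
    case FAKE_MTD_OFFLINE: d->state = FAKE_MTD_DEL;     break;
    case FAKE_MTD_DEL:
        if (d->users == 0)
            d->state = FAKE_MTD_GONE;
        break;
    case FAKE_MTD_GONE:
        break;
    }
    return d->state;
}

/* Dropping the last reference of a deleted device completes removal. */
static void fake_close(struct fake_mtd_dev *d)
{
    if (d->users > 0)
        d->users--;
    if (d->users == 0 && d->state == FAKE_MTD_DEL)
        d->state = FAKE_MTD_GONE;
}
```

Note that in this sketch the driver side can always step forward to the DEL
state without waiting; only the final free depends on the remaining users.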
> 
> Isn't it already that way? You cannot open a new handle. Here I would need
> more of your insight.
> >
> > I'm not sure whether we want to take that path.

Hi Richard, is there any way we can try to unclutter this?

Thanks
Tomas

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx