On Tue, 21 Feb 2017 15:36:23 +0800 Dong Jia Shi <bjsdjshi@xxxxxxxxxxxxxxxxxx> wrote: > * Cornelia Huck <cornelia.huck@xxxxxxxxxx> [2017-02-20 19:31:13 +0100]: > > > On Fri, 17 Feb 2017 09:29:28 +0100 > > Dong Jia Shi <bjsdjshi@xxxxxxxxxxxxxxxxxx> wrote: > > > > > To make vfio support subchannel devices, we need a css driver for > > > the vfio subchannels. This patch adds a basic vfio-ccw subchannel > > > driver for this purpose. > > > > > > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option > > > and configure VFIO as required. > > > > > > Signed-off-by: Dong Jia Shi <bjsdjshi@xxxxxxxxxxxxxxxxxx> > > > Acked-by: Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> > > > --- > > > arch/s390/Kconfig | 10 ++ > > > arch/s390/include/asm/isc.h | 1 + > > > drivers/iommu/Kconfig | 8 ++ > > > drivers/s390/cio/Makefile | 3 + > > > drivers/s390/cio/vfio_ccw_drv.c | 262 ++++++++++++++++++++++++++++++++++++ > > > drivers/s390/cio/vfio_ccw_private.h | 25 ++++ > > > 6 files changed, 309 insertions(+) > > > create mode 100644 drivers/s390/cio/vfio_ccw_drv.c > > > create mode 100644 drivers/s390/cio/vfio_ccw_private.h > > > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h > > > index 68d7d68..8a0b721 100644 > > > --- a/arch/s390/include/asm/isc.h > > > +++ b/arch/s390/include/asm/isc.h > > > @@ -16,6 +16,7 @@ > > > #define CONSOLE_ISC 1 /* console I/O subchannel */ > > > #define EADM_SCH_ISC 4 /* EADM subchannels */ > > > #define CHSC_SCH_ISC 7 /* CHSC subchannels */ > > > +#define VFIO_CCW_ISC IO_SCH_ISC /* VFIO-CCW I/O subchannels */ > > > > This is OK for now, I guess; but do we want to have the isc > > configurable in the long run? I.e., if a host wants to run its own I/O > > devices at a different priority than the devices it passes to a guest? > > > I think we can keep this as the default value, and provide a driver > param to customize the ISC value in the future once we need this. I put > this on my LATER list, or I do it in next version? This should a per-device knob, I think. It's ok to default to the normal I/O subchannel isc. We probably can add this at a later time when we're clear about the requirements. > > > > /* Adapter interrupts. */ > > > #define QDIO_AIRQ_ISC IO_SCH_ISC /* I/O subchannel in qdio mode */ > > > #define PCI_ISC 2 /* PCI I/O subchannels */ > > > > (...) > > > > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c > > > new file mode 100644 > > > index 0000000..b068207 > > > --- /dev/null > > > +++ b/drivers/s390/cio/vfio_ccw_drv.c > > > @@ -0,0 +1,262 @@ > > > +/* > > > + * VFIO based Physical Subchannel device driver > > > + * > > > + * Copyright IBM Corp. 2017 > > > + * > > > + * Author(s): Dong Jia Shi <bjsdjshi@xxxxxxxxxxxxxxxxxx> > > > + * Xiao Feng Ren <renxiaof@xxxxxxxxxxxxxxxxxx> > > > + */ > > > + > > > +#include <linux/module.h> > > > +#include <linux/init.h> > > > +#include <linux/device.h> > > > +#include <linux/slab.h> > > > + > > > +#include <asm/isc.h> > > > + > > > +#include "vfio_ccw_private.h" > > > + > > > +/* > > > + * Helpers > > > + */ > > > +static int vfio_ccw_sch_quiesce(struct subchannel *sch) > > > +{ > > > + struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev); > > > + DECLARE_COMPLETION_ONSTACK(completion); > > > + int iretry, ret = 0; > > > + > > > + spin_lock_irq(sch->lock); > > > + if (!sch->schib.pmcw.ena) > > > + goto out_unlock; > > > + ret = cio_disable_subchannel(sch); > > > + if (ret != -EBUSY) > > > + goto out_unlock; > > > + > > > + do { > > > + iretry = 255; > > > + > > > + ret = cio_cancel_halt_clear(sch, &iretry); > > > + while (ret == -EBUSY) { > > > + /* > > > + * Flushing all I/O and wait the > > > > "Flush all I/O and wait for..." > > > Ok. > > > > + * cancel/halt/clear completion. > > > + */ > > > + private->completion = &completion; > > > + spin_unlock_irq(sch->lock); > > > + > > > + wait_for_completion(&completion); > > > > What happens for cancel? It won't generate an interrupt. > > > Right! How about using: > wait_for_completion_timeout(&completion, 3*HZ); > > (I stole '3*HZ' from ccw_device_kill_io.) That's likely a good place to steal from :) > > > > + > > > + spin_lock_irq(sch->lock); > > > + private->completion = NULL; > > > + ret = cio_cancel_halt_clear(sch, &iretry); > > > + }; > > > + > > > + ret = cio_disable_subchannel(sch); > > > + } while (ret == -EBUSY); > > > + > > > +out_unlock: > > > + spin_unlock_irq(sch->lock); > > > + return ret; > > > +} > > > +static int vfio_ccw_sch_probe(struct subchannel *sch) > > > +{ > > > + struct pmcw *pmcw = &sch->schib.pmcw; > > > + struct vfio_ccw_private *private; > > > + int ret; > > > + > > > + if (pmcw->qf) { > > > + dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n", > > > > s/do/does/ > > > Ok. > > > > + dev_name(&sch->dev)); > > > + return -ENOTTY; > > > > Is -ENOTTY the right return code here? -EINVAL? > > > Ok. Think it again. -EINVAL makes more sense. It's like: > "hey, I know it's an I/O subchannel, but not the kind we support". The driver core treats -ENODEV/-ENXIO as "driver matched, but rejected the device". That's probably better, as we can't filter on device types when binding at the subchannel level. > > > > + } > > > + > > > + private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA); > > > + if (!private) > > > + return -ENOMEM; > > > + private->sch = sch; > > > + dev_set_drvdata(&sch->dev, private); > > > + > > > + spin_lock_irq(sch->lock); > > > + sch->isc = VFIO_CCW_ISC; > > > + ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch); > > > + spin_unlock_irq(sch->lock); > > > + if (ret) > > > + goto out_free; > > > + > > > + ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group); > > > + if (ret) > > > + goto out_disable; > > > + > > > + return 0; > > > + > > > +out_disable: > > > + cio_disable_subchannel(sch); > > > +out_free: > > > + dev_set_drvdata(&sch->dev, NULL); > > > + kfree(private); > > > + return ret; > > > +} > > > + > > > > (...) > > > > > +/** > > > + * vfio_ccw_sch_event - process subchannel event > > > + * @sch: subchannel > > > + * @process: non-zero if function is called in process context > > > + * > > > + * An unspecified event occurred for this subchannel. Adjust data according > > > + * to the current operational state of the subchannel. Return zero when the > > > + * event has been handled sufficiently or -EAGAIN when this function should > > > + * be called again in process context. > > > + */ > > > +static int vfio_ccw_sch_event(struct subchannel *sch, int process) > > > +{ > > > + unsigned long flags; > > > + > > > + spin_lock_irqsave(sch->lock, flags); > > > + if (!device_is_registered(&sch->dev)) > > > + goto out_unlock; > > > + > > > + if (work_pending(&sch->todo_work)) > > > + goto out_unlock; > > > + > > > + if (cio_update_schib(sch)) { > > > + /* Not operational. */ > > > + css_sched_sch_todo(sch, SCH_TODO_UNREG); > > > + > > > + /* > > > + * TODO: > > > + * Probably we should send the machine check to the guest. > > > > Yes, we should do that later on. Will user space notice that the device > > is gone? (I think crw injection should be done by user space.) > > > Currently we lack this mechanism. I think there are many todos here. I > will investigate latter. Yes. We just need to keep that in mind for later. > > > > + */ > > > + goto out_unlock; > > > + } > > > + > > > +out_unlock: > > > + spin_unlock_irqrestore(sch->lock, flags); > > > + > > > + return 0; > > > +}