Re: [RFC v1 1/1] vfio-ccw: Don't call cp_free if we are processing a channel program

On 06/24/2019 11:09 AM, Cornelia Huck wrote:
On Mon, 24 Jun 2019 10:44:17 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

On 06/24/2019 08:07 AM, Cornelia Huck wrote:
On Mon, 24 Jun 2019 13:46:22 +0200
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
On Mon, 24 Jun 2019 12:05:14 +0200
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
On Mon, 24 Jun 2019 11:42:31 +0200
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
On Fri, 21 Jun 2019 14:34:10 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
On 06/21/2019 01:40 PM, Eric Farman wrote:


On 6/21/19 10:17 AM, Farhan Ali wrote:


On 06/20/2019 04:27 PM, Eric Farman wrote:


On 6/20/19 3:40 PM, Farhan Ali wrote:
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 66a66ac..61ece3f 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
                  (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
         if (scsw_is_solicited(&irb->scsw)) {
             cp_update_scsw(&private->cp, &irb->scsw);

As I alluded to earlier, do we know this irb is for this cp?  If no, what
does this function end up putting in the scsw?

Yes, I think this also needs to check whether we have at least a prior
start function around. (We use the orb provided by the guest; maybe we
should check if that intparm is set in the irb?)

Hrm; not so easy as we always set the intparm to the address of the
subchannel structure...

Maybe check if we have one of the conditions of the large table
16-6 and correlate to the ccw address? Or is it enough to check the
function control? (Don't remember when the hardware resets it.)

Nope, we cannot look at the function control, as csch clears any set
start function bit :( (see "Function Control", pg 16-13)

I think this problem mostly boils down to "csch clears pending status;
therefore, we may only get one interrupt, even though there had been a
start function going on". If we only go with what the hardware gives
us, I don't see a way to distinguish "clear with a prior start" from
"clear only". Maybe we want to track an "issued" status in the cp?

Sorry for replying to myself again :), but maybe we should simply call
cp_free() if we got cc 0 from a csch? Any start function has been
terminated at the subchannel during successful execution of csch, and
cp_free does nothing if !cp->initialized, so we should hopefully be
safe there as well. We can then add a check for the start function in
the function control in the check above and should be fine, I think.


So you mean not call cp_free in vfio_ccw_sch_io_todo, and instead call
cp_free for a cc=0 for csch (and hsch) ?

Won't we end up with a memory leak for a successful ssch then?

No; both:

- free if cc=0 for csch (as this clears the status; hsch doesn't)
- free in _todo if the start function is set in the irb and the status
   is final
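
The two free points listed above can be written down as two small predicates. This is only a sketch of the decision logic as described in this thread, not code from the driver: the SKETCH_FCTL_* values and both function names are hypothetical stand-ins, and the real check would look at the irb's scsw rather than at plain integers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the function control bits (not kernel definitions). */
#define SKETCH_FCTL_START 0x4u
#define SKETCH_FCTL_HALT  0x2u
#define SKETCH_FCTL_CLEAR 0x1u

/*
 * Free point 1: directly after issuing csch, and only when it returned
 * cc 0, since a successful csch terminates any start function.
 */
static bool free_after_csch(int cc)
{
	return cc == 0;
}

/*
 * Free point 2: in the interrupt work item, and only when the status is
 * final and the start function is still set in the function control.
 */
static bool free_in_todo(unsigned int fctl, bool status_is_final)
{
	return status_is_final && (fctl & SKETCH_FCTL_START) != 0;
}
```

An interrupt that only carries the clear function would fall through free_in_todo() in this model, which is why the csch path needs its own free.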


But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am
not sure if your suggestion will fix the problem. The problem here is
that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at
the same time we are handling an ssch request. So depending on the order
of the operations we could still end up calling cp_free from both
threads (I refer to the threads I mentioned in response to Eric's
earlier email).

What I don't see is why this is a problem with ->initialized; wasn't
the problem that we misinterpreted an interrupt for csch as one for a
not-yet-issued ssch?


It's the order in which we do things that could cause the problem. Since we queue interrupt handling in the workqueue, we could delay processing the csch interrupt. If an ssch comes through during this delay, we might have already set ->initialized to true.

So when we get around to handling the interrupt in io_todo, we would go ahead and call cp_free. This would cause the problem of freeing the ccwchain list while we might be adding to it.
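
The ordering described above can be replayed in a single thread to make the bad interleaving explicit. This is a heavily reduced model, assuming the free is gated only on the initialized flag; struct sketch_cp, its fields and all sketch_* functions are hypothetical and only stand in for the channel program and its ccwchain list.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the channel program state. */
struct sketch_cp {
	bool initialized;          /* set once the ssch path starts building */
	bool building;             /* ssch path is still adding chains */
	int  nr_chains;            /* stands in for the ccwchain list */
	int  freed_while_building; /* counts the problematic event */
};

static void sketch_ssch_begin(struct sketch_cp *cp)
{
	cp->initialized = true;
	cp->building = true;
}

static void sketch_ssch_add_chain(struct sketch_cp *cp)
{
	cp->nr_chains++;
}

static void sketch_ssch_end(struct sketch_cp *cp)
{
	cp->building = false;
}

/* Delayed interrupt work in this model: free whenever initialized is set. */
static void sketch_io_todo(struct sketch_cp *cp)
{
	if (!cp->initialized)
		return;
	if (cp->building)
		cp->freed_while_building++;
	cp->nr_chains = 0;
	cp->initialized = false;
}

/* Replay: csch interrupt queued, ssch begins, work runs, ssch continues. */
static int sketch_replay_race(void)
{
	struct sketch_cp cp = {0};

	/* The csch interrupt has arrived; its work item is queued, not run. */
	sketch_ssch_begin(&cp);
	sketch_ssch_add_chain(&cp);
	sketch_io_todo(&cp);        /* the delayed csch work runs now */
	sketch_ssch_add_chain(&cp); /* the ssch path is still adding */
	sketch_ssch_end(&cp);

	return cp.freed_while_building;
}
```

In this replay the work item frees the list once while the ssch path is still in the middle of building it; if the work item runs before sketch_ssch_begin(), it does nothing.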


Another thing that concerns me is that vfio-ccw can also issue csch/hsch
in the quiesce path, independently of what the guest issues. So in that
case we could have a similar scenario to processing an ssch request and
issuing halt/clear in parallel. But maybe I am being paranoid :)

I think the root problem is really trying to clear a cp while another
thread is trying to set it up. Should we maybe use something like rcu?



Yes, this is the root problem. I am not too familiar with rcu locking, but what would be the benefit over a traditional mutex?
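
For comparison, the traditional mutex variant I have in mind would look roughly like the sketch below. It uses a userspace pthread mutex purely for illustration (the driver would use a kernel locking primitive instead), and struct locked_cp and the locked_cp_* functions are hypothetical names, not existing code. It says nothing about whether rcu would be the better fit.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical channel program state guarded by a single lock. */
struct locked_cp {
	pthread_mutex_t lock;
	bool initialized;
	int  nr_chains; /* stands in for the ccwchain list */
};

static int locked_cp_init(struct locked_cp *cp)
{
	cp->initialized = false;
	cp->nr_chains = 0;
	return pthread_mutex_init(&cp->lock, NULL);
}

/* ssch path: the whole setup happens with the lock held. */
static void locked_cp_setup(struct locked_cp *cp, int chains)
{
	pthread_mutex_lock(&cp->lock);
	cp->initialized = true;
	for (int i = 0; i < chains; i++)
		cp->nr_chains++;
	pthread_mutex_unlock(&cp->lock);
}

/*
 * Interrupt work path: the free takes the same lock, so it can only run
 * before setup starts or after setup has completed, never in between.
 * Returns true if something was actually freed.
 */
static bool locked_cp_free(struct locked_cp *cp)
{
	bool did_free = false;

	pthread_mutex_lock(&cp->lock);
	if (cp->initialized) {
		cp->nr_chains = 0;
		cp->initialized = false;
		did_free = true;
	}
	pthread_mutex_unlock(&cp->lock);

	return did_free;
}
```

With this shape a second free is a no-op, and a free can no longer interleave with a setup that is adding chains. It does not by itself tell us whether the interrupt being handled belongs to this cp.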

Thanks
Farhan



