Re: [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 11 May 2021 21:56:28 +0200
Eric Farman <farman@xxxxxxxxxxxxx> wrote:

> Hi Conny, Matt, Halil,
> 
> Here's one (last?) update to my proposal for handling the collision
> between interrupts for START SUBCHANNEL and HALT/CLEAR SUBCHANNEL.
> 
> Only change here is to include Conny's suggestions on patch 3.
> 
> Thanks,

I believe these changes are beneficial, although I don't understand
everything about them. In that sense I'm happy with the these getting
merged.

Let me also spend some words answering the unasked question, what I'm
not understanding about these.

Not understanding how the problem stated in the cover letter of v4 is
actually resolved is certainly the most important one. Let me cite
the relevant part of it (your cover letter already contains a link to
the full version).

"""

	CPU 1			CPU 2
 1	CLEAR SUBCHANNEL
 2	fsm_irq()
 3				START SUBCHANNEL
 4	vfio_ccw_sch_io_todo()
 5				fsm_irq()
 6				vfio_ccw_sch_io_todo()

>From the channel subsystem's point of view the CLEAR SUBCHANNEL (step 1)
is complete once step 2 is called, as the Interrupt Response Block (IRB)
has been presented and the TEST SUBCHANNEL was driven by the cio layer.
Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a cc=0 to
indicate the I/O was accepted. However, step 2 stacks the bulk of the
actual work onto a workqueue for when the subchannel lock is NOT held,
and is unqueued at step 4. That code misidentifies the data in the IRB
as being associated with the newly active I/O, and may release memory
that is actively in use by the channel subsystem and/or device. Eww.
"""

The last sentence clearly states "may release memory that is actively
used by ... the device", and I understood it refers to the invocation
of cp_free() from vfio_ccw_sch_io_todo(). Patch 3 of this series does
not change the conditions under which cp_free() is called.

Looking at the cited diagram, since patch 3 changes things in
vfio_ccw_sch_io_todo() it probably ain't affecting steps 1-3 and
I understood the description so that bad free happens in step 4.

My guess is that your change from patch 3 somehow via the fsm prevents
the SSCH on CPU 2 (using the diagram) from being executed  if it actually
happens to be after vfio_ccw_sch_io_todo(). And patch 1 is supposed to
prevent the SSCH on CPU2 from being executed in the depicted case because
if there is a cp to free, then we would bail out form if we see it
while processing the new IO request.

In any case, I don't want to hold this up any further.

Regards,
Halil




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux