Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 9 Jul 2019 17:27:47 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

> 
> 
> On 07/09/2019 10:21 AM, Halil Pasic wrote:
> > On Tue, 9 Jul 2019 09:46:51 -0400
> > Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
> > 
> >>
> >>
> >> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
> >>> On Mon,  8 Jul 2019 16:10:37 -0400
> >>> Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
> >>>
> >>>> There is a small window where it's possible that we could be working
> >>>> on an interrupt (queued in the workqueue) and setting up a channel
> >>>> program (i.e allocating memory, pinning pages, translating address).
> >>>> This can lead to allocating and freeing the channel program at the
> >>>> same time and can cause memory corruption.
> >>>>
> >>>> Let's not call cp_free if we are currently processing a channel program.
> >>>> The only way we know for sure that we don't have a thread setting
> >>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
> >>>
> >>> Can we pinpoint a commit that introduced this bug, or has it been there
> >>> since the beginning?
> >>>
> >>
> >> I think the problem was always there.
> >>
> > 
> > I think it became relevant with the async stuff. Because after the async
> > stuff was added we start getting solicited interrupts that are not about
> > channel program is done. At least this is how I remember the discussion.
> > 

You seem to have ignored this comment. BTW wasn't the cp->is_initialized
make 'Make it safe to call the cp accessors in any case, so we can call
them unconditionally.'?

@Connie: Your opinion as the author of that patch and of the cited
sentence?

> >>>>
> >>>> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
> >>>> ---
> >>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> index 4e3a903..0357165 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>    	if (scsw_is_solicited(&irb->scsw)) {
> >>>>    		cp_update_scsw(&private->cp, &irb->scsw);
> >>>> -		if (is_final)
> >>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> > 
> > Ain't private->state potentially used by multiple threads of execution?
> 
> yes
> 
> One of the paths I can think of is a machine check from the host which 
> will ultimately call vfio_ccw_sch_event callback which could set state 
> to NOT_OPER or IDLE.
> 
> > Do we need to use atomic operations or external synchronization to avoid
> > this being another gamble? Or am I missing something?
> 
> I think we probably should think about atomic operations for 
> synchronizing the state (and it could be a separate add on patch?).
> 
> But for preventing 2 threads from stomping on the cp the check should be 
> enough, unless I am missing something?
> 

Usually programming languages don't like incorrectly synchronized
programs. One tends to end up in undefined behavior land -- form language
perspective. That doesn't actually mean you are bound to see strange
stuff. With implementation spec + ABI spec + platform/architecture
spec one may end up with things being well defined. But it that is a much
deeper rabbit hole.

The nice thing about condition state == VFIO_CCW_STATE_CP_PENDING is
that it can tolerate stale state values. The bad case at hand
(you free but you should not) would be we see a stale
VFIO_CCW_STATE_CP_PENDING but we are actually
VFIO_CCW_STATE_CP_PROCESSING. That is pretty difficult to imagine
because one can enter VFIO_CCW_STATE_CP_PROCESSING only form
VFIO_CCW_STATE_CP_PENDING afair. On s390x torn reads/writes (i.e.
observing something that ain't either the old nor the new value) on an
int shouldn't be a concern.

The other bad case (where you don't free albeit you should) looks a
bit trickier.

I'm not a fan of keeping races around without good reasons. And I don't
see good reasons here. I'm no fan of needlessly complicated solutions
either.

But seems, at least with my beliefs about races, I'm the oddball
here. 

Regards,
Halil

> > 
> >>>>    			cp_free(&private->cp);
> >>>>    	}
> >>>>    	mutex_lock(&private->io_mutex);
> >>>
> >>> Reviewed-by: Cornelia Huck <cohuck@xxxxxxxxxx>
> >>>
> >>>
> >> Thanks for reviewing.
> >>
> >> Thanks
> >> Farhan
> > 
> > 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux