Re: [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/29/20 10:56 AM, Cornelia Huck wrote:
> On Wed, 17 Jun 2020 07:24:17 -0400
> Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> 
>> On 6/16/20 3:50 PM, Eric Farman wrote:
>>> Let's continue our discussion of the handling of vfio-ccw interrupts.
>>>
>>> The initial fix [1] relied upon the interrupt path's examination of the
>>> FSM state, and freeing all resources if it were CP_PENDING. But the
>>> interface used by HALT/CLEAR SUBCHANNEL doesn't affect the FSM state.
>>> Consider this sequence:
>>>
>>>     CPU 1                           CPU 2
>>>     CLEAR (state=IDLE/no change)
>>>                                     START [2]
>>>     INTERRUPT (set state=IDLE)
>>>                                     INTERRUPT (set state=IDLE)
>>>
>>> This translates to a couple of possible scenarios:
>>>
>>>  A) The START gets a cc2 because of the outstanding CLEAR, -EBUSY is
>>>     returned, resources are freed, and state remains IDLE
>>>  B) The START gets a cc0 because the CLEAR has already presented an
>>>     interrupt, and state is set to CP_PENDING
>>>
>>> If the START gets a cc0 before the CLEAR INTERRUPT (stacked onto a
>>> workqueue by the IRQ context) gets a chance to run, then the INTERRUPT
>>> will release the channel program memory prematurely. If the two
>>> operations run concurrently, then the FSM state set to CP_PROCESSING
>>> will prevent the cp_free() from being invoked. But the io_mutex
>>> boundary on that path will pause itself until the START completes,
>>> and then allow the FSM to be reset to IDLE without considering the
>>> outstanding START. Neither scenario would be considered good.
>>>
>>> Having said all of that, in v2 Conny suggested [3] the following:
>>>   
>>>> - Detach the cp from the subchannel (or better, remove the 1:1
>>>>   relationship). By that I mean building the cp as a separately
>>>>   allocated structure (maybe embedding a kref, but that might not be
>>>>   needed), and appending it to a list after SSCH with cc=0. Discard it
>>>>   if cc!=0.
>>>> - Remove the CP_PENDING state. The state is either IDLE after any
>>>>   successful SSCH/HSCH/CSCH, or a new state in that case. But no
>>>>   special state for SSCH.
>>>> - A successful CSCH removes the first queued request, if any.
>>>> - A final interrupt removes the first queued request, if any.  
>>>
>>> What I have implemented here is basically this, with a few changes:
>>>
>>>  - I don't queue cp's. Since there should only be one START in process
>>>    at a time, and HALT/CLEAR doesn't build a cp, I didn't see a pressing
>>>    need to introduce that complexity.
>>>  - Furthermore, while I initially made a separately allocated cp, adding
>>>    an alloc for a cp on each I/O AND moving the guest_cp alloc from the
>>>    probe path to the I/O path seems excessive. So I implemented a
>>>    "started" flag to the cp, set after a cc0 from the START, and examine
>>>    that on the interrupt path to determine whether cp_free() is needed.  
>>
>> FYI... After a day or two of running, I sprung a kernel debug oops for
>> list corruption in ccwchain_free(). I'm going to blame this piece, since
>> it was the last thing I changed and I hadn't come across any such damage
>> since v2. So either "started" is a bad idea, or a broken one. Or both. :)
> 
> Have you come to any conclusion wrt 'started'? Not wanting to generate
> stress, just asking :)
> 

I've talked myself out of it, and gone back to your original proposal of
a separately allocated cp. (Still no queuing.) Too early to pass
judgement though.

Yesterday, when running with a cp_free() call after a CSCH, I was
getting all sorts of errors very early on, so at the moment I've pulled
that back out again. If it looks good in this form, I'll put that as a
separate patch and write up some doc for a discussion on that point.

Attachment: signature.asc
Description: OpenPGP digital signature


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux