On 06/27/2019 05:14 AM, Cornelia Huck wrote:
On Mon, 24 Jun 2019 11:24:16 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
On 06/24/2019 11:09 AM, Cornelia Huck wrote:
On Mon, 24 Jun 2019 10:44:17 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
But even if we don't remove the cp_free from vfio_ccw_sch_io_todo, I am
not sure if your suggestion will fix the problem. The problem here is
that we can call vfio_ccw_sch_io_todo (for a clear or halt interrupt) at
the same time we are handling an ssch request. So depending on the order
of the operations we could still end up calling cp_free from both
threads (I refer to the threads I mentioned in response to Eric's
earlier email).
What I don't see is why this is a problem with ->initialized; wasn't
the problem that we misinterpreted an interrupt for csch as one for a
not-yet-issued ssch?
It's the order in which we do things that can cause the problem.
Since we queue interrupt handling in the workqueue, we could delay
processing the csch interrupt. During this delay if ssch comes through,
we might have already set ->initialized to true.
So when we get around to handling the interrupt in io_todo, we would go
ahead and call cp_free. This would cause the problem of freeing the
ccwchain list while we might be adding to it.
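Here is a minimal single-threaded replay of that ordering, as a sketch. The structures and helpers below (`struct chain`, `ssch_begin()`, `ssch_add_chain()`, `csch_todo()`, `replay_delayed_csch()`) are hypothetical, simplified stand-ins and not the vfio-ccw code. The chain list plays the role of the ccwchain list, and `csch_todo()` plays the role of the deferred csch handling in vfio_ccw_sch_io_todo. It only illustrates that an `->initialized` check cannot distinguish a finished cp from one that is still being built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry on the ccwchain list. */
struct chain {
	struct chain *next;
};

/* Hypothetical, reduced model of the channel program. */
struct channel_program {
	bool initialized;
	struct chain *chains;	/* head of the chain list */
	int nr_chains;
};

/* Frees the chain list, guarded only by ->initialized. */
static void cp_free(struct channel_program *cp)
{
	if (!cp->initialized)
		return;
	cp->initialized = false;
	while (cp->chains) {
		struct chain *c = cp->chains;

		cp->chains = c->next;
		free(c);
	}
	cp->nr_chains = 0;
}

/* ssch path, step 1: the cp is marked initialized before chains are added. */
static void ssch_begin(struct channel_program *cp)
{
	cp->chains = NULL;
	cp->nr_chains = 0;
	cp->initialized = true;
}

/* ssch path, step 2: append one chain (called repeatedly while building). */
static bool ssch_add_chain(struct channel_program *cp)
{
	struct chain *c = malloc(sizeof(*c));

	if (!c)
		return false;
	c->next = cp->chains;
	cp->chains = c;
	cp->nr_chains++;
	return true;
}

/* Deferred work for an earlier csch interrupt: frees whatever cp it finds. */
static void csch_todo(struct channel_program *cp)
{
	cp_free(cp);
}

/* Returns how many chains added by ssch were freed by the delayed csch work. */
static int replay_delayed_csch(void)
{
	struct channel_program cp = { 0 };
	int before;

	/* 1. csch interrupt arrives; its handling is queued, not yet run. */
	/* 2. ssch starts building the cp and sets ->initialized = true. */
	ssch_begin(&cp);
	ssch_add_chain(&cp);
	ssch_add_chain(&cp);
	before = cp.nr_chains;
	/* 3. The queued csch work runs and passes the ->initialized check. */
	csch_todo(&cp);
	/* 4. ssch would now keep appending to a list that was just freed. */
	return before - cp.nr_chains;
}
```

Called in that order, `replay_delayed_csch()` reports that both chains the ssch path had already added were freed underneath it.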
Another thing that concerns me is that vfio-ccw can also issue csch/hsch
in the quiesce path, independently of what the guest issues. So in that
case we could have a similar scenario to processing an ssch request and
issuing halt/clear in parallel. But maybe I am being paranoid :)
I think the root problem is really trying to clear a cp while another
thread is trying to set it up. Should we maybe use something like rcu?
Yes, this is the root problem. I am not too familiar with rcu locking,
but what would be the benefit over a traditional mutex?
I don't quite remember what I had been envisioning at the time (sorry,
the heat seems to make my brain a bit slushy :/), but I think we might
have two copies of the cp and use an rcu-ed pointer in the private
structure to point to one of the copies. If we make sure we've
synchronized on the pointer at interrupt time, we should be able to
free the old one in _todo and act on the new one when doing ssch. And
yes, I realize that this is awfully vague :)
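One possible reading of the two-copy idea, as a rough sketch under loud assumptions: it uses C11 `<stdatomic.h>` acquire/release operations in place of the kernel's `rcu_assign_pointer()`/`rcu_dereference()` and has no grace periods, so it is not RCU proper. `struct ccw_private`, `ssch_build_and_publish()`, `irq_snapshot()`, `todo_free()` and `replay_two_copies()` are hypothetical names, and `cp_free()` here only resets fields. The point is the ordering: the interrupt snapshots the published pointer, ssch builds and publishes the other copy, and the deferred work frees only the snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced channel program: just enough state to observe frees. */
struct channel_program {
	bool initialized;
	int nr_chains;
};

/* Hypothetical private structure: two cp copies plus the published pointer. */
struct ccw_private {
	struct channel_program copies[2];
	_Atomic(struct channel_program *) cp;	/* currently published copy */
};

/* Model of freeing a cp: only resets its fields. */
static void cp_free(struct channel_program *cp)
{
	cp->initialized = false;
	cp->nr_chains = 0;
}

/* ssch: build into the copy that is not published, then publish it. */
static struct channel_program *ssch_build_and_publish(struct ccw_private *p,
						      int nr_chains)
{
	struct channel_program *cur =
		atomic_load_explicit(&p->cp, memory_order_acquire);
	struct channel_program *next =
		(cur == &p->copies[0]) ? &p->copies[1] : &p->copies[0];

	next->nr_chains = nr_chains;
	next->initialized = true;
	/* Release: the built copy is visible before the pointer to it is. */
	atomic_store_explicit(&p->cp, next, memory_order_release);
	return next;
}

/* Interrupt time: remember which copy the deferred work should free. */
static struct channel_program *irq_snapshot(struct ccw_private *p)
{
	return atomic_load_explicit(&p->cp, memory_order_acquire);
}

/* Deferred _todo work: frees only the snapshotted (old) copy. */
static void todo_free(struct channel_program *old)
{
	cp_free(old);
}

static bool replay_two_copies(void)
{
	struct ccw_private p = { 0 };
	struct channel_program *old, *snap, *next;

	old = ssch_build_and_publish(&p, 3);	/* an earlier ssch */
	snap = irq_snapshot(&p);		/* csch interrupt arrives */
	next = ssch_build_and_publish(&p, 2);	/* new ssch uses the other copy */
	todo_free(snap);			/* delayed _todo frees the old copy */

	return snap == old && !old->initialized &&
	       next->initialized && next->nr_chains == 2 &&
	       atomic_load(&p.cp) == next;
}
```

Like the idea it models, this leaves open what happens if a second ssch starts building into the copy that a pending _todo still has to free.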
Sorry for the delayed response. I was trying out a few ideas, and I think
the simplest one for me that worked and that makes sense is to
explicitly add the check to see if the state == CP_PENDING when trying
to free the cp (as mentioned by Halil in a separate thread).
When we are in the CP_PENDING state then we know for sure that we have a
currently allocated cp and no other thread is working on it. So in the
interrupt context, it should be okay to free the cp.
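A sketch of that check, again as a hypothetical single-threaded model rather than the actual patch: the state names mirror the thread's CP_PENDING, but `enum ccw_state`, `ssch_start()`, `ssch_issued()`, `io_todo()` and the replay functions are made up for illustration. It replays the delayed-csch ordering, where the interrupt work now leaves the cp alone because the ssch path is still in CP_PROCESSING, and a normal completion, where the cp is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the states mentioned in the thread. */
enum ccw_state {
	STATE_IDLE,
	STATE_CP_PROCESSING,	/* ssch path is still building the cp */
	STATE_CP_PENDING,	/* cp fully built and issued */
};

struct channel_program {
	bool initialized;
	int nr_chains;
};

struct ccw_private {
	enum ccw_state state;
	struct channel_program cp;
};

/* Model of freeing a cp: only resets its fields. */
static void cp_free(struct channel_program *cp)
{
	cp->initialized = false;
	cp->nr_chains = 0;
}

/* ssch path: build while in CP_PROCESSING, move to CP_PENDING once issued. */
static void ssch_start(struct ccw_private *p)
{
	p->state = STATE_CP_PROCESSING;
	p->cp.initialized = true;
}

static void ssch_add_chain(struct ccw_private *p)
{
	p->cp.nr_chains++;
}

static void ssch_issued(struct ccw_private *p)
{
	p->state = STATE_CP_PENDING;
}

/* Interrupt work: free the cp only if it is complete and in flight. */
static void io_todo(struct ccw_private *p)
{
	if (p->state == STATE_CP_PENDING) {
		cp_free(&p->cp);
		p->state = STATE_IDLE;
	}
}

/* Delayed csch work runs while ssch is still building: the cp survives. */
static int replay_csch_during_setup(void)
{
	struct ccw_private p = { STATE_IDLE, { false, 0 } };

	ssch_start(&p);
	ssch_add_chain(&p);
	io_todo(&p);		/* state is CP_PROCESSING: nothing freed */
	ssch_add_chain(&p);
	ssch_issued(&p);
	return p.cp.nr_chains;
}

/* Normal completion: the interrupt for an issued cp frees it. */
static bool replay_normal_completion(void)
{
	struct ccw_private p = { STATE_IDLE, { false, 0 } };

	ssch_start(&p);
	ssch_add_chain(&p);
	ssch_issued(&p);
	io_todo(&p);
	return !p.cp.initialized && p.state == STATE_IDLE;
}
```

Being single-threaded, the model shows only these two orderings; it does not on its own show that the check is race-free.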
I have prototyped with the mutex, but the code becomes too hairy. I
looked into the rcu api and from what I understand about rcu it would
provide an advantage if we had more readers than updaters. But in our case we
really have 2 updaters, updating the cp at the same time.
In the meantime I also have some minor fixes while going over the code
again :). I will post a v2 soon for review.
Thanks
Farhan