On 02/21/2019 05:03 AM, Cornelia Huck wrote:
On Thu, 21 Feb 2019 10:30:01 +0100
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
On Wed, 20 Feb 2019 22:02:10 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
On 02/20/2019 05:35 PM, Eric Farman wrote:
On 02/20/2019 11:44 AM, Cornelia Huck wrote:
On Wed, 20 Feb 2019 11:25:46 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
On 02/20/2019 07:44 AM, Cornelia Huck wrote:
On Wed, 20 Feb 2019 06:29:38 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
On 02/20/2019 04:48 AM, Cornelia Huck wrote:
On Tue, 19 Feb 2019 21:49:07 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
Hi Connie, Farhan,
On 02/04/2019 12:06 PM, Cornelia Huck wrote:
From: Farhan Ali <alifm@xxxxxxxxxxxxx>
When trying to calculate the length of a ccw chain, we assume
there are ccws after a TIC. This can lead to overcounting and
copying garbage data from guest memory.
Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
Message-Id:
<d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@xxxxxxxxxxxxx>
Reviewed-by: Halil Pasic <pasic@xxxxxxxxxxxxx>
Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx>
---
drivers/s390/cio/vfio_ccw_cp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/cio/vfio_ccw_cp.c
b/drivers/s390/cio/vfio_ccw_cp.c
index 70a006ba4d05..ba08fe137c2e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -392,7 +392,7 @@ static int ccwchain_calc_length(u64 iova,
struct channel_program *cp)
return -EOPNOTSUPP;
}
- if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
+ if (!ccw_is_chain(ccw))
OK, this function now returns the length of the chain excluding the
last tic.
Hm, not so sure about that anymore. I'm a bit tired, please apply salt
where needed to what I'm seeing.
I didn't get as far as I had hoped today, so I'm going to need to reset
to tomorrow, have coffee, and try again. But it does seem that with
this patch, we calculate the length of the chain up to and including the
TIC, and nothing beyond it.
That is, during the boot process we calculate a chain length of "3" for
a SEEK + SIDE + TIC that QEMU built.
Hm... so it looks like that code does what it says on the tin. But why
are we missing the second round, that should give us a chain with the
forth ccw? Are we missing a loop somewhere?
One thing where this is different now that we don't count the last READ
is that tic_target_chain_exists() won't return 1 if we tic to the READ,
as it has not been copied. So we execute code that presumably had not
been executed before...
Coming back to these later, when coffee has kicked in.
break;
ccw++;
Now, cp_init will not copy the last tic to the chain. When it then
looks for tics in that new chain, it won't find any, and stop copying.
(...)
I'm not going to claim to know exactly how this is failing,
because it's too late for coffee and reading IPL records is
difficult enough. But in the working case (without this patch),
cp_init() calls ccwchain_calc_length() and gets a count of four
CCWs (SEEK + SIDE + TIC
+ READ), which is broken up into two chains. One for the
SEEK/SIDE/TIC, and another for the READ. In the failing case
(with this patch), cp_init() only counts three CCWs, we never read
the IPL2 address (the second chain in the working case), and our
boot fails.
I think "we don't copy the tic to the chain where we search for tics"
fits those symptoms.
Indeed. Hrm...
(...)
See above. We may need two chains: One without the trailing tic, and
one to process to see where that tic points to... or rework the way to
follow the tic? Not sure.
I hate the idea of two chains, because how much storage do we end up
needing to consume? But maybe it's inevitable, or maybe I can think up
a way to rework it without.
Yeah, that sucks. If we really need to track tics separately (and I'm
not so sure, see above), there should be a better way for that...
(...)
I'm wondering whether we should keep this patch and fix on top of it,
or revert it for now... I'm not sure tic processing is working at all
right now.
I've been wondering this too. Yeah, the bios code isn't merged yet, but
this patch means there's no point in merging it until we get TIC fixed
properly. Which I'd hope is 5.1, but I'm not sure how deep this bowl of
spaghetti is.
We also need to make sure that this is not a bug in the bios instead...
Of course. I don't see it yet, but I haven't looked at the code very
closely. From what he wrote in $qemusrc/docs/devel, it seems okay and
the CCWs I see showing up in vfio-ccw seem to match with it. We just
have trouble distinguishing between a "forward" versus "backward" TIC.
I just replied to patch 15 of Jason's bios series, with a small fix. I
didn't pay particularly close attention at the time, but the original
error I sent here was an Incorrect Length, and both the SEEK and SIDE
had a count of 8 bytes. Fixing that converts the error to a program
check, and the cpa in the irb points to where the read would be. So
this gets things to a more accurate "do we count/copy the TIC when
handling our inputs" scenario as discussed above.
Your fix in the bios code looks correct.
By "program check" you mean "channel program check", right?
Yes
Do you know
which ccw triggers that?
Also would be interesting when it breaks. Is it when we want to execute
the READ, or does it break on a SEEK?
I think we failed trying to issue the READ. But because we didn't count
it in cp_init(), we didn't build it as part of the channel program
vfio-ccw issued. So it's actually pointing to nothing/garbage data.
Certainly the data in the irb suggests it's not unhappy with the SEEK/SIDE.