On 02/20/2019 05:35 PM, Eric Farman wrote:
On 02/20/2019 11:44 AM, Cornelia Huck wrote:
On Wed, 20 Feb 2019 11:25:46 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
On 02/20/2019 07:44 AM, Cornelia Huck wrote:
On Wed, 20 Feb 2019 06:29:38 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
On 02/20/2019 04:48 AM, Cornelia Huck wrote:
On Tue, 19 Feb 2019 21:49:07 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:
Hi Connie, Farhan,
On 02/04/2019 12:06 PM, Cornelia Huck wrote:
From: Farhan Ali <alifm@xxxxxxxxxxxxx>
When trying to calculate the length of a ccw chain, we assume
there are ccws after a TIC. This can lead to overcounting and
copying garbage data from guest memory.
Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
Message-Id:
<d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@xxxxxxxxxxxxx>
Reviewed-by: Halil Pasic <pasic@xxxxxxxxxxxxx>
Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx>
---
drivers/s390/cio/vfio_ccw_cp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/cio/vfio_ccw_cp.c
b/drivers/s390/cio/vfio_ccw_cp.c
index 70a006ba4d05..ba08fe137c2e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -392,7 +392,7 @@ static int ccwchain_calc_length(u64 iova,
struct channel_program *cp)
return -EOPNOTSUPP;
}
- if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
+ if (!ccw_is_chain(ccw))
OK, this function now returns the length of the chain excluding the
last tic.
Hm, not so sure about that anymore. I'm a bit tired, please apply salt
where needed to what I'm seeing.
I didn't get as far as I had hoped today, so I'm going to need to reset
to tomorrow, have coffee, and try again. But it does seem that with
this patch, we calculate the length of the chain up to and including the
TIC, and nothing beyond it.
That is, during the boot process we calculate a chain length of "3" for
a SEEK + SIDE + TIC that QEMU built.
break;
ccw++;
Now, cp_init will not copy the last tic to the chain. When it then
looks for tics in that new chain, it won't find any, and stop copying.
(...)
I'm not going to claim to know exactly how this is failing,
because it's too late for coffee and reading IPL records is
difficult enough. But in the working case (without this patch),
cp_init() calls ccwchain_calc_length() and gets a count of four
CCWs (SEEK + SIDE + TIC
+ READ), which is broken up into two chains. One for the
SEEK/SIDE/TIC, and another for the READ. In the failing case
(with this patch), cp_init() only counts three CCWs, we never read
the IPL2 address (the second chain in the working case), and our
boot fails.
I think "we don't copy the tic to the chain where we search for tics"
fits those symptoms.
Indeed. Hrm...
(...)
See above. We may need two chains: One without the trailing tic, and
one to process to see where that tic points to... or rework the way to
follow the tic? Not sure.
I hate the idea of two chains, because how much storage do we end up
needing to consume? But maybe it's inevitable, or maybe I can think up
a way to rework it without.
Yeah, that sucks. If we really need to track tics separately (and I'm
not so sure, see above), there should be a better way for that...
(...)
I'm wondering whether we should keep this patch and fix on top of it,
or revert it for now... I'm not sure tic processing is working at all
right now.
I've been wondering this too. Yeah, the bios code isn't merged yet, but
this patch means there's no point in merging it until we get TIC fixed
properly. Which I'd hope is 5.1, but I'm not sure how deep this bowl of
spaghetti is.
We also need to make sure that this is not a bug in the bios instead...
Of course. I don't see it yet, but I haven't looked at the code very
closely. From what he wrote in $qemusrc/docs/devel, it seems okay and
the CCWs I see showing up in vfio-ccw seem to match with it. We just
have trouble distinguishing between a "forward" versus "backward" TIC.
I just replied to patch 15 of Jason's bios series, with a small fix. I
didn't pay particularly close attention at the time, but the original
error I sent here was an Incorrect Length, and both the SEEK and SIDE
had a count of 8 bytes. Fixing that converts the error to a program
check, and the cpa in the irb points to where the read would be. So
this gets things to a more accurate "do we count/copy the TIC when
handling our inputs" scenario as discussed above.
(Why didn't it complain when SEEK set the SLI bit, but SIDE didn't? Odd.)
But at this point, both my laptop and I are out of energy, and I should
plug both in. Will pick things up tomorrow.
Maybe we need a tool for testing that throws random channel programs at
the device :)
Are you peeking at my todo/wish list? :)
:)
Might be something for kvm unit tests? Or do I misunderstand what they
can do?
I haven't considered that too deeply, but I think that could be
possible. Maybe not quickly, but possible.