On Thu, 21 Feb 2019 10:30:01 +0100 Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > On Wed, 20 Feb 2019 22:02:10 -0500 > Eric Farman <farman@xxxxxxxxxxxxx> wrote: > > > On 02/20/2019 05:35 PM, Eric Farman wrote: > > > > > > > > > On 02/20/2019 11:44 AM, Cornelia Huck wrote: > > >> On Wed, 20 Feb 2019 11:25:46 -0500 > > >> Eric Farman <farman@xxxxxxxxxxxxx> wrote: > > >> > > >>> On 02/20/2019 07:44 AM, Cornelia Huck wrote: > > >>>> On Wed, 20 Feb 2019 06:29:38 -0500 > > >>>> Eric Farman <farman@xxxxxxxxxxxxx> wrote: > > >>>>> On 02/20/2019 04:48 AM, Cornelia Huck wrote: > > >>>>>> On Tue, 19 Feb 2019 21:49:07 -0500 > > >>>>>> Eric Farman <farman@xxxxxxxxxxxxx> wrote: > > >>>>>>> Hi Connie, Farhan, > > >>>>>>> > > >>>>>>> On 02/04/2019 12:06 PM, Cornelia Huck wrote: > > >>>>>>>> From: Farhan Ali <alifm@xxxxxxxxxxxxx> > > >>>>>>>> > > >>>>>>>> When trying to calculate the length of a ccw chain, we assume > > >>>>>>>> there are ccws after a TIC. This can lead to overcounting and > > >>>>>>>> copying garbage data from guest memory. > > >>>>>>>> > > >>>>>>>> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx> > > >>>>>>>> Message-Id: > > >>>>>>>> <d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@xxxxxxxxxxxxx> > > >>>>>>>> > > >>>>>>>> Reviewed-by: Halil Pasic <pasic@xxxxxxxxxxxxx> > > >>>>>>>> Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx> > > >>>>>>>> --- > > >>>>>>>> drivers/s390/cio/vfio_ccw_cp.c | 2 +- > > >>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > > >>>>>>>> > > >>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c > > >>>>>>>> b/drivers/s390/cio/vfio_ccw_cp.c > > >>>>>>>> index 70a006ba4d05..ba08fe137c2e 100644 > > >>>>>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c > > >>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c > > >>>>>>>> @@ -392,7 +392,7 @@ static int ccwchain_calc_length(u64 iova, > > >>>>>>>> struct channel_program *cp) > > >>>>>>>> return -EOPNOTSUPP; > > >>>>>>>> } > > >>>>>>>> - if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw))) > > >>>>>>>> + if (!ccw_is_chain(ccw)) > > >>>> > > >>>> OK, this function now returns the length of the chain excluding the > > >>>> last tic. > > >> > > >> Hm, not so sure about that anymore. I'm a bit tired, please apply salt > > >> where needed to what I'm seeing. > > > > > > I didn't get as far as I had hoped today, so I'm going to need to reset > > > to tomorrow, have coffee, and try again. But it does seem that with > > > this patch, we calculate the length of the chain up to and including the > > > TIC, and nothing beyond it. > > > > > > That is, during the boot process we calculate a chain length of "3" for > > > a SEEK + SIDE + TIC that QEMU built. > > Hm... so it looks like that code does what it says on the tin. But why > are we missing the second round, that should give us a chain with the > forth ccw? Are we missing a loop somewhere? One thing where this is different now that we don't count the last READ is that tic_target_chain_exists() won't return 1 if we tic to the READ, as it has not been copied. So we execute code that presumably had not been executed before... > > > > > > >> > > >>>>>>>> break; > > >>>>>>>> ccw++; > > >>>> > > >>>> Now, cp_init will not copy the last tic to the chain. When it then > > >>>> looks for tics in that new chain, it won't find any, and stop copying. > > >> (...) > > >>>>>>> I'm not going to claim to know exactly how this is failing, > > >>>>>>> because it's too late for coffee and reading IPL records is > > >>>>>>> difficult enough. But in the working case (without this patch), > > >>>>>>> cp_init() calls ccwchain_calc_length() and gets a count of four > > >>>>>>> CCWs (SEEK + SIDE + TIC > > >>>>>>> + READ), which is broken up into two chains. One for the > > >>>>>>> SEEK/SIDE/TIC, and another for the READ. In the failing case > > >>>>>>> (with this patch), cp_init() only counts three CCWs, we never read > > >>>>>>> the IPL2 address (the second chain in the working case), and our > > >>>>>>> boot fails. > > >>>> > > >>>> I think "we don't copy the tic to the chain where we search for tics" > > >>>> fits those symptoms. > > >>> > > >>> Indeed. Hrm... > > >> (...) > > >>>> See above. We may need two chains: One without the trailing tic, and > > >>>> one to process to see where that tic points to... or rework the way to > > >>>> follow the tic? Not sure. > > >>> > > >>> I hate the idea of two chains, because how much storage do we end up > > >>> needing to consume? But maybe it's inevitable, or maybe I can think up > > >>> a way to rework it without. > > >> > > >> Yeah, that sucks. If we really need to track tics separately (and I'm > > >> not so sure, see above), there should be a better way for that... > > >> > > >> (...) > > >> > > >>>> I'm wondering whether we should keep this patch and fix on top of it, > > >>>> or revert it for now... I'm not sure tic processing is working at all > > >>>> right now. > > >>> > > >>> I've been wondering this too. Yeah, the bios code isn't merged yet, but > > >>> this patch means there's no point in merging it until we get TIC fixed > > >>> properly. Which I'd hope is 5.1, but I'm not sure how deep this bowl of > > >>> spaghetti is. > > >> > > >> We also need to make sure that this is not a bug in the bios instead... > > > > > > Of course. I don't see it yet, but I haven't looked at the code very > > > closely. From what he wrote in $qemusrc/docs/devel, it seems okay and > > > the CCWs I see showing up in vfio-ccw seem to match with it. We just > > > have trouble distinguishing between a "forward" versus "backward" TIC. > > > > I just replied to patch 15 of Jason's bios series, with a small fix. I > > didn't pay particularly close attention at the time, but the original > > error I sent here was an Incorrect Length, and both the SEEK and SIDE > > had a count of 8 bytes. Fixing that converts the error to a program > > check, and the cpa in the irb points to where the read would be. So > > this gets things to a more accurate "do we count/copy the TIC when > > handling our inputs" scenario as discussed above. > > Your fix in the bios code looks correct. > > By "program check" you mean "channel program check", right? Do you know > which ccw triggers that? Also would be interesting when it breaks. Is it when we want to execute the READ, or does it break on a SEEK?