Re: [PULL 1/1] vfio-ccw: Don't assume there are more ccws after a TIC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 21 Feb 2019 10:30:01 +0100
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:

> On Wed, 20 Feb 2019 22:02:10 -0500
> Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> 
> > On 02/20/2019 05:35 PM, Eric Farman wrote:  
> > > 
> > > 
> > > On 02/20/2019 11:44 AM, Cornelia Huck wrote:    
> > >> On Wed, 20 Feb 2019 11:25:46 -0500
> > >> Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> > >>    
> > >>> On 02/20/2019 07:44 AM, Cornelia Huck wrote:    
> > >>>> On Wed, 20 Feb 2019 06:29:38 -0500
> > >>>> Eric Farman <farman@xxxxxxxxxxxxx> wrote:    
> > >>>>> On 02/20/2019 04:48 AM, Cornelia Huck wrote:    
> > >>>>>> On Tue, 19 Feb 2019 21:49:07 -0500
> > >>>>>> Eric Farman <farman@xxxxxxxxxxxxx> wrote:    
> > >>>>>>> Hi Connie, Farhan,
> > >>>>>>>
> > >>>>>>> On 02/04/2019 12:06 PM, Cornelia Huck wrote:    
> > >>>>>>>> From: Farhan Ali <alifm@xxxxxxxxxxxxx>
> > >>>>>>>>
> > >>>>>>>> When trying to calculate the length of a ccw chain, we assume
> > >>>>>>>> there are ccws after a TIC. This can lead to overcounting and
> > >>>>>>>> copying garbage data from guest memory.
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
> > >>>>>>>> Message-Id: 
> > >>>>>>>> <d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@xxxxxxxxxxxxx> 
> > >>>>>>>>
> > >>>>>>>> Reviewed-by: Halil Pasic <pasic@xxxxxxxxxxxxx>
> > >>>>>>>> Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx>
> > >>>>>>>> ---
> > >>>>>>>>      drivers/s390/cio/vfio_ccw_cp.c | 2 +-
> > >>>>>>>>      1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c 
> > >>>>>>>> b/drivers/s390/cio/vfio_ccw_cp.c
> > >>>>>>>> index 70a006ba4d05..ba08fe137c2e 100644
> > >>>>>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
> > >>>>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > >>>>>>>> @@ -392,7 +392,7 @@ static int ccwchain_calc_length(u64 iova, 
> > >>>>>>>> struct channel_program *cp)
> > >>>>>>>>                  return -EOPNOTSUPP;
> > >>>>>>>>              }
> > >>>>>>>> -        if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
> > >>>>>>>> +        if (!ccw_is_chain(ccw))    
> > >>>>
> > >>>> OK, this function now returns the length of the chain excluding the
> > >>>> last tic.    
> > >>
> > >> Hm, not so sure about that anymore. I'm a bit tired, please apply salt
> > >> where needed to what I'm seeing.    
> > > 
> > > I didn't get as far as I had hoped today, so I'm going to need to reset 
> > > to tomorrow, have coffee, and try again.  But it does seem that with 
> > > this patch, we calculate the length of the chain up to and including the 
> > > TIC, and nothing beyond it.
> > > 
> > > That is, during the boot process we calculate a chain length of "3" for 
> > > a SEEK + SIDE + TIC that QEMU built.  
> 
> Hm... so it looks like that code does what it says on the tin. But why
> are we missing the second round, that should give us a chain with the
> forth ccw? Are we missing a loop somewhere?

One thing where this is different now that we don't count the last READ
is that tic_target_chain_exists() won't return 1 if we tic to the READ,
as it has not been copied. So we execute code that presumably had not
been executed before...

> 
> > >     
> > >>    
> > >>>>>>>>                  break;
> > >>>>>>>>              ccw++;    
> > >>>>
> > >>>> Now, cp_init will not copy the last tic to the chain. When it then
> > >>>> looks for tics in that new chain, it won't find any, and stop copying.    
> > >> (...)    
> > >>>>>>> I'm not going to claim to know exactly how this is failing,
> > >>>>>>> because it's too late for coffee and reading IPL records is
> > >>>>>>> difficult enough.  But in the working case (without this patch),
> > >>>>>>> cp_init() calls ccwchain_calc_length() and gets a count of four
> > >>>>>>> CCWs (SEEK + SIDE + TIC
> > >>>>>>> + READ), which is broken up into two chains.  One for the
> > >>>>>>> SEEK/SIDE/TIC, and another for the READ.  In the failing case
> > >>>>>>> (with this patch), cp_init() only counts three CCWs, we never read
> > >>>>>>> the IPL2 address (the second chain in the working case), and our
> > >>>>>>> boot fails.    
> > >>>>
> > >>>> I think "we don't copy the tic to the chain where we search for tics"
> > >>>> fits those symptoms.    
> > >>>
> > >>> Indeed.  Hrm...    
> > >> (...)    
> > >>>> See above. We may need two chains: One without the trailing tic, and
> > >>>> one to process to see where that tic points to... or rework the way to
> > >>>> follow the tic? Not sure.    
> > >>>
> > >>> I hate the idea of two chains, because how much storage do we end up
> > >>> needing to consume?  But maybe it's inevitable, or maybe I can think up
> > >>> a way to rework it without.    
> > >>
> > >> Yeah, that sucks. If we really need to track tics separately (and I'm
> > >> not so sure, see above), there should be a better way for that...
> > >>
> > >> (...)
> > >>    
> > >>>> I'm wondering whether we should keep this patch and fix on top of it,
> > >>>> or revert it for now... I'm not sure tic processing is working at all
> > >>>> right now.    
> > >>>
> > >>> I've been wondering this too.  Yeah, the bios code isn't merged yet, but
> > >>> this patch means there's no point in merging it until we get TIC fixed
> > >>> properly.  Which I'd hope is 5.1, but I'm not sure how deep this bowl of
> > >>> spaghetti is.    
> > >>
> > >> We also need to make sure that this is not a bug in the bios instead...    
> > > 
> > > Of course.  I don't see it yet, but I haven't looked at the code very 
> > > closely.  From what he wrote in $qemusrc/docs/devel, it seems okay and 
> > > the CCWs I see showing up in vfio-ccw seem to match with it.  We just 
> > > have trouble distinguishing between a "forward" versus "backward" TIC.    
> > 
> > I just replied to patch 15 of Jason's bios series, with a small fix.  I 
> > didn't pay particularly close attention at the time, but the original 
> > error I sent here was an Incorrect Length, and both the SEEK and SIDE 
> > had a count of 8 bytes.  Fixing that converts the error to a program 
> > check, and the cpa in the irb points to where the read would be.  So 
> > this gets things to a more accurate "do we count/copy the TIC when 
> > handling our inputs" scenario as discussed above.  
> 
> Your fix in the bios code looks correct.
> 
> By "program check" you mean "channel program check", right? Do you know
> which ccw triggers that?

Also would be interesting when it breaks. Is it when we want to execute
the READ, or does it break on a SEEK?



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux