Re: [PULL 1/1] vfio-ccw: Don't assume there are more ccws after a TIC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 20 Feb 2019 06:29:38 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:

> On 02/20/2019 04:48 AM, Cornelia Huck wrote:
> > On Tue, 19 Feb 2019 21:49:07 -0500
> > Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> >   
> >> Hi Connie, Farhan,
> >>
> >> On 02/04/2019 12:06 PM, Cornelia Huck wrote:  
> >>> From: Farhan Ali <alifm@xxxxxxxxxxxxx>
> >>>
> >>> When trying to calculate the length of a ccw chain, we assume
> >>> there are ccws after a TIC. This can lead to overcounting and
> >>> copying garbage data from guest memory.
> >>>
> >>> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
> >>> Message-Id: <d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@xxxxxxxxxxxxx>
> >>> Reviewed-by: Halil Pasic <pasic@xxxxxxxxxxxxx>
> >>> Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx>
> >>> ---
> >>>    drivers/s390/cio/vfio_ccw_cp.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> >>> index 70a006ba4d05..ba08fe137c2e 100644
> >>> --- a/drivers/s390/cio/vfio_ccw_cp.c
> >>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> >>> @@ -392,7 +392,7 @@ static int ccwchain_calc_length(u64 iova, struct channel_program *cp)
> >>>    			return -EOPNOTSUPP;
> >>>    		}
> >>>    
> >>> -		if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
> >>> +		if (!ccw_is_chain(ccw))

OK, this function now returns the length of the chain excluding the
last tic.

> >>>    			break;
> >>>    
> >>>    		ccw++;
> >>>    

Now, cp_init will not copy the last tic to the chain. When it then
looks for tics in that new chain, it won't find any, and stop copying. 
> >>
> >> Sigh.  Let me state up front that I'm running vanilla 5.0-rc7 host
> >> kernel with this patch applied (or not), and QEMU from a week or
> >> two ago.
> >>
> >> I stated in another thread that this patch makes things better when
> >> running fio and other random exercisers within my guest.  But I was
> >> always booting off a virtio-blk disk when I did that, and using
> >> some additional disks connected by vfio-ccw.  Today I discovered
> >> that this patch prevents guest boot from vfio-ccw (which was
> >> working previously):
> >>
> >> /usr/local/bin/qemu-system-s390x -machine
> >> s390-ccw-virtio,accel=kvm -smp 4 -m 1024 -nographic -s -net none
> >> -device
> >> vfio-ccw,sysfsdev=/sys/class/mdev_bus/0.0.0cbb/9503d38c-b83a-45b1-a2f0-393739817786,devno=fe.0.0e1e,force-orb-pfch=true,bootindex=0
> >> -bios /usr/src/qemu/build/pc-bios/s390-ccw/s390-ccw.img
> >> LOADPARM=[        ] vfio-ccw device I/O error - Interrupt Response
> >> Block Data: Function Ctrl : [Start]
> >>       Activity Ctrl :
> >>       Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending]
> >>       Device Status : [Channel-End] [Device-End]
> >>       Channel Status : [Incorrect-Length]
> >>       cpa=: 0x0000000000000018
> >>       prev_ccw=: 0x3100003840000008
> >>       this_ccw=: 0x0800001000000000
> >> Eckd Dasd Sense Data (fmt 32-bytes):
> >>       Sense Condition Flags :
> >>       Residual Count     =: 0x0000000000000000
> >>       Phys Drive ID      =: 0x0000000000000000
> >>       low cyl address    =: 0x0000000000000000
> >>       head addr & hi cyl =: 0x0000000000000000
> >>       format/message     =: 0x0000000000000000
> >>       fmt-dependent[0-7] =: 0x0000000000000000
> >>       fmt-dependent[8-15]=: 0x0000000005160f00
> >>       prog action code   =: 0x0000000000000000
> >>       Configuration info =: 0x00000000000040e0
> >>       mcode / hi-cyl     =: 0x0000000000000000
> >>       cyl & head addr [0]=: 0x0000000000000000
> >>       cyl & head addr [1]=: 0x0000000000000000
> >>       cyl & head addr [2]=: 0x0000000000000000
> >> ...snip...
> >>
> >> Reverting this patch allows the boot to proceed normally.  So,
> >> ugh.  
> > 
> > Agreed on the 'ugh' :(
> >   
> >>
> >> I'm not going to claim to know exactly how this is failing,
> >> because it's too late for coffee and reading IPL records is
> >> difficult enough.  But in the working case (without this patch),
> >> cp_init() calls ccwchain_calc_length() and gets a count of four
> >> CCWs (SEEK + SIDE + TIC
> >> + READ), which is broken up into two chains.  One for the
> >> SEEK/SIDE/TIC, and another for the READ.  In the failing case
> >> (with this patch), cp_init() only counts three CCWs, we never read
> >> the IPL2 address (the second chain in the working case), and our
> >> boot fails.

I think "we don't copy the tic to the chain where we search for tics"
fits those symptoms.

> >>
> >> Unsure what the next best move is here.  It is possible that the
> >> (ill-used) check that Farhan noticed and removed was actually
> >> added to "make boot work" when handling recursive TIC CCWs such as
> >> this.  But I still agree that this patch is correct, even though
> >> it exposes more problems with our TIC handling throughout this
> >> code at large.  Any thoughts?  
> > 
> > Yes, there's probably something rotten in our TIC handling...
> > 
> > Just to make things clear: This patch makes doing I/O on a booted
> > system more stable, but makes the not-yet-merged QEMU bios boot code
> > fail, right?  
> 
> Correct.
> 
> (I think I deleted that while editing.  Sorry for the confusion.)

np, I thought as much :)

> 
> > 
> > I'll guess I'll stare at the ccw translation code for a bit and see
> > if something jumps out at me...

See above. We may need two chains: One without the trailing tic, and
one to process to see where that tic points to... or rework the way to
follow the tic? Not sure.

> >   
> 
> I'll ping Jason a bit later today, and see if anything jumps out at
> us before Farhan returns.

I'm wondering whether we should keep this patch and fix on top of it,
or revert it for now... I'm not sure tic processing is working at all
right now.

Maybe we need a tool for testing that throws random channel programs at
the device :)



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux