Re: [PULL 1/1] vfio-ccw: Don't assume there are more ccws after a TIC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 02/20/2019 04:48 AM, Cornelia Huck wrote:
On Tue, 19 Feb 2019 21:49:07 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:

Hi Connie, Farhan,

On 02/04/2019 12:06 PM, Cornelia Huck wrote:
From: Farhan Ali <alifm@xxxxxxxxxxxxx>

When trying to calculate the length of a ccw chain, we assume
there are ccws after a TIC. This can lead to overcounting and
copying garbage data from guest memory.

Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
Message-Id: <d63748c1f1b03147bcbf401596638627a5e35ef7.1548082107.git.alifm@xxxxxxxxxxxxx>
Reviewed-by: Halil Pasic <pasic@xxxxxxxxxxxxx>
Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx>
---
   drivers/s390/cio/vfio_ccw_cp.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 70a006ba4d05..ba08fe137c2e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -392,7 +392,7 @@ static int ccwchain_calc_length(u64 iova, struct channel_program *cp)
   			return -EOPNOTSUPP;
   		}
- if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
+		if (!ccw_is_chain(ccw))
   			break;
ccw++;

Sigh.  Let me state up front that I'm running vanilla 5.0-rc7 host
kernel with this patch applied (or not), and QEMU from a week or two ago.

I stated in another thread that this patch makes things better when
running fio and other random exercisers within my guest.  But I was
always booting off a virtio-blk disk when I did that, and using some
additional disks connected by vfio-ccw.  Today I discovered that this
patch prevents guest boot from vfio-ccw (which was working previously):

/usr/local/bin/qemu-system-s390x -machine s390-ccw-virtio,accel=kvm -smp
4 -m 1024 -nographic -s -net none -device
vfio-ccw,sysfsdev=/sys/class/mdev_bus/0.0.0cbb/9503d38c-b83a-45b1-a2f0-393739817786,devno=fe.0.0e1e,force-orb-pfch=true,bootindex=0
-bios /usr/src/qemu/build/pc-bios/s390-ccw/s390-ccw.img
LOADPARM=[        ]
vfio-ccw device I/O error - Interrupt Response Block Data:
      Function Ctrl : [Start]
      Activity Ctrl :
      Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending]
      Device Status : [Channel-End] [Device-End]
      Channel Status : [Incorrect-Length]
      cpa=: 0x0000000000000018
      prev_ccw=: 0x3100003840000008
      this_ccw=: 0x0800001000000000
Eckd Dasd Sense Data (fmt 32-bytes):
      Sense Condition Flags :
      Residual Count     =: 0x0000000000000000
      Phys Drive ID      =: 0x0000000000000000
      low cyl address    =: 0x0000000000000000
      head addr & hi cyl =: 0x0000000000000000
      format/message     =: 0x0000000000000000
      fmt-dependent[0-7] =: 0x0000000000000000
      fmt-dependent[8-15]=: 0x0000000005160f00
      prog action code   =: 0x0000000000000000
      Configuration info =: 0x00000000000040e0
      mcode / hi-cyl     =: 0x0000000000000000
      cyl & head addr [0]=: 0x0000000000000000
      cyl & head addr [1]=: 0x0000000000000000
      cyl & head addr [2]=: 0x0000000000000000
...snip...

Reverting this patch allows the boot to proceed normally.  So, ugh.

Agreed on the 'ugh' :(


I'm not going to claim to know exactly how this is failing, because it's
too late for coffee and reading IPL records is difficult enough.  But in
the working case (without this patch), cp_init() calls
ccwchain_calc_length() and gets a count of four CCWs (SEEK + SIDE + TIC
+ READ), which is broken up into two chains.  One for the SEEK/SIDE/TIC,
and another for the READ.  In the failing case (with this patch),
cp_init() only counts three CCWs, we never read the IPL2 address (the
second chain in the working case), and our boot fails.

Unsure what the next best move is here.  It is possible that the
(ill-used) check that Farhan noticed and removed was actually added to
"make boot work" when handling recursive TIC CCWs such as this.  But I
still agree that this patch is correct, even though it exposes more
problems with our TIC handling throughout this code at large.  Any thoughts?

Yes, there's probably something rotten in our TIC handling...

Just to make things clear: This patch makes doing I/O on a booted
system more stable, but makes the not-yet-merged QEMU bios boot code
fail, right?

Correct.

(I think I deleted that while editing.  Sorry for the confusion.)


I'll guess I'll stare at the ccw translation code for a bit and see if
something jumps out at me...


I'll ping Jason a bit later today, and see if anything jumps out at us before Farhan returns.




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux