Connie,
Sorry if this does not thread nicely. I never received the original (Thunderbird
corruption/hangs) so I had to fake it :).
>> My point is, when the subchannel is disabled, 'firmware' is responsible
>> for suppressing interrupts and error conditions, and also for
>> doing the appropriate recovery procedure, so to say under the hood.
>
> I don't think there's actually much of a 'recovery' possible at a
> subchannel level (other than 'have you tried turning it off and on
> again?'); the interesting stuff is all at the device-specific level.
>
>> I think Jason has discovered some problems related to this while doing
>> his DASD IPL with vfio-ccw work, but I don't quite remember any more.
>
> cc:ing Jason, in case he remembers :)
Here is what Halil was talking about.
I'm seeing a problem during KVM on z development of vfio-ccw (DASD passthrough).
After a fresh IPL of the host system, sometimes my first channel program
executed on my vfio-ccw device generates a unit check. The sense data given for
that unit check indicates that a reset event has occurred. This is apparently
normal to see after a device, channel or subsystem reset.
I'm trying to figure out how to deal with this unit check in the vfio-ccw kernel
driver. The thinking at the moment is to just retry any I/O operation a limited
number of times after any unit check, then give up if the I/O operation still
does not succeed. The kernel apparently uses a similar approach for the regular
DASD driver.
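
To make the retry idea concrete, here is a minimal sketch in plain user-space C.
It is not driver code: every name in it (run_with_retry, start_io_fn,
MAX_UNIT_CHECK_RETRIES, the fake device) is made up for illustration, and the
retry limit of 3 is an arbitrary assumption, not something taken from the
vfio-ccw or DASD drivers.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the vfio-ccw or DASD API. */
#define MAX_UNIT_CHECK_RETRIES 3  /* assumed limit, chosen arbitrarily */

enum io_status { IO_OK, IO_UNIT_CHECK, IO_OTHER_ERROR };

/* Callback that runs one channel program and reports how it ended. */
typedef enum io_status (*start_io_fn)(void *ctx);

struct retry_result {
    enum io_status status;  /* status of the last attempt */
    int attempts;           /* total attempts made, including the first */
};

/*
 * Run the I/O once, then retry up to MAX_UNIT_CHECK_RETRIES more times
 * as long as it keeps ending in a unit check. Any other outcome
 * (success or a different error) ends the loop immediately.
 */
static struct retry_result run_with_retry(start_io_fn start_io, void *ctx)
{
    struct retry_result res = { IO_OTHER_ERROR, 0 };

    for (int i = 0; i <= MAX_UNIT_CHECK_RETRIES; i++) {
        res.status = start_io(ctx);
        res.attempts++;
        if (res.status != IO_UNIT_CHECK)
            break;
    }
    return res;
}

/* Fake device: reports a unit check for the first N requests, then works. */
struct fake_dev {
    int unit_checks_left;
};

static enum io_status fake_start_io(void *ctx)
{
    struct fake_dev *dev = ctx;

    if (dev->unit_checks_left > 0) {
        dev->unit_checks_left--;
        return IO_UNIT_CHECK;
    }
    return IO_OK;
}
```

With a fake device that reports a single unit check (the post-reset case
described above), run_with_retry() succeeds on the second attempt. With a device
that keeps reporting unit checks, it gives up after four attempts and returns
the unit check to the caller. A real driver would presumably also want to look
at the sense data before deciding to retry, which this sketch does not do.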
--
-- Jason J. Herne (jjherne@xxxxxxxxxxxxx)