On Mon, 25 Feb 2019 11:30:47 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:

> On 02/25/2019 04:11 AM, Cornelia Huck wrote:
> > On Fri, 22 Feb 2019 19:39:39 +0100
> > Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> >
> >> Per the discussion [1] about a problem with how vfio-ccw calculates
> >> the length of a channel program (specifically when using the
> >> forthcoming QEMU BIOS code for DASD IPL), I present this fix.
> >>
> >> Patch 1 fixes the problem, and is over-engineered
> >> for readability's sake.
> >
> > :)
> >
> >>
> >> Patch 2 takes the functions from Patch 1, and refactors the
> >> existing code to make other areas a little easier to understand.
> >> (I hope.)
> >>
> >> I've been running fio for over 24 hours now, and have seen
> >> zero errors. Previously, I would have probably seen "a few"
> >> errors by now, where prior to the original fix I would've seen
> >> "many" errors. Further tests are still ongoing.
> >
> > Awesome, thanks!
>
> I left fio running over the weekend, with newly-randomized parameters
> every hour or two... Had one error yesterday morning, in the
> NOP+TIC-to-redrive-I/O case. I didn't leave any tracing on because I
> didn't expect I'd be able to get anything before they wrapped, and
> didn't have time to figure out a way to cleanly filter errors.
>
> Though I did leave a counter in place for the number of times we
> processed a TIC that goes back into the current chain, and it hit about
> 1900 times since Friday. More than three quarters of them occurred
> during the error yesterday morning, so something was being dramatic at
> the time. I guess there's one obscure corner to track down, but it
> otherwise seems to run quite a bit better than before.

Agreed, the patches as they are now are a real improvement.