>>> On 10.06.13 at 18:43, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote: > On Mon, Jun 10, 2013 at 04:52:35PM +0100, Jan Beulich wrote: >> >>> On 07.06.13 at 22:11, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote: >> > On Tue, Jun 04, 2013 at 03:57:06PM -0400, Konrad Rzeszutek Wilk wrote: >> >> + /* N.B. 'rp', not 'rc'. */ >> >> + if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rp)) { >> >> + pr_warn(DRV_PFX "Frontend provided bogus ring requests (%d - %d = %d). >> > Halting ring processing on dev=%04x\n", >> >> + rp, rc, rp - rc, blkif->vbd.pdevice); >> > >> > Hm, I seem to be able to get: >> > >> > [ 189.398095] xen-blkback:Frontend provided bogus ring requests (125 - 115 = > >> > 10). Halting ring processing on dev=800011 >> > or: >> > [ 478.558699] xen-blkback:Frontend provided bogus ring requests (95 - 94 = >> > 1). Halting ring processing on dev=800011 >> > >> > Which is clearly wrong. Piggybacking on the rsp_prod_pvt does not seem to >> > cut it. >> >> We see that too, but not very frequently. One thing is that >> rsp_prod_pvt doesn't get printed along with rc and rp, thus >> making it not immediately obvious how this can be off in any way. >> >> Among the instance there are cases where the printed >> difference is 32, which makes me wonder whether part of the >> problem is the >= in the macro (we may want > here). >> >> And then we might have been living with some sort of issue in the >> past, because the existing use of the macro just causes the loop >> to be exited, with it getting re-entered subsequently (i.e. at worst >> causing performance issues). > > My observation was that the rsp_prod_pvt was lagging behind b/c the > READ requests weren't completed. In other words, the processing > of the ring was stalled b/c 'make_response' hadn't been called yet. > Which meant that rsp_prod was not updated to rsp_prod_pvt (backend > does not care about that value, only frontend does). I don't buy this: rsp_prod is being updated by the backend just for the frontend's sake, so this value really doesn't need looking at (or else we'd become susceptible to the guest maliciously writing that field). rsp_prod_pvt, otoh, is never behind rsp_prod, and if the guest produces requests that don't have matching space for responses, the guest is doing something bogus (and perhaps malicious). > Going back to the rc an rp check solves the immediate 'insane ring > check'. Consequently, while this check is better than none at all, I think it is still too lax, and we really want to check against the produced responses. Just that other than for the rc check using >=, we'd need > for the rp one. But first of all let me see if I can get the original broken check to trigger wrongly here (so far only our stage testing caught these), and look at by how much rsp_prod_pvt really lags. Jan -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html