On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > > From: Keith Busch <kbusch@xxxxxxxxxx>
> > > > > >
> > > > > > Use the address alignment requirements from the
> > > > > > block_device for direct io instead of requiring
> > > > > > addresses be aligned to the block size.
> > > > >
> > > > > Hi Keith,
> > > > >
> > > > > Our s390 PV guests recently started failing to boot from a
> > > > > -next host, and git blame brought me here.
> > > > >
> > > > > As near as I have been able to tell, we start tripping up
> > > > > on this code from patch 9 [1] that gets invoked with this
> > > > > patch:
> > > > >
> > > > > >     for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > >         size_t len = i->iov[k].iov_len - skip;
> > > > > >
> > > > > >         if (len > size)
> > > > > >             len = size;
> > > > > >         if (len & len_mask)
> > > > > >             return false;
> > > > >
> > > > > The iovec we're failing on has two segments, one with a len
> > > > > of x200 (and base of x...000) and another with a len of
> > > > > xe00 (and a base of x...200), while len_mask is of course
> > > > > xfff.
> > > > >
> > > > > So before I go any further on what we might have broken, do
> > > > > you happen to have any suggestions what might be going on
> > > > > here, or something I should try?
> > > >
> > > > Thanks for the notice, sorry for the trouble. This check
> > > > wasn't intended to have any difference from the previous code
> > > > with respect to the vector lengths.
> > > >
> > > > Could you tell me if you're accessing this through the block
> > > > device direct-io, or through iomap filesystem?
> >
> > Reasonably certain the failure's on iomap. I'd reverted the subject
> > patch from next-20220622 and got things in working order.
> >
> > > If using iomap, the previous check was this:
> > >
> > >     unsigned int blkbits =
> > >         blksize_bits(bdev_logical_block_size(iomap->bdev));
> > >     unsigned int align = iov_iter_alignment(dio->submit.iter);
> > >     ...
> > >     if ((pos | length | align) & ((1 << blkbits) - 1))
> > >         return -EINVAL;
> > >
> > > > > ...
> > > The result of "iov_iter_alignment()" would include "0xe00 |
> > > 0x200" in your example, and checked against 0xfff should have
> > > been failing prior to this patch. Unless I'm missing something...
> >
> > Nope, you're not. I didn't look back at what the old check was
> > doing, just saw "0xe00 and 0x200" and thought "oh there's one page"
> > instead of noting the code was or'ing them. My bad.
> >
> > That was the last entry in my trace before the guest gave up, as
> > everything else through this code up to that point seemed okay.
> > I'll pick up the working case and see if I can get a clearer
> > picture between the two.
>
> Looking over the trace again, I realize I did dump
> iov_iter_alignment() as a comparator, and I see one pass through
> that had a non-zero response but bdev_iter_is_aligned() returned
> true...
>
> count = x1000
> iov_offset = x0
> nr_segs = 1
> iov_len = x1000 (len_mask = xfff)
> iov_base = x...200 (addr_mask = x1ff)
>
> That particular pass through is in the middle of the stuff it tried
> to do, so I don't know if that's the cause or not but it strikes me
> as unusual. Will look into that tomorrow and report back.
>

Apologies, it took me an extra day to get back to this, but it is
indeed this pass through that's causing our boot failures.

I note that the old code (in iomap_dio_bio_iter) did:

    if ((pos | length | align) & ((1 << blkbits) - 1))
        return -EINVAL;

With blkbits equal to 12, the resulting mask was 0x0fff, and checking
that against an align value (from iov_iter_alignment) of x200 kicked
us out.

The new code (in iov_iter_aligned_iovec), meanwhile, compares this:

    if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
        return false;

iov_base (and the output of the old iov_iter_alignment() routine) is
x200, but since addr_mask is x1ff this check provides a different
response than it used to.

To check this, I changed the comparator to len_mask (almost certainly
not the right answer since addr_mask is then unused, but it was good
for a quick test), and our PV guests are able to boot again with -next
running in the host.

Thanks,
Eric
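
P.S. For anyone following along, here is a minimal userspace sketch of the two comparisons using the values from the trace. It is not the kernel code: the helper names old_check_ok() and new_check_ok() are made up for illustration, only the low bits of iov_base are modeled (the trace shows x...200), and the pos/length terms of the old check are left out on the assumption that they were already block aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the old iomap check for a single segment:
 * the base address and the length are OR'ed together and tested
 * against one mask derived from the logical block size.
 */
bool old_check_ok(uintptr_t base, size_t len, unsigned int blkbits)
{
	uintptr_t mask = ((uintptr_t)1 << blkbits) - 1;

	return ((base | len) & mask) == 0;
}

/*
 * Hypothetical model of the new check for a single segment:
 * the length is tested against len_mask and the base address
 * against a separate (possibly smaller) addr_mask.
 */
bool new_check_ok(uintptr_t base, size_t len,
		  uintptr_t addr_mask, size_t len_mask)
{
	if (len & len_mask)
		return false;
	if (base & addr_mask)
		return false;
	return true;
}
```

With base = 0x200, len = 0x1000 and blkbits = 12, old_check_ok() rejects the segment. new_check_ok() with addr_mask = 0x1ff and len_mask = 0xfff accepts it, and passing 0xfff for both masks (the quick test described above) rejects it again.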