On Mon, Mar 11 2024 at 9:09P -0400,
Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> On Mon, Mar 11, 2024 at 08:28:50PM -0400, Mike Snitzer wrote:
> > All for Jens being made to suffer with dm-crypt but I think we need a
> > proper root cause of what is happening for you and Johannes ;)
>
> I'm going to try to stay out of the cranking, but I think the reason is
> that the limits stacking inherits the max_segment_size, nvme has weird
> rules for them due to their odd PRPs, and dm-crypt set its own
> max_segment_size to split out each page.  The regression here is
> that we now actually verify that conflict.
>
> So this happens only for dm-crypt on nvme.  The fix is probably
> to not inherit low-level limits like max_segment_size, but I need
> to think about it a bit more and come up with an automated test case
> using say nvme-loop.

Yeah, I generally agree.

I looked at the latest code to more fully understand why this failed.

1) dm-crypt.c:crypt_io_hints() sets:
        limits->max_segment_size = PAGE_SIZE;

2) drivers/nvme/host/core.c:nvme_set_ctrl_limits() sets:
        lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
        lim->max_segment_size = UINT_MAX;

3) blk_stack_limits(t=dm-crypt, b=nvme-pci) will combine limits:
        t->virt_boundary_mask = min_not_zero(t->virt_boundary_mask,
                                             b->virt_boundary_mask);
        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);

4) blk_validate_limits() will reject the limits that
   blk_stack_limits() created:

        /*
         * Devices that require a virtual boundary do not support scatter/gather
         * I/O natively, but instead require a descriptor list entry for each
         * page (which might not be identical to the Linux PAGE_SIZE).  Because
         * of that they are not limited by our notion of "segment size".
         */
        if (lim->virt_boundary_mask) {
                if (WARN_ON_ONCE(lim->max_segment_size &&
                                 lim->max_segment_size != UINT_MAX))
                        return -EINVAL;
                lim->max_segment_size = UINT_MAX;
        } else {
                /*
                 * The maximum segment size has an odd historic 64k default that
                 * drivers probably should override.  Just like the I/O size we
                 * require drivers to at least handle a full page per segment.
                 */
                if (!lim->max_segment_size)
                        lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
                if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
                        return -EINVAL;
        }

blk_validate_limits() is currently very pedantic.  I discussed with Jens
briefly and we're thinking it might make sense for blk_validate_limits()
to be more forgiving by _not_ imposing hard -EINVAL failure.  In the
interim, during this transition to more curated and atomic limits, a
WARN_ON_ONCE() splat should serve as enough notice to developers (be it
lower-level nvme or higher-level virtual devices like DM).

BUT for this specific max_segment_size case, the constraints of dm-crypt
are actually more conservative due to crypto requirements.  Yet nvme's
more general "don't care, but will care if non-nvme driver does" for
this particular max_segment_size limit is being imposed when validating
the combined limits that dm-crypt will impose at the top-level.

All said, the above "if (lim->virt_boundary_mask)" check in
blk_validate_limits() looks bogus for stacked device limits.

Mike
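
P.S. For anyone who wants to see steps 1-4 without booting a kernel, here is a rough userspace model of just the two limits involved. It is only a sketch: every model_* name is made up for illustration and is not kernel API, the struct is a two-field stand-in for struct queue_limits, the page sizes are assumed to be 4096, and WARN_ON_ONCE() is dropped so only the return value is modeled.

```c
#include <errno.h>
#include <limits.h>

/* Assumed values for illustration only; the real ones are platform/driver specific. */
#define MODEL_PAGE_SIZE            4096u
#define MODEL_NVME_CTRL_PAGE_SIZE  4096u
#define MODEL_BLK_MAX_SEGMENT_SIZE 65536u  /* the "odd historic 64k default" */

/* Hypothetical two-field stand-in for struct queue_limits. */
struct model_limits {
	unsigned int virt_boundary_mask;
	unsigned int max_segment_size;
};

/* Smaller of the two values, treating 0 as "not set". */
static unsigned int model_min_not_zero(unsigned int x, unsigned int y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}

/* Step 3: combine top-level (t) and bottom-level (b) limits. */
static void model_stack_limits(struct model_limits *t,
			       const struct model_limits *b)
{
	t->virt_boundary_mask = model_min_not_zero(t->virt_boundary_mask,
						   b->virt_boundary_mask);
	t->max_segment_size = model_min_not_zero(t->max_segment_size,
						 b->max_segment_size);
}

/* Step 4: the check quoted above, minus the WARN_ON_ONCE() splat. */
static int model_validate_limits(struct model_limits *lim)
{
	if (lim->virt_boundary_mask) {
		if (lim->max_segment_size &&
		    lim->max_segment_size != UINT_MAX)
			return -EINVAL;
		lim->max_segment_size = UINT_MAX;
	} else {
		if (!lim->max_segment_size)
			lim->max_segment_size = MODEL_BLK_MAX_SEGMENT_SIZE;
		if (lim->max_segment_size < MODEL_PAGE_SIZE)
			return -EINVAL;
	}
	return 0;
}

/* Steps 1-4 in sequence for dm-crypt stacked on nvme. */
int model_dm_crypt_on_nvme(void)
{
	/* Step 1: dm-crypt asks for one page per segment, no boundary mask. */
	struct model_limits crypt = { 0, MODEL_PAGE_SIZE };
	/* Step 2: nvme sets a virt boundary and an unlimited segment size. */
	struct model_limits nvme = { MODEL_NVME_CTRL_PAGE_SIZE - 1, UINT_MAX };

	model_stack_limits(&crypt, &nvme);
	return model_validate_limits(&crypt);
}
```

Under those assumptions model_dm_crypt_on_nvme() returns -EINVAL: the stacked limits end up with nvme's virt_boundary_mask and dm-crypt's page-sized max_segment_size, which is exactly the combination the virt_boundary_mask branch refuses. Either set of limits validates fine on its own.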