Re: [PATCH 02/14] RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()

On Wed, Sep 02, 2020 at 03:05:40PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 02, 2020 at 08:59:12AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 02, 2020 at 02:51:19PM +0300, Leon Romanovsky wrote:
> > > On Tue, Sep 01, 2020 at 09:43:30PM -0300, Jason Gunthorpe wrote:
> > > > rdma_for_each_block() makes assumptions about how the SGL is constructed
> > > > that don't work if the block size is below the page size used to build
> > > > the SGL.
> > > >
> > > > The rules for umem SGL construction require that the SG's all be PAGE_SIZE
> > > > aligned and we don't encode the actual byte offset of the VA range inside
> > > > the SGL using offset and length. So rdma_for_each_block() has no idea
> > > > where the actual starting/ending point is to compute the first/last block
> > > > boundary if the starting address should be within a SGL.
> > > >
> > > > Fixing the SGL construction turns out to be really hard, and will be the
> > > > subject of other patches. For now block smaller pages.
> > > >
> > > > Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
> > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > >  drivers/infiniband/core/umem.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > > > index 120e98403c345d..7b5bc969e55630 100644
> > > > +++ b/drivers/infiniband/core/umem.c
> > > > @@ -151,6 +151,12 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
> > > >  	dma_addr_t mask;
> > > >  	int i;
> > > >
> > > > +	/* rdma_for_each_block() has a bug if the page size is smaller than the
> > > > +	 * page size used to build the umem. For now prevent smaller page sizes
> > > > +	 * from being returned.
> > > > +	 */
> > > > +	pgsz_bitmap &= GENMASK(BITS_PER_LONG - 1, PAGE_SHIFT);
> > > > +
> > >
> > > Why do we care about such a case? Why can't we leave this check forever?
> >
> > If HW supports only, say 4k page size, and runs on a 64k page size
> > architecture it should be able to fragment into the native HW page
> > size.
> >
> > The whole point of these APIs is to decouple the system and HW page
> > sizes.
> 
> Right now you are preventing such combinations, but is this a real concern
> for existing drivers?

No, I didn't prevent anything, I've left those drivers just hardwired
to use PAGE_SHIFT/PAGE_SIZE.

Maybe they are broken and malfunction on 64k page size systems, maybe
the HW supports other page sizes and they should call
ib_umem_find_best_pgsz(), I don't really know.

The fix is fairly trivial, but it can't be done until the drivers stop
touching umem->sgl - as it requires changing how the sgl is
constructed to match standard kernel expectations, which also breaks
all the drivers.
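
For illustration only, here is a rough userspace sketch of what the mask
in the patch above does to a driver's page size bitmap. genmask64() and
mask_small_pgsz() are hypothetical stand-ins written for this example,
not kernel API, and the 64-bit bitmap width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's GENMASK(h, l):
 * returns a 64-bit value with bits l..h (inclusive) set.
 */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	assert(h < 64 && l <= h);
	return (~UINT64_C(0) << l) & (~UINT64_C(0) >> (63 - h));
}

/* Mirrors the masking step in the patch: clear every page size bit
 * below the system page size (1 << page_shift) from the bitmap.
 * page_shift plays the role of PAGE_SHIFT and is passed in so the
 * effect on different architectures can be compared.
 */
static uint64_t mask_small_pgsz(uint64_t pgsz_bitmap, unsigned int page_shift)
{
	return pgsz_bitmap & genmask64(63, page_shift);
}
```

With page_shift = 16 (a 64k system page, assumed here), a bitmap of
4k|2M is reduced to 2M only, and a 4k-only bitmap becomes 0, so no
usable page size remains. With page_shift = 12 both bitmaps pass
through unchanged.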

Jason


