On 05-Mar-19 22:15, Jason Gunthorpe wrote: > On Tue, Mar 05, 2019 at 03:14:34PM +0200, Gal Pressman wrote: >>>>>>>> + sg_set_page(&sglist[i], pg, EFA_PAGE_SIZE, 0); >>>>>>>> + buf = (u64 *)((u8 *)buf + EFA_PAGE_SIZE); >>>>> >>>>> Why do you need special EFA_PAGE_SIZE? Isn't PAGE_SIZE enough for you? >>>> >>>> EFA_PAGE_SIZE represents the device page size. >>> >>> Something is wrong here then, as vmalloc_to_page returns something >>> that is *at most* PAGE_SIZE long. You can't go around and pass that >>> into sg_set_page with EFA_PAGE_SIZE > PAGE_SIZE. >>> >>> Maybe this code is wrongly assuming PAGE_SIZE > EFA_PAGE_SIZE >>> ? >> >> We do assume PAGE_SIZE >= EFA_PAGE_SIZE. >> For instance, on systems where PAGE_SIZE is 64k, we still use chunks of 4k. > > Hurm, BUILD_BUG_ON that, I think. Will do. Also, looks like there's a bug in sg_set_page(), the offset shouldn't be zero but offset_in_page(buf). > >>> What you need here is to make the scatter list out of PAGE_SIZE blocks >>> and then use Shiraz's work to fragment it into EFA_PAGE_SIZE blocks >>> for building your pbl. dma_map_sg should use the largest sgls possible >>> for efficiency. >>> >>> Also, I think there are now several places in the kernel converting >>> vmalloc/kmalloc regions into scatterlist tables, it probably warrents >>> adding a shared helper in lib/scatterlist.c that does a good and >>> proper job of this. >> >> My search yielded six occurrences, all under drivers/media. >> Does this justify moving this function to scatterlist.c? > > I wrote one under drivers/fpga a few years ago too .. > > See fpga_mgr_buf_load() - this handles vmap and kmap transparently, so > it somewhat more general. > > I think 7 is certainly enough to warrant common code. Looks like some of the drivers operate on the scatterlist itself (no chaining), while others (such as the one you cited) operate on a sg table. I wonder if we should convert all to use scatterlist or sg tables? Also, EFA is special in the sense that all sg functions use PAGE_SIZE while EFA needs EFA_PAGE_SIZE. I guess it could be solved by passing that as a parameter, but that'll probably require some changes in existing API functions (__sg_alloc_table_from_pages for example)? Would you consider keeping efa_vmalloc_buf_to_sg() as a temporary solution? The scope of this change is a bit bigger than just RDMA and I'd like to split these changes.