On Tue, Jun 22, 2021 at 04:42:49PM +0000, Bernard Metzler wrote:
> -----ira.weiny@xxxxxxxxx wrote: -----
>
> >To: "Jason Gunthorpe" <jgg@xxxxxxxx>
> >From: ira.weiny@xxxxxxxxx
> >Date: 06/22/2021 08:14AM
> >Cc: "Ira Weiny" <ira.weiny@xxxxxxxxx>, "Mike Marciniszyn"
> ><mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx>, "Dennis Dalessandro"
> ><dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx>, "Doug Ledford"
> ><dledford@xxxxxxxxxx>, "Faisal Latif" <faisal.latif@xxxxxxxxx>,
> >"Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>, "Bernard Metzler"
> ><bmt@xxxxxxxxxxxxxx>, "Kamal Heib" <kheib@xxxxxxxxxx>,
> >linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
> >Subject: [EXTERNAL] [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to
> >kmap_local_page()
> >
> >From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> >kmap() is being deprecated and will break uses of device dax after PKS
> >protection is introduced.[1]
> >
> >The use of kmap() in siw_tx_hdt() is all thread local, therefore
> >kmap_local_page() is a sufficient replacement and will work with
> >pgmap protected pages when those are implemented.
> >
> >kmap_local_page() mappings are tracked in a stack and must be
> >unmapped in the opposite order they were mapped in.
> >
> >siw_tx_hdt() tracks pages used in a page_array. It uses that array
> >to unmap pages which were mapped, on function exit. Not all entries
> >in the array are mapped, and this is tracked in kmap_mask.
> >
> >kunmap_local() takes a mapped address rather than a page. Declare a
> >mapped address array, page_array_addr, of the same size as the page
> >array, to be used for unmapping.
> >
>
> Hi Ira, thanks for taking care of that!
>
> I think we can avoid introducing another 'page_array_addr[]' array
> here, which must be zeroed first and completely searched for
> valid mappings during unmap, and which also further bloats the
> stack size of siw_tx_hdt().
> I think we can get away with the already available iov[].iov_base
> address array, masking addresses with PAGE_MASK during unmapping to
> strip any first-byte offset. All kmap_local_page() mappings end up
> in that list. For unmapping we can still rely on the kmap_mask bit
> field, which is more efficient to initialize and to search for valid
> mappings. Ordering during unmapping can be guaranteed if we parse
> the bitmask in reverse order. Let me know if you prefer me to
> propose a change -- that siw_tx_hdt() thing became rather complex,
> I have to admit!

Seems not too bad; V2 sent.

I was concerned with the additional stack size, but only 28 pointers
(if I did my math right) did not seem too bad. It is redundant,
though, so let's see if I've gotten V2 right.

Thanks!
Ira

>
> Best,
> Bernard.
>
> >Use kmap_local_page() instead of kmap() to map pages in the
> >page_array.
> >
> >Because segments are mapped into the page array in increasing index
> >order, modify siw_unmap_pages() to unmap pages in decreasing order.
> >
> >The kmap_mask is no longer needed, as the lack of an address in the
> >address array can indicate that no unmap is required.
> >
> >[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/
> >
> >Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> >---
> > drivers/infiniband/sw/siw/siw_qp_tx.c | 35 +++++++++++++++------------
> > 1 file changed, 20 insertions(+), 15 deletions(-)
> >
> >diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> >index db68a10d12cd..e70aba23f6e7 100644
> >--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> >+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> >@@ -396,13 +396,17 @@ static int siw_0copy_tx(struct socket *s, struct page **page,
> >
> > #define MAX_TRAILER (MPA_CRC_SIZE + 4)
> >
> >-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
> >+static void siw_unmap_pages(void **addrs, int len)
> > {
> >-	while (kmap_mask) {
> >-		if (kmap_mask & BIT(0))
> >-			kunmap(*pp);
> >-		pp++;
> >-		kmap_mask >>= 1;
> >+	int i;
> >+
> >+	/*
> >+	 * Work backwards through the array to honor the kmap_local_page()
> >+	 * ordering requirements.
> >+	 */
> >+	for (i = (len - 1); i >= 0; i--) {
> >+		if (addrs[i])
> >+			kunmap_local(addrs[i]);
> > 	}
> > }
> >
> >@@ -427,13 +431,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 	struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> > 	struct kvec iov[MAX_ARRAY];
> > 	struct page *page_array[MAX_ARRAY];
> >+	void *page_array_addr[MAX_ARRAY];
> > 	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
> >
> > 	int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> > 	unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
> > 		     sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx,
> > 		     pbl_idx = c_tx->pbl_idx;
> >-	unsigned long kmap_mask = 0L;
> >+
> >+	memset(page_array_addr, 0, sizeof(page_array_addr));
> >
> > 	if (c_tx->state == SIW_SEND_HDR) {
> > 		if (c_tx->use_sendpage) {
> >@@ -498,7 +504,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 				p = siw_get_upage(mem->umem,
> > 						  sge->laddr + sge_off);
> > 				if (unlikely(!p)) {
> >-					siw_unmap_pages(page_array, kmap_mask);
> >+					siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > 					wqe->processed -= c_tx->bytes_unsent;
> > 					rv = -EFAULT;
> > 					goto done_crc;
> >@@ -506,11 +512,10 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 				page_array[seg] = p;
> >
> > 				if (!c_tx->use_sendpage) {
> >-					iov[seg].iov_base = kmap(p) + fp_off;
> >-					iov[seg].iov_len = plen;
> >+					page_array_addr[seg] = kmap_local_page(page_array[seg]);
> >
> >-					/* Remember for later kunmap() */
> >-					kmap_mask |= BIT(seg);
> >+					iov[seg].iov_base = page_array_addr[seg] + fp_off;
> >+					iov[seg].iov_len = plen;
> >
> > 					if (do_crc)
> > 						crypto_shash_update(
> >@@ -518,7 +523,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 							iov[seg].iov_base,
> > 							plen);
> > 				} else if (do_crc) {
> >-					kaddr = kmap_local_page(p);
> >+					kaddr = kmap_local_page(page_array[seg]);
> > 					crypto_shash_update(c_tx->mpa_crc_hd,
> > 							    kaddr + fp_off,
> > 							    plen);
> >@@ -542,7 +547,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> >
> > 		if (++seg > (int)MAX_ARRAY) {
> > 			siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
> >-			siw_unmap_pages(page_array, kmap_mask);
> >+			siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > 			wqe->processed -= c_tx->bytes_unsent;
> > 			rv = -EMSGSIZE;
> > 			goto done_crc;
> >@@ -593,7 +598,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 	} else {
> > 		rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> > 				    hdr_len + data_len + trl_len);
> >-		siw_unmap_pages(page_array, kmap_mask);
> >+		siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > 	}
> > 	if (rv < (int)hdr_len) {
> > 		/* Not even complete hdr pushed or negative rv */
> >--
> >2.28.0.rc0.12.gb6a658bd00c9
> >