On Wed, 12 Mar 2014 17:18:32 -0400 Jeff Layton <jlayton@xxxxxxxxxx> wrote: > On Wed, 12 Mar 2014 15:43:10 -0500 > Tom Tucker <tom@xxxxxxxxxxxxxxxxxxxxx> wrote: > > > Hi Jeff, > > > > > > On 3/12/14 9:51 AM, Jeff Layton wrote: > > > The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the > > > offset into the page to be mapped. For the case of the pages array, the > > > existing calculation always seems to end up at 0, which causes data > > > corruption when a non-page-aligned READ request comes in. The server ends > > > up doing the RDMA_WRITE from the wrong part of the page. > > > > > > Override the xdr_off value with the page_base in that situation to fix > > > the issue. Obviously, this method can't contend with a page_base that is > > > larger than PAGE_SIZE. > > I think we saw page base > PAGE_SIZE when inter-operating with Sun, but > > maybe someone else on the list has a clearer recollection. > > > > Ok, I may need to rejigger that logic to account for that case. I'll do > that and send a v2 once I test it out... > Self-NAK on this patch... page_base I think is not so much a problem as the fact that we can get a READ request that is not page-aligned and that spans multiple pages in the array. For instance, if you do (pseudocode): fd = open("/file/on/nfsordma", O_RDONLY|O_DIRECT); lseek(fd, 1024, SEEK_SET); read(fd, buf, PAGE_SIZE); ...then you get different data than if you didn't use O_DIRECT. I think we could probably cook up a similar test using pynfs, fwiw... The existing code doesn't handle that correctly, and neither does the code after my patch. Fixing this correctly doesn't look trivial though, as I think we'll have to push some awareness of the alignment of the pages array farther up the call chain. > > I am curious, however, why xdr_off was always zero if in fact, page_base > > was not page aligned. > > > > Mostly I was testing with sub-page sized READ requests (which are of > course done via RDMA_WRITE). Assume we have a READ request for 512 > bytes. > > On the first pass into send_write, we call dma_map_xdr to map the > head. On that call xdr_off is 0. > > On the next pass, we go to send the page data. At that point, > xdr_off == head[0].iov_len: > > -------------------8<---------------------- > } else { > xdr_off -= xdr->head[0].iov_len; > if (xdr_off < xdr->page_len) { > /* This offset is in the page list */ > page = xdr->pages[xdr_off >> PAGE_SHIFT]; > xdr_off &= ~PAGE_MASK; > } else { > -------------------8<---------------------- > > ...so we subtract xdr->head[0].iov_len, and xdr_off is now 0. > > The problem is that you can't determine where in the pages array the > data to send actually starts just from the xdr_off. That just tells you > how far into the xdr buffer you are. If the data you actually want > isn't aligned to the start of the page (xdr->page_base !=0) then you > can't tell from that. > > That info is tracked inside the xdr->page_base, so we need use that to > determine what the mapping offset should be. > -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html