From: Boqun Feng <boqun.feng@xxxxxxxxx> Sent: Saturday, September 5, 2020 7:15 AM > > On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote: > > From: Boqun Feng <boqun.feng@xxxxxxxxx> Sent: Tuesday, September 1, 2020 8:01 PM > > > > > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > > > as the unit for page related data. For storvsc, the data is > > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > > > into Hyper-V pages in vmbus_packet_mpb_array. > > > > > > This patch does the conversion by dividing pages in sglist into Hyper-V > > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > > > accordingly. > > > > > > Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx> > > > --- > > > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > > > 1 file changed, 54 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > > index 8f5f5dc863a4..3f6610717d4e 100644 > > > --- a/drivers/scsi/storvsc_drv.c > > > +++ b/drivers/scsi/storvsc_drv.c > > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, > struct > > > scsi_cmnd *scmnd) > > > payload_sz = sizeof(cmd_request->mpb); > > > > > > if (sg_count) { > > > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > > > + unsigned int hvpg_idx = 0; > > > + unsigned int j = 0; > > > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > > > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > > > > > - payload_sz = (sg_count * sizeof(u64) + > > > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > > > + > > > + payload_sz = (hvpg_count * sizeof(u64) + > > > sizeof(struct vmbus_packet_mpb_array)); > > > payload = kzalloc(payload_sz, GFP_ATOMIC); > > > if (!payload) > > > return SCSI_MLQUEUE_DEVICE_BUSY; > > > } > > > > > > + /* > > > + * sgl is a list of PAGEs, and payload->range.pfn_array > > > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > > > + * page size that Hyper-V uses, so here we need to divide PAGEs > > > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > > > + */ > > > payload->range.len = length; > > > - payload->range.offset = sgl[0].offset; > > > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > > > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > > > > > cur_sgl = sgl; > > > - for (i = 0; i < sg_count; i++) { > > > - payload->range.pfn_array[i] = > > > - page_to_pfn(sg_page((cur_sgl))); > > > + for (i = 0, j = 0; i < sg_count; i++) { > > > + /* > > > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > > > + * of HV_HYP_PAGEs in the current PAGE. > > > + * > > > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > > > + * > > > + * As shown in the following, the minimal of both is > > > + * the # of HV_HYP_PAGEs, we need to handle in this > > > + * PAGE. > > > + * > > > + * |------------------ PAGE ----------------------| > > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > > > + * ^ ^ > > > + * hvpg_idx | > > > + * ^ | > > > + * +---(hvpg_count - j)--+ > > > + * > > > + * or > > > + * > > > + * |------------------ PAGE ----------------------| > > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > > > + * ^ ^ > > > + * hvpg_idx | > > > + * ^ | > > > + * +---(hvpg_count - j)------------------------+ > > > + */ > > > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / > HV_HYP_PAGE_SIZE) - hvpg_idx, > > > + hvpg_count - j); > > > + unsigned int k; > > > + > > > + for (k = 0; k < nr_hvpg; k++) { > > > + payload->range.pfn_array[j] = > > > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > > > + j++; > > > + } > > > cur_sgl = sg_next(cur_sgl); > > > + hvpg_idx = 0; > > > } > > > > This code works; I don't see any errors. But I think it can be made simpler based > > on doing two things: > > 1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on > > each iteration, base the exit decision on having filled up the pfn_array[]. You've > > already calculated the exact size of the array that is needed given the data > > length, so it's easy to exit when the array is full. > > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE > > rather than from 0 to a calculated value. > > > > Also, as an optimization, pull page_to_hvpfn(sg_page((cur_sgl)) out of the > > inner loop. > > > > I think this code does it (though I haven't tested it): > > > > for (j = 0; ; sgl = sg_next(sgl)) { > > unsigned int k; > > unsigned long pfn; > > > > pfn = page_to_hvpfn(sg_page(sgl)); > > for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE /HV_HYP_PAGE_SIZE); k++) { > > payload->range.pfn_array[j] = pfn + k; > > if (++j == hvpg_count) > > goto done; > > } > > hvpg_idx = 0; > > } > > done: > > > > This approach also makes the limit of the inner loop a constant, and that > > constant will be 1 when page size is 4K. So the compiler should be able to > > optimize away the loop in that case. > > > > Good point! I like your suggestion, and after thinking a bit harder > based on your approach, I come up with the following: > > #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE)) > > for (j = 0; j < hvpg_count; j++) { > unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE; > > /* > * Two cases that we need to fetch a page: > * a) j == 0: the first step or > * b) k == 0: when we reach the boundary of a > * page. > * > if (k == 0 || j == 0) { > pfn = page_to_hvpfn(sg_page(cur_sgl)); > cur_sgl = sg_next(cur_sgl); > } > > payload->range.pfn_arrary[j] = pfn + k; > } > > , given the HV_HYP_PAGES_IN_PAGE is always a power of 2, so I think > compilers could easily optimize the "%" into bit masking operation. And > when HV_HYP_PAGES_IN_PAGE is 1, I think compilers can easily figure out > k is always zero, then the if-statement can be optimized as always > taken. And that gives us the same code as before ;-) > > Thoughts? I will try with a test to see if I'm missing something subtle. > > Thanks for looking into this! > Your newest version looks right to me -- very clever! I like it even better than my version. Michael