On 2011-06-10 10:09, tao.peng@xxxxxxx wrote:
> Hi, Benny,
>
> -----Original Message-----
> From: Benny Halevy [mailto:benny@xxxxxxxxxx]
> Sent: Friday, June 10, 2011 8:33 PM
> To: Peng, Tao
> Cc: bergwolf@xxxxxxxxx; rees@xxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; honey@xxxxxxxxxxxxxx
> Subject: Re: [PATCH 87/88] Add configurable prefetch size for layoutget
>
> On 2011-06-10 02:00, tao.peng@xxxxxxx wrote:
>> Hi, Benny,
>>
>> Cheers,
>> -Bergwolf
>>
>>
>> -----Original Message-----
>> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] On Behalf Of Benny Halevy
>> Sent: Friday, June 10, 2011 5:23 AM
>> To: Peng Tao
>> Cc: Jim Rees; linux-nfs@xxxxxxxxxxxxxxx; peter honeyman
>> Subject: Re: [PATCH 87/88] Add configurable prefetch size for layoutget
>>
>> On 2011-06-09 08:07, Peng Tao wrote:
>>> Hi, Jim and Benny,
>>>
>>> On Thu, Jun 9, 2011 at 9:58 PM, Jim Rees <rees@xxxxxxxxx> wrote:
>>>> Benny Halevy wrote:
>>>>
>>>>  > My understanding is that layoutget specifies a min and max, and the server
>>>>
>>>> There's a min. What do you consider the max?
>>>> Whatever gets into csa_fore_chan_attrs.ca_maxresponsesize?
>>>>
>>>> The spec doesn't say max, it says "desired." I guess I assumed the server
>>>> wouldn't normally return more than desired.
>>> In fact the server is returning the "desired" length. The problem is that we
>>> call pnfs_update_layout in nfs_write_begin, and it ends up setting
>>> both minlength and length to the page size. There is no room for the client
>>> to collapse the layoutget range in nfs_write_begin.
>>>
>>
>> That's a different issue. Waiting with pnfs_update_layout until flush
>> time rather than write_begin, when the whole page is written, would help
>> send a more meaningful desired range as well as avoid needless
>> read-modify-writes in case the application also wrote the whole
>> preallocated block.
>> [PT] That is also the reason we want to introduce layout prefetching: to get a larger segment than the single page passed into nfs_write_begin.
>>
>
> Peng, I understand what you want to achieve, but the proposed way
> just doesn't fly. The server knows its allocation policies better than the
> client, and it knows better the combined workload of different clients and
> possible conflicts between them; therefore it should be making the ultimate
> decision about the actual segment sizes.
> [PT] Yes, you are right. The server should know the combined workload of all clients and make its decision based on that.
>
> And it always has the right to return more than (or less than) what is specified in loga_length.
>
> That said, the client should indeed do its best to ask for the most appropriate
> segment size for its use, and we should be doing a better job at that.
> It's just that blindly asking for more is not a good strategy, and requiring
> manual admin help to tune the clients is not acceptable.
> [PT] Yeah, determining the most appropriate size is always the hard part. Do you have any suggestions for that?

A simple algorithm I can suggest is (see the sketch after the quoted text below):

- on initialization, calculate and save, per layout driver:
  - the maximum layout size
    - take into account csr_fore_chan_attrs.ca_maxresponsesize and possibly other parameters
    - keep a working copy of the maximum value as well as the calculated copy
  - the alignment value
- on a cache miss, see if there's an adjacent layout segment in the cache
  - if found, ask for twice the found segment's size, up to the maximum value, aligned on the alignment value
- if the server returns less than the layoutget range, keep note of the returned length (but do not adjust the maximum yet, as the server may return a short segment for various reasons)
  - if the server is consistent about returning less than was asked for, adjust the working copy of the maximum length
  - if the maximum was adjusted, try bumping it back up after X (TBD) layoutgets or T seconds, to see whether that was just due to high load or conflicts on the server
- on any error returned for LAYOUTGET, reset the algorithm parameters
- on session reestablishment, recalculate the maximums

Benny

>
> Thanks,
> Tao
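
Below is a minimal, self-contained C sketch of the heuristic outlined above, added purely for illustration. All names here (struct lg_sizer, lg_desired, the LG_SHORT_STREAK and LG_RETRY_PERIOD constants) are hypothetical and are not part of the Linux pNFS client; in a real implementation the calculated maximum would come from csr_fore_chan_attrs.ca_maxresponsesize and the layout driver, and the T-seconds bump-up timer is omitted for brevity.

/*
 * Illustrative sketch only, not kernel code: double an adjacent cached
 * segment's size up to an aligned maximum, remember consistently short
 * LAYOUTGET replies, back the working maximum off, and periodically
 * probe whether the full maximum is usable again.
 */
#include <stdint.h>

#define LG_SHORT_STREAK   4	/* short replies before backing off (assumed) */
#define LG_RETRY_PERIOD  64	/* layoutgets before probing the maximum again (assumed) */

struct lg_sizer {
	uint64_t max_calc;	/* calculated at init, fixed until session reestablishment */
	uint64_t max_work;	/* working copy, may be backed off */
	uint64_t align;		/* layout driver alignment */
	uint64_t last_short;	/* most recent short reply length */
	unsigned short_streak;	/* consecutive short replies */
	unsigned since_adjust;	/* layoutgets since the maximum was lowered */
};

static uint64_t lg_align_up(uint64_t len, uint64_t align)
{
	return (len + align - 1) / align * align;
}

/* On initialization (per layout driver). */
static void lg_init(struct lg_sizer *s, uint64_t max, uint64_t align)
{
	s->max_calc = max;
	s->max_work = max;
	s->align = align;
	s->last_short = 0;
	s->short_streak = 0;
	s->since_adjust = 0;
}

/* On a cache miss: choose loga_length given an adjacent segment (0 if none). */
static uint64_t lg_desired(const struct lg_sizer *s, uint64_t adjacent_len)
{
	uint64_t want = adjacent_len ? 2 * adjacent_len : s->align;

	want = lg_align_up(want, s->align);
	if (want > s->max_work)
		want = s->max_work;
	return want;
}

/* After a LAYOUTGET reply: granted < asked means the server returned a short segment. */
static void lg_reply(struct lg_sizer *s, uint64_t asked, uint64_t granted)
{
	if (granted < asked) {
		s->last_short = granted;
		if (++s->short_streak >= LG_SHORT_STREAK)
			s->max_work = lg_align_up(s->last_short, s->align);
	} else {
		s->short_streak = 0;
	}

	/* Periodically probe whether the short grants were just transient load. */
	if (s->max_work < s->max_calc &&
	    ++s->since_adjust >= LG_RETRY_PERIOD) {
		s->max_work = s->max_calc;
		s->since_adjust = 0;
	}
}

/* On a LAYOUTGET error or session reestablishment: start over. */
static void lg_reset(struct lg_sizer *s)
{
	lg_init(s, s->max_calc, s->align);
}

The working maximum only shrinks after several consecutive short replies, so a single short segment granted under transient load or a conflicting layout does not permanently cap what the client asks for.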