Eric Van Hensbergen wrote:
> On Tue, Aug 17, 2010 at 6:31 PM, Venkateswararao Jujjuri (JV)
> <jvrao@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Additionally, given we know the page size constant, couldn't we infer
>>>> this from a negotiated msize? Transports already have an maxsize
>>>> field which limits the msize selections (and defaults? if not maybe it
>>>> should?) -- why not just use that?
>> The IO size in this mode is derived by virtio ring size rather than the msize.
>> msize is restricted to the maximum amount of data that gets copied on the the
>> kmalloc'd buffer.
>> In this case it is just the header.
>>
>
> Actually, msize is supposed to be the maximum amount of data that gets
> copied over the transport which should be a factor determined by the
> client and server and the underlying technology which links them.
> From that aspect I think its a perfectly valid way to set this up and
> is already part of the mechanisms.

Our intention with this patch was to achieve zero copy for reads/writes with
minimal changes, without affecting other transports or other vfs transactions.
In this method, if the transport has the capability to send user pages directly,
we treat msize as the maximum allowed size for the header, and the actual
payload is limited/controlled by the virtio ring size.

With this minor set of changes we achieved zero copy on the most important
code path, read/write, without any changes to other sections of the code.
This could be a step in the right direction; in the future, when we are ready
to modify other code paths like readdir and xattr, we can take on a broader
set of changes.

Currently we are using 8k as the msize. kmalloc() of anything beyond 1 page (4k)
may fail under high load, so somewhere down the line we may have to revisit our
logic for how we segregate the header from the actual payload.

>
>> I think moving the #of pages that can be accommodated onto the virtio ring to
>> the p9_client,
>> we can use it along with the page_size to determine how much data we are passing.
>> With this we can tune our calculations on the read/write side .. so that we can
>> derive better
>> logic to handle short reads.
>>
>
> I don't know if I saw such a use in the existing patch. Can you give
> me a concrete use case where we would use multiple "short" pages
> instead of mostly full pages and potentially a single short page at
> the end?

I am talking about short reads/writes, not short pages. Yes, at most you will
have two partial pages (an unaligned start and possibly the last page).

The case is where the request coming into VFS is larger than the actual
read/write we could do, either because of a transport restriction/msize or
something else. Say a read of 512k comes into VFS, and we can put only 480k
into one request because of the virtio ring limitation ((128 - 8 for the
header) * 4k). At the VFS/client level we need to retry the read for the rest
of the IO before we return to the user. This change is not in the patch yet;
I am proposing to have it in my next version. Rough sketches of both the
payload limit and the retry idea follow below.

>
> -eric
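
As a rough illustration of how the payload limit falls out of the ring size
rather than msize: the numbers below (128-entry ring, 8 descriptors reserved
for the header, 4k pages, 8k msize) are the ones discussed above, but the
names and the helper itself are made up for the example, not the actual patch
code.

/* Illustrative only -- not the actual patch code. */
#define EXAMPLE_RING_SIZE	128	/* total virtio ring descriptors (assumed) */
#define EXAMPLE_HDR_DESCS	8	/* descriptors reserved for the 9p header (assumed) */
#define EXAMPLE_PAGE_SIZE	4096	/* 4k pages */
#define EXAMPLE_MSIZE		8192	/* 8k: only has to cover the kmalloc'd header */

/* The zero-copy payload is bounded by the ring, not by msize. */
static inline unsigned int example_max_payload(void)
{
	/* (128 - 8) * 4k = 480k */
	return (EXAMPLE_RING_SIZE - EXAMPLE_HDR_DESCS) * EXAMPLE_PAGE_SIZE;
}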
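
And a rough sketch of the retry idea for the 512k case, reusing
example_max_payload() from the sketch above. Again, the function names and
signatures here are hypothetical, simplified to show just the loop; this is
not the code I intend to post.

#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: issues one 9p request of at most 'count' bytes and
 * returns how many bytes were actually transferred, or a negative errno. */
extern ssize_t example_do_one_request(char *buf, size_t count, off_t offset);

/* Retry at the client level until the caller's request is satisfied. */
static ssize_t example_client_read(char *buf, size_t count, off_t offset)
{
	size_t done = 0;

	while (done < count) {
		/* Each request is capped by the ring: 480k in the example above. */
		size_t chunk = count - done;
		if (chunk > example_max_payload())
			chunk = example_max_payload();

		ssize_t ret = example_do_one_request(buf + done, chunk,
						     offset + done);
		if (ret < 0)
			return done ? (ssize_t)done : ret;
		if (ret == 0)		/* genuine short read / EOF from the server */
			break;
		done += ret;		/* e.g. 480k on the first pass of a 512k read */
	}
	return done;			/* 512k = 480k + 32k after one retry */
}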