On Fri, Mar 21, 2014 at 03:20:25PM -0700, Darrick J. Wong wrote: > On Fri, Mar 21, 2014 at 11:23:32AM -0700, Zach Brown wrote: > > On Thu, Mar 20, 2014 at 09:30:41PM -0700, Darrick J. Wong wrote: > > > This RFC provides a rough implementation of a mechanism to allow > > > userspace to attach protection information (e.g. T10 DIF) data to a > > > disk write and to receive the information alongside a disk read. The > > > interface is an extension to the AIO interface: two new commands > > > (IOCB_CMD_P{READ,WRITE}VM) are provided. The last struct iovec in the > > > arg list is interpreted to point to a buffer containing a header, > > > followed by the the PI data. > > > > Instead of adding commands that indicate that the final element is a > > magical pi buffer, why not expand the iocb? > > > > In the user iocb, a bit in aio_flags could indicate that aio_reserved2 > > is a pointer to an extension of the iocb. In that extension could be a > > full iov *, nr_segs for PI data. > > > > You'd then translate that into a bigger kernel kiocb with a specific > > pointer to PI data rather than having to bubble the tests for this magic > > final iovec down through the kernel. > > > > + if (iocb->ki_flags & KIOCB_USE_PI) { > > + nr_segs--; > > + pi_iov = (struct iovec *)(iov + nr_segs); > > + } > > > > I suggest this because there's already pressure to extend the iocb. > > Folks want io priority inputs, completion time outputs, etc. > > I'm curious about the reqprio field -- it seems like it was put there to > request some kind of IO priority change, but the kernel doesn't use it. The user-facing iocbs were derived from the posix aio interface which has a reqprio field (aio(7), aio_reqprio). I don't think anything's ever been done with it. I don't know more about what current io prio stuff people might want to specify.. ioprio_set(2) args instead of having to bounce through syscalls and current-> for each op? cgroup bits? No idea. > If aio_reserved2 becomes a (flag-guarded) pointer to an array of aio > extensions, I'd be tempted to reuse the reqprio to signal the length of the > extension array, and if anyone wants to start using reqprio, they could add it > as an extension. I'll admit, I'm hesitant to cannibalize reqprio for this. It's a lame s16. But maybe it'll be the least awful alternative. > (More about this in my response to Ben LaHaise.) (I'll go reply over there too.) > > And heck, on the sync rw syscall side, add variant that have a pointer > > to this same extension struct. There's nothing inherently aio specific > > about having lots more per-io inputs and outputs. > > I'm curious -- what kinds of extensions do you envision for sync()? Sorry, that was poorly worded. By 'sync' I meant the synchronous classic sys_*write* syscalls. Maybe we should add another variant with a "struct io_goo *" pointer, or whatever. - z -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html