On 02/07/2013 02:08 PM, Boaz Harrosh wrote:
> On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
>> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>>>
>>>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm interested in discussing how to pass protection information to and from
>>>>>>> userspace. Maybe Martin could be enlisted for the discussion.
>>>>>>>
>>>>>>> I read that some work has already been done in this area but have not been able
>>>>>>> to locate it. It looks like the bio-integrity code already makes it possible
>>>>>>> to generate the t10-dif crc in the filesystem. It would be good to be able to
>>>>>>> get the guard and application tags back out to backup applications such as
>>>>>>> xfsdump. Enabling other applications to generate their own tags in userspace
>>>>>>> is also interesting.
>>>>>>
>>>>>> This one's been on my list for a couple of years (and companies) too. A few
>>>>>> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
>>>>>> gone anywhere), and more recently I've theorized that we could add a magic
>>>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
>>>>>> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
>>>>>> PI data to a disk. But it's not like I have any code to show for it.
>>>>>>
>>>>>> I /think/ it's fairly straightforward to change the directio submit code to
>>>>>> find the userspace PI buffer and amend the block integrity code to attach our
>>>>>> own PI buffer. You'd still have to let the block layer set the sector # field,
>>>>>> but afaik that won't affect the crc or the app tag.
>>>>>>
>>>>>> I hear that the NFS guys want to propose some sort of protocol for transmitting
>>>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>>>
>>>>> I'm writing a requirements document for the NFS protocol which I can discuss at LSF. The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
>>>>>
>>>>> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
>>>>
>>>> I'm happy to chat about it. Unfortunately, like Darrick says, sys_dio()
>>>> coding hasn't happened. I do think we're better off with some kind of
>>>> explicit API than some magic state on the file. I mean, even something
>>>> like:
>>>>
>>>>     ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>>>                           const void *pi, size_t pi_count);
>>>>
>>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>>> probably plays nicer with buffered I/O.
>>>
>>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
>>> and all the other plumbing necessary to make that happen...
>>>
>>>     void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>>>                            int iovcnt, long long offset, const void *pi,
>>>                            size_t pi_count);
>>>
>> This is also what I've envisioned.
>> Updating io_prep / async I/O is reasonably easy as it's been using a
>> separate structure for passing in the I/O details.
>>
>> Normal read/write calls don't really map as you simply don't have
>> enough parameters to feed PI information into the kernel.
>> So for that you'd need to invent a new interface / syscall.
>>
>> For aio we just need to add additional fields to an existing structure.
>>
>> So yeah, I'd be interested in that discussion as well.
>>
>
> Me too, on multiple fronts.
> It's part of my general concern about
> "things we would like for user-mode servers"
>
> I think that the current aio and libaio interface has been broken for a long
> time, for a multitude of reasons. For instance the nested structure definitions
> are COMPAT broken, and there are lots of missing pieces. (For example, search
> the archives for why bsg does not support sg-lists.)
>
> And there are all these additions that everyone wants on top, that call for
> a new interface anyway.
>
> So I would like to see a deep fixup of this interface, with an aio version 2
> that can take into consideration all future needs, including those
> above. Kernel code will be very happy to be implemented with the new interface,
> and a COMPAT layer could be put in place for the old interface.
>
> All interested parties should bring to the table the extensions/changes
> they need, and we can try to union all of them together.
>
> (My addition is support for sg_lists in bsg, in a way that makes Tomo happy.
> I know that qemu has wanted this for a while, as have the multitude of
> user-mode servers.)
>

I wanted to add that there is another LSF/MM thread going on about:
"[LSF TOPIC] What to do about O_DIRECT?"

All these guys should be participating here, so that we change core structures
and behavior to a better model that helps us here and does not work against us.

(Again, libaio should be changed in concert with the kernel's new API, and we
can sacrifice old user-mode performance with a COMPAT layer. Distro maintainers
should consider replacing libaio together with the new kernel, so that only
those who do their own mix-and-match are affected, and they can fix that
mismatch themselves.)

> Thanks
> Boaz
>
>> Cheers,
>>
>> Hannes
>>
>
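
For reference, here is a minimal userspace sketch of what an application might
put into the `pi` buffer of the write_with_pi() / io_prep_preadv_pi()
prototypes quoted above. It assumes T10 DIF Type 1 style tuples (2-byte guard
CRC, 2-byte application tag, 4-byte reference tag, all big-endian) and 512-byte
sectors. The names pi_crc16() and pi_generate() are hypothetical and are not
part of any kernel or libaio API; the thread does not settle on a buffer layout,
so treat this as one possible reading only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not an existing kernel or libaio API. */

#define PI_SECTOR_SIZE 512u  /* assumption: 512-byte protection interval */
#define PI_TUPLE_SIZE  8u    /* 2-byte guard + 2-byte app tag + 4-byte ref tag */

/* CRC-16 as used for the T10 DIF guard tag: polynomial 0x8BB7, initial
 * value 0, no bit reflection, no final XOR. Bitwise for clarity, not speed. */
static uint16_t pi_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Write one 8-byte tuple per 512-byte sector of buf into pi.
 * The reference tag is filled with the low 32 bits of (start_sector + index);
 * in the scheme discussed above the block layer would set that field itself.
 * Returns the number of PI bytes written, or 0 if count is not a multiple of
 * the sector size or pi_count is too small. */
static size_t pi_generate(const uint8_t *buf, size_t count,
                          uint8_t *pi, size_t pi_count,
                          uint64_t start_sector, uint16_t app_tag)
{
    assert(buf != NULL && pi != NULL);

    if (count % PI_SECTOR_SIZE != 0)
        return 0;

    size_t sectors = count / PI_SECTOR_SIZE;
    if (pi_count < sectors * PI_TUPLE_SIZE)
        return 0;

    for (size_t i = 0; i < sectors; i++) {
        uint16_t guard = pi_crc16(buf + i * PI_SECTOR_SIZE, PI_SECTOR_SIZE);
        uint32_t ref = (uint32_t)(start_sector + i);
        uint8_t *t = pi + i * PI_TUPLE_SIZE;

        t[0] = (uint8_t)(guard >> 8);    /* guard tag, big-endian */
        t[1] = (uint8_t)(guard & 0xff);
        t[2] = (uint8_t)(app_tag >> 8);  /* application tag, big-endian */
        t[3] = (uint8_t)(app_tag & 0xff);
        t[4] = (uint8_t)(ref >> 24);     /* reference tag, big-endian */
        t[5] = (uint8_t)(ref >> 16);
        t[6] = (uint8_t)(ref >> 8);
        t[7] = (uint8_t)(ref & 0xff);
    }
    return sectors * PI_TUPLE_SIZE;
}
```

A caller would size the PI buffer as (count / 512) * 8 bytes, call
pi_generate() on the data buffer, and then hand both buffers to whichever
submission interface ends up existing. Because the guard CRC covers only the
sector data, the reference tag can be overwritten later without touching the
guard or application tag, which matches Darrick's remark about the sector
number field.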