On Thu, 2014-01-16 at 17:58 -0800, Darrick J. Wong wrote:
> Hello LSF committee,
>

<SNIP>

> I also have my own topic -- implementing a userland interface for
> passing integrity metadata through to the storage. This is the usage
> model that I'd set up in my head (at the kernel<->userland boundary):
>
> 1. Program opens a file descriptor.
>
> 2. Program sets up an aio context.
>
> 3. Program queries the fd for supported PI profiles (probably an
>    ioctl).
>
> 4. Program uses newly defined IO_CMD_{READV,WRITEV}_PI commands
>    to supply PI data and verify the data itself. A new structure
>    will be defined to report the PI profile the app wants to use,
>    which fields the app is actually interested in providing or
>    verifying, a bitset of which devices should check the PI data (HBA,
>    disk, intermediate storage servers), followed by space for the
>    actual PI data; then either we find space in struct iocb to point
>    to this buffer, or we do something naughty such as attaching it as
>    the first (or last) iovec pointer.
>
>    libaio can take care of all this for a client program. A separate
>    discussion could be had about the interface from libaio to client
>    programs, but let's get the kernel<->user piece done first.
>
> 5. Error codes ... perhaps we define an IO_CMD_GET_ERROR command that
>    doesn't return an event until it has extended error data to
>    supply. This could be more than just PI failures -- SCSI sense
>    data seems like a potential choice. This is a stretch goal...
>
> The raw kernel interface of course would be passing PI profiles and
> data to userspace, for anyone who wishes to bypass libaio.
>
> As for the ioctl that describes what kind of PI data the kernel will
> accept, I'd like to make it generic enough that someone could
> implement a device with any kind of 'checksum' (crc32c, sha1, or maybe
> even a cryptographic signature), while allowing for different
> geometrical requirements, or none, as in the case of byte streams over
> NFS.
> It's been suggested to use unique integer values and assume that
> programs know what the values mean, but given the potential for
> variety I wonder if it should be more descriptive:
>
> {
>     name: "NFS-FOO-NONSTANDARD",
>     granularity: 0,
>     alignment: 0,
>     profile: "tag,checksum",
>     tag-width: u32,
>     checksum-alg: sha256,
>     checksum-width: u8[32],
> }
> or
> {
>     name: "tag16-crc16-block32",
>     granularity: 512,
>     alignment: 512,
>     profile: "tag,checksum,reftag",
>     tag-width: u16,
>     checksum-alg: crc16,
>     checksum-width: u16,
>     reftag-alg: blocknum,
>     reftag-width: u32,
> }
>
> Now, for the actual mechanics of modifying the kernel, here's my idea:
> First, enhance the block_integrity API so that we can ask it about
> supported data formats, algorithms, etc. (everything we need to supply
> the schema described in the previous section).
>
> For buffered mode, each struct page would point to a buffer that is
> big enough to hold all the PI data for all the blocks represented by
> the page, as well as descriptors for the PI data. This gets much
> harder for the case of arbitrary byte streams instead of disk sectors.
> Perhaps we'd have to have a descriptor that looks like this:
>
> struct {
>     u16 start, end;
>     int flags;
>     void *buffer;
>     char pi_profile[16];
> };
>
> In the case of byte stream PI, I'm not sure how the NFS protocols
> would handle overlapping ranges -- send one page with the set of PIs
> that cover that page?
>
> Anyway, when a buffered write comes in, we simply copy the user's
> buffer into the thing hanging off struct page. When
> bio_integrity_prep is called (during submit_bio), it will either find
> no buffer and generate the PI data on its own like it does now, or
> it'll find a buffer, attach it to the bio->bip, then ask the integrity
> provider to fill in whatever's missing. A directio write would take
> the PI data and attach it directly to the bio it submits.
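
A minimal userspace sketch of the "fill in whatever's missing" step
described just above, for one block of the tag16-crc16-block32 example
profile. This is not kernel code and not a proposed interface:
struct pi_tuple, the PI_HAVE_* flags and pi_fill_missing() are
hypothetical names, and the crc16 is assumed to be CRC-16/T10-DIF
(polynomial 0x8BB7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags: which tuple fields the caller already supplied. */
#define PI_HAVE_TAG      (1u << 0)
#define PI_HAVE_CHECKSUM (1u << 1)
#define PI_HAVE_REFTAG   (1u << 2)

/* One tuple of the "tag16-crc16-block32" example, in host byte order. */
struct pi_tuple {
    uint16_t tag;
    uint16_t checksum;
    uint32_t reftag;
};

/* Assumed checksum: CRC-16/T10-DIF (poly 0x8BB7, init 0, no reflection). */
static uint16_t crc16_t10dif(const unsigned char *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/*
 * Fill in every field the caller did not supply for one data block.
 * Supplied fields are left untouched. Returns the updated flag set.
 */
static unsigned pi_fill_missing(struct pi_tuple *t, unsigned have,
                                const unsigned char *block, size_t block_len,
                                uint32_t block_num)
{
    assert(t != NULL && block != NULL);

    if (!(have & PI_HAVE_TAG))
        t->tag = 0;                      /* no application tag given */
    if (!(have & PI_HAVE_CHECKSUM))
        t->checksum = crc16_t10dif(block, block_len);
    if (!(have & PI_HAVE_REFTAG))
        t->reftag = block_num;           /* reftag-alg: blocknum */

    return have | PI_HAVE_TAG | PI_HAVE_CHECKSUM | PI_HAVE_REFTAG;
}
```

A caller that only cares about the application tag would pass
PI_HAVE_TAG and let the checksum and reftag be generated; passing 0
corresponds to the "no buffer" case where everything is generated.
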
>
> For buffered reads, bio_integrity_endio can allocate the buffer and
> attach it to struct page, then fill in the fields that the disk
> returned. The actual userland read function of course can then copy
> the data out of the thing hanging off struct page into the user's
> buffer, and then userland can do whatever it wants. A directio read
> simply copies the data from the bio->bip into the userland buffer.
>
> As for the GET_ERROR thing, my first (and probably only) thought was
> to find a way to attach a buffer and a description of what's in the
> buffer to a bio, so that GET_ERROR can return the buffer contents.
> A tricky part is to help out userspace by mapping an error code back
> to the iocb. I need to think harder about this piece. Right now I'm
> only thinking about disk storage; is anyone else interested enough in
> returning rich error data to userland to help me bikeshed? :)
>

Big +1.

I've been salivating over having a userspace interface for attaching
protection information, and think this is a great topic for LSF.

--nab
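
For illustration only, here is a rough C sketch of how a program might
size the PI buffer from a descriptive profile like the two examples
quoted above. Everything in it is an assumption rather than an existing
interface: struct pi_profile_desc, pi_tuple_size() and pi_buffer_size()
are made-up names, widths are expressed in bytes, and a byte-stream
profile (granularity 0) is assumed to carry one tuple per I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor mirroring the fields in the quoted examples. */
struct pi_profile_desc {
    const char *name;
    uint32_t granularity;    /* data bytes covered per tuple; 0 = byte stream */
    uint32_t alignment;      /* required data alignment in bytes; 0 = none */
    uint8_t  tag_width;      /* field widths in bytes; 0 = field absent */
    uint8_t  checksum_width;
    uint8_t  reftag_width;
};

/* Size in bytes of one PI tuple for this profile. */
static size_t pi_tuple_size(const struct pi_profile_desc *p)
{
    assert(p != NULL);
    return (size_t)p->tag_width + p->checksum_width + p->reftag_width;
}

/*
 * Bytes of PI needed for an I/O of io_len bytes starting at byte offset
 * `offset`. Returns 0 when the I/O is empty or violates the geometry.
 */
static size_t pi_buffer_size(const struct pi_profile_desc *p,
                             uint64_t offset, size_t io_len)
{
    size_t tuple = pi_tuple_size(p);

    if (io_len == 0)
        return 0;
    if (p->granularity == 0)
        return tuple;                    /* assumed: one tuple per I/O */
    if (io_len % p->granularity != 0)
        return 0;                        /* not a whole number of blocks */
    if (p->alignment != 0 && offset % p->alignment != 0)
        return 0;                        /* misaligned start */

    return (io_len / p->granularity) * tuple;
}
```

With the tag16-crc16-block32 values (512-byte granularity, 2 + 2 + 4
bytes per tuple) a 4096-byte aligned I/O needs 64 bytes of PI, while
the NFS-style example needs a single 36-byte tuple regardless of
length.
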