On Fri, Mar 21, 2014 at 05:29:09PM -0700, Zach Brown wrote: > On Fri, Mar 21, 2014 at 03:54:37PM -0700, Darrick J. Wong wrote: > > On Fri, Mar 21, 2014 at 05:44:10PM -0400, Benjamin LaHaise wrote: > > > > > I'm inclined to agree with Zach on this item. Ultimately, we need an > > > extensible data structure that can be grown without completely revising > > > the ABI as new parameters are added. We need something that is either > > > TLV based, or an extensible array. > > > > Ok. Let's define IOCB_FLAG_EXTENSIONS as an iocb.aio_flags flag to indicate > > that this struct iocb has extensions attached to it. Then, iocb.aio_reserved2 > > becomes a pointer to an array of extension descriptors, and iocb.aio_reqprio > > becomes a u16 that tells us the array length. The libaio.h equivalents are > > iocb.u.c.flags, iocb.u.c.__pad3, and iocb.aio_reqprio, respectively. > > > > Next, let's define a conceptual structure for aio extensions: > > > > struct iocb_extension { > > void *ie_buf; > > unsigned int ie_buflen; > > unsigned int ie_type; > > unsigned int ie_flags; > > }; > > > > The actual definitions can be defined in a similar fashion to the other aio > > structures so that the structures are padded to the same layout regardless of > > bitness. As mentioned above, iocb.aio_reserved2 points to an array of these. > > I'm firmly in the camp that doesn't want to go down this abstract road. > We had this conversation with Kent when he wanted to do something very > similar. Could you point me to this discussion? I'd like to read it. > What happens if there are duplicate ie_types? Is that universally > prohibited, validity left up to the types that are duplicated? Yes. > What if the len is not the right size? Who checks that? The extension driver, presumably. > What if the extension (they're arguments, but one thing at a time) is > writable and the buf pointers overlap or is unaligned? Is that cool, who > checks it? Each extension driver has to check the alignment. I don't know what to do about buffer pointer overlap; if you want to shoot yourself in the foot that's fine with me. > Who defines the acceptable set? > Can drivers make up their own weird types? How do you mean? As far as whatever's in the ie_buf, I think that depends on the extension. > How does strace print all this? How does the security module universe > declare policies that can forbid or allow these things? I don't know. > Personally, I think this level of dynamism is not worth the complexity. > > Can we instead just have a nice easy struct with fixed members that only > grows? > > struct some_more_args { > u64 has; /* = HAS_PI_VEC; */ > u64 pi_vec_ptr; > u64 pi_vec_nr_segs; > }; > > struct some_more_args { > u64 has; /* = HAS_PI_VEC | HAS_MAGIC_THING */ > u64 pi_vec_ptr; > u64 pi_vec_nr_segs; > u64 magic_thing; > }; > > If it only grows and has bits indicating presence then I think we're > good. You only fetch the space for the bits that are indicated. You > can return errors for bits you don't recognize. You could perhaps offer > some way to announce the bits you recognize. <shrug> I was gonna just -EINVAL for types we don't recognize, or which don't apply in this scenario. > I'll admit, though, that I don't really like having to fetch the 'has' > bits first to find out how large the rest of the struct is. Maybe > that's not worth worrying about. I'm not worrying about having to pluck 'has' out of the structure, but needing a function to tell me how big of a buffer I need for a given pile of flags seems ... icky. But maybe the ease of modifying strace and security auditors would make it worth it? > Thoughts? Am I out to lunch here? I don't have a problem adopting your design, aside from the complications of figuring out how big struct some_more_args really is. > > Question: Do we want to allow ie_buf to be struct iovec[]? Can we leave that > > to the extension designer to decide if they want to support either a S-G list, > > one big (vaddr) buffer, or toggle flags? > > No idea. Either seems doable. I'd aim for simpler to reduce the number > of weird cases to handle or forbid (iovecs with a byte per page!) unless > Martin thinks people want to vector the PI goo. For now I'll leave it as a simple buffer until I hear otherwise. > > I think so. Let's see how much we can get done. > > FWIW, I'm happy to chat about this in person at LSF next week. I'll be > around. Me too! --D > > - z > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>