Re: [RFC PATCH 0/5] userspace PI passthrough via AIO/DIO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 21, 2014 at 05:29:09PM -0700, Zach Brown wrote:
> On Fri, Mar 21, 2014 at 03:54:37PM -0700, Darrick J. Wong wrote:
> > On Fri, Mar 21, 2014 at 05:44:10PM -0400, Benjamin LaHaise wrote:
> >
> > > I'm inclined to agree with Zach on this item.  Ultimately, we need an 
> > > extensible data structure that can be grown without completely revising 
> > > the ABI as new parameters are added.  We need something that is either 
> > > TLV based, or an extensible array.
> > 
> > Ok.  Let's define IOCB_FLAG_EXTENSIONS as an iocb.aio_flags flag to indicate
> > that this struct iocb has extensions attached to it.  Then, iocb.aio_reserved2
> > becomes a pointer to an array of extension descriptors, and iocb.aio_reqprio
> > becomes a u16 that tells us the array length.  The libaio.h equivalents are
> > iocb.u.c.flags, iocb.u.c.__pad3, and iocb.aio_reqprio, respectively.
> > 
> > Next, let's define a conceptual structure for aio extensions:
> > 
> > struct iocb_extension {
> > 	void *ie_buf;
> > 	unsigned int ie_buflen;
> > 	unsigned int ie_type;
> > 	unsigned int ie_flags;
> > };
> > 
> > The actual definitions can be defined in a similar fashion to the other aio
> > structures so that the structures are padded to the same layout regardless of
> > bitness.  As mentioned above, iocb.aio_reserved2 points to an array of these.
> 
> I'm firmly in the camp that doesn't want to go down this abstract road.
> We had this conversation with Kent when he wanted to do something very
> similar.

Could you point me to this discussion?  I'd like to read it.

> What happens if there are duplicate ie_types?  Is that universally
> prohibited, validity left up to the types that are duplicated?

Yes.

> What if the len is not the right size?  Who checks that?

The extension driver, presumably.

>  What if the extension (they're arguments, but one thing at a time) is
>  writable and the buf pointers overlap or is unaligned?  Is that cool, who
>  checks it?

Each extension driver has to check the alignment.  I don't know what to do
about buffer pointer overlap; if you want to shoot yourself in the foot that's
fine with me.

> Who defines the acceptable set?


>  Can drivers make up their own weird types?

How do you mean?  As far as whatever's in the ie_buf, I think that depends on
the extension.

>  How does strace print all this?  How does the security module universe
>  declare policies that can forbid or allow these things?

I don't know.

> Personally, I think this level of dynamism is not worth the complexity.
> 
> Can we instead just have a nice easy struct with fixed members that only
> grows?
> 
> struct some_more_args {
> 	u64 has; /* = HAS_PI_VEC; */
> 	u64 pi_vec_ptr;
> 	u64 pi_vec_nr_segs;
> };
> 
> struct some_more_args {
> 	u64 has; /* = HAS_PI_VEC | HAS_MAGIC_THING */
> 	u64 pi_vec_ptr;
> 	u64 pi_vec_nr_segs;
> 	u64 magic_thing;
> };
> 
> If it only grows and has bits indicating presence then I think we're
> good.   You only fetch the space for the bits that are indicated.  You
> can return errors for bits you don't recognize.  You could perhaps offer
> some way to announce the bits you recognize.

<shrug> I was gonna just -EINVAL for types we don't recognize, or which don't
apply in this scenario.

> I'll admit, though, that I don't really like having to fetch the 'has'
> bits first to find out how large the rest of the struct is.  Maybe
> that's not worth worrying about.

I'm not worrying about having to pluck 'has' out of the structure, but needing
a function to tell me how big of a buffer I need for a given pile of flags
seems ... icky.  But maybe the ease of modifying strace and security auditors
would make it worth it?

> Thoughts?  Am I out to lunch here?

I don't have a problem adopting your design, aside from the complications of
figuring out how big struct some_more_args really is.

> > Question: Do we want to allow ie_buf to be struct iovec[]?  Can we leave that
> > to the extension designer to decide if they want to support either a S-G list,
> > one big (vaddr) buffer, or toggle flags?
> 
> No idea.  Either seems doable.  I'd aim for simpler to reduce the number
> of weird cases to handle or forbid (iovecs with a byte per page!) unless
> Martin thinks people want to vector the PI goo.

For now I'll leave it as a simple buffer until I hear otherwise.

> > I think so.  Let's see how much we can get done.
> 
> FWIW, I'm happy to chat about this in person at LSF next week.  I'll be
> around.

Me too!

--D
> 
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux