Re: [RFC PATCH 0/5] userspace PI passthrough via AIO/DIO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 21, 2014 at 03:54:37PM -0700, Darrick J. Wong wrote:
> On Fri, Mar 21, 2014 at 05:44:10PM -0400, Benjamin LaHaise wrote:
>
> > I'm inclined to agree with Zach on this item.  Ultimately, we need an 
> > extensible data structure that can be grown without completely revising 
> > the ABI as new parameters are added.  We need something that is either 
> > TLV based, or an extensible array.
> 
> Ok.  Let's define IOCB_FLAG_EXTENSIONS as an iocb.aio_flags flag to indicate
> that this struct iocb has extensions attached to it.  Then, iocb.aio_reserved2
> becomes a pointer to an array of extension descriptors, and iocb.aio_reqprio
> becomes a u16 that tells us the array length.  The libaio.h equivalents are
> iocb.u.c.flags, iocb.u.c.__pad3, and iocb.aio_reqprio, respectively.
> 
> Next, let's define a conceptual structure for aio extensions:
> 
> struct iocb_extension {
> 	void *ie_buf;
> 	unsigned int ie_buflen;
> 	unsigned int ie_type;
> 	unsigned int ie_flags;
> };
> 
> The actual definitions can be defined in a similar fashion to the other aio
> structures so that the structures are padded to the same layout regardless of
> bitness.  As mentioned above, iocb.aio_reserved2 points to an array of these.

I'm firmly in the camp that doesn't want to go down this abstract road.
We had this conversation with Kent when he wanted to do something very
similar.

What happens if there are duplicate ie_types?  Is that universally
prohibited, validity left up to the types that are duplicated?  What if
the len is not the right size?  Who checks that?  What if the extension
(they're arguments, but one thing at a time) is writable and the buf
pointers overlap or is unaligned?  Is that cool, who checks it?

Who defines the acceptable set?  Can drivers make up their own weird
types?  How does strace print all this?  How does the security module
universe declare policies that can forbid or allow these things?

Personally, I think this level of dynamism is not worth the complexity.

Can we instead just have a nice easy struct with fixed members that only
grows?

struct some_more_args {
	u64 has; /* = HAS_PI_VEC; */
	u64 pi_vec_ptr;
	u64 pi_vec_nr_segs;
};

struct some_more_args {
	u64 has; /* = HAS_PI_VEC | HAS_MAGIC_THING */
	u64 pi_vec_ptr;
	u64 pi_vec_nr_segs;
	u64 magic_thing;
};

If it only grows and has bits indicating presence then I think we're
good.   You only fetch the space for the bits that are indicated.  You
can return errors for bits you don't recognize.  You could perhaps offer
some way to announce the bits you recognize.

I'll admit, though, that I don't really like having to fetch the 'has'
bits first to find out how large the rest of the struct is.  Maybe
that's not worth worrying about.

Thoughts?  Am I out to lunch here?

> Question: Do we want to allow ie_buf to be struct iovec[]?  Can we leave that
> to the extension designer to decide if they want to support either a S-G list,
> one big (vaddr) buffer, or toggle flags?

No idea.  Either seems doable.  I'd aim for simpler to reduce the number
of weird cases to handle or forbid (iovecs with a byte per page!) unless
Martin thinks people want to vector the PI goo.

> I think so.  Let's see how much we can get done.

FWIW, I'm happy to chat about this in person at LSF next week.  I'll be
around.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux