orangefs: NAK until the ABI is documented (was Re: updated orangefs tree at kernel.org)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Oct 10, 2015 at 03:31:57AM +0100, Al Viro wrote:
> 	* atrocious userland API for downcalls (response to everything
> other than readdir must come in 4-element iovec array passed to writev(),
> no matter where the segment boundaries are; response to readdir must come
> in 5-element iovec array passed to writev(), the boundaries among the
> first 4 segments do not matter, the 5th segment covering exactly the payload).
> IMO that needs to be fixed before we merge the damn thing - it's really too
> ugly to live.  I would really like to hear from somebody familiar with the
> userland side - what responses does it actually submit?  The validation
> kernel-side is basically inexistent, and while I suspect that we could handle
> the actually sent responses much cleaner, the full set of everything that
> would be accepted by the current code is a bloody mess and would be much
> harder to handle in a clean way.  What's more, the response layouts are
> messy, and if it is still possible to change that API, we'd be much better off
> if we cleaned them up.

Another API bogosity: when the 5th segment is present, successful writev()
returns the sum of sizes of the first 4.  Userland side just ignores that -
everything positive is treated as "everything's written".

Another piece of fun: header too large by less than sizeof(long) => print
a warning, leave op->downcall completely uninitialized and proceed.

Below is what the kernel side is actually doing:
	* less than 4 segments or more than 5 => whine and fail
	* if concatenation of the first 4 segments is longer than
16 + sizeof(struct pvfs2_downcall_s) + sizeof(long) => whine and fail
	* if concatenation of the first 4 segments is longer than
16 + sizeof(struct pvfs2_downcall_s) by no more than sizeof(long) => whine
and proceed with garbage.
	* if concatenation of the first 4 segments is no longer than
16 + sizeof(struct pvfs2_downcall_s), it is zero-padded to that size,
then the first 16 bytes are interpreted as 32bit protocol version (ignored)
+ 32bit magic (checked for one specific value) + 64bit tag (used to find the
matching request).  The rest is copied into op->downcall.
	* if the 32bit value 4 bytes into op->downcall is zero and 64bit
value following it is non-zero, the latter is interpreted as the size of
trailer data.
	* if there's no trailer, the 5th segment (if present) is completely
ignored.  Otherwise it must be present and is expected to contain the trailer
data.
	* if the 5th segment is longer than expected trailer data => whine
and fail (note that if the amount of expected trailer data is zero a non-empty
5th segment is quietly ignored).
	* if trailer expected, vmalloc a buffer for it and copy the 5th
segment contents in there; in case of the 5th segment being *shorter* than
expected trailer data, leave the rest of the buffer uninitialized (i.e.
expose the contents of random previously freed pages).
	* if vmalloc fails, act as if status (32bit at offset 5 into
op->downcall) had been -ENOMEM and don't look at the 5th segment at all.
	* in all cases ignore the amount of data taken from the 5th segment;
the return value is the sum of sizes of the first 4.

	What the current userland side appears to sends is 4-segment
version/magic/tag/struct pvfs2_downcall_s + possibly the 5th segment covering
the readdir results.  Trailer size in struct pvfs2_downcall_s actually is
equal to the size of the 5th segment if present and 0 otherwise.
	The 4th segment (struct pvfs2_downcall_s) contains a bunch of garbage
never used by the kernel (pointers and misalignment gaps).
	How much of that is guaranteed is, of course, known only to the
orangefs folks.  I only looked at the current version of the userland code;
hadn't checked the older ones and have no idea what kind of changes might
be planned.

	Folks, userland interfaces need to be documented (and preferably -
contain less ugliness).  At that point I think that we need the damn API
written up and reviewed, or it's a strong NAK.  Once it's in the tree, it's
cast in stone, so we'd better get it right and we need it explicitly
documented.  "Whatever orangefs userland code does" doesn't cut it, especially
since your protocol version check is inexistent.  Digging through the entire
history of your userland code and trying to deduce what would and what would
not be OK to change kernel-side is far past reasonable.

	I'm not asking for the description of semantics for individual requests
and responses (it would be nice to have as well, but that's a separate story),
but the rules for data marshalling are really needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux