On Mon, Oct 12, 2015 at 12:20:42PM -0400, Mike Marshall wrote: > > ... the ABI is ... utterly... "idiosyncratic"... > > Al, do you mean idiosyncratic in a "distinctive and special" way, > or a "strange and bizarre" way? <g> The latter, but much stronger ;-) > When I first started working with the kernel module, I grew > to think of the service_operation as the event horizon to > a black hole, but I figured that was just me... *snort* Anything you get back is, by definition, from before the event horizon... > First off, Al, I want to thank you for spending so much > time on the review. You have pointed out numerous ways > to make the kernel module better. Most importantly, though, > I believe you are saying (and I'm purposely restating > your words here to make sure we're on the same page) that there > will be no ack until we provide a usable detailed definition of > what is in every upcall and downcall and ioctl between > the kernel module and the userspace daemon. Not only do we need, > for example, to specify how LOOKUP_FOLLOW needs to be provided > in a lookup request, but it would be nice to also say why. LOOKUP_FOLLOW is a separate story - the issue here is that you are exposing internal details of the pathname resolution to the userland and that's essentially a demand to keep them stable from now on; however, I'm seriously at loss as to _what_ to keep stable. "This call of ->lookup() gets LOOKUP_FOLLOW in flags" doesn't map to any natural semantics I can think of; it might accidentally happen to match something useful, but I don't see what it is. Your userland code does something rather odd with it, but it's buried inside your state machine and I don't understand it well enough to tell what's going on. Is it trying to follow symlinks on its own upon lookup? If so, it's _very_ bogus. If nothing else, userland code doesn't have enough information to do that - it has no idea what the originating process is chrooted to, for starters. As the result, having your filesystem mounted inside a chroot jail would open a nice way for non-priveleged process to break out - just create an absolute symlink somewhere on that filesystem and go through it. Again, I'm not sure what that userland code is doing, so it (hopefully) might be something completely different. So yes, I really do want details on this one, but for a different reason. What I want from API documentation is something along the lines of "writev() on /dev/pvfs2-req is used to pass responses to the requests made by the kernel side; the data structure being passed is represented in the following way <...>; representations do not statisfying constraints <...> are to be rejected". What you do with the payload _after_ it's been transferred is a separate story; it's also nice to have, but right now we don't even have the calling conventions, nevermind the description of behaviour. How is one supposed to review the marshalling code? Sure, I can tell that if you have the iov[4].iov_len greater than .trailer_size, you end up with uninitialized tail of vmalloc'ed buffer. I can also see that your userland code doesn't ever pass a positive .trailer_size with iov[4].iov_len different from it. What's the proper fix? To declare that they must always be equal, and reject mismatches, as you already do for inequality in other direction? Or should we quietly zero-pad the buffer? Either will work for your current userland code, but only you can tell which is the right fix - I don't know what the older versions used to do and I don't know what changes are planned. And yes, I understand the desire to separate a potentially large payload kernel-side; is there any deep reason why one doesn't simply use .trailer_size to decide how much in the end of the area being transferred should go there? Are there any plans for having the first 4 segments shorter than the full 16 + sizeof(struct pvfs2_downcall_s)? If so, what's the minimal amount to be expected in all cases? I'm not nitpicking here - having e.g. short packets ending at (non-zero) .status would work with the current kernel code and isn't inconceivable as a planned change. For that matter, what's .type for? It looks like a selector for that union you have in the end, but it is not used that way - originating upcall is used for that instead. Is it just a junk, or is it a currently unimplemented sanity check (if .type in downcall must be the same as in matching upcall)? > You're not promising an ack in return, but we can't go further > until we provide the definition. *shrug* Sure, I can try and pull it out of you guys bit by bit instead (see above for some initial questions), but I suspect that it would be faster and less painful for everyone involved if you just describe the calling conventions and we'll see what might be missing. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html