I believed there was some value in preallocating the readdir buffer, much
the same way it is valuable to preallocate slab buffers and then obtain
and release them with kmem_cache_alloc and kmem_cache_free. By default
there are five readdir slots. Requests have to wait for an available slot
if all five slots are in use. I've made the waiting kick in a few times
when running my "stupid human" test, and it seems to work right. If the
ability to keep getting more and more buffers were unbounded, I guess it
could run away like some student's fork bomb... I'm pretty sure from
talking to others that the wait-for-a-slot code hardly ever kicks in on
the production cluster on campus.

Martin said he saw commit messages in the old CVS repository indicating
that the original developers did indeed plan to make the readdir buffer
a shared buffer.

-Mike

On Thu, Feb 11, 2016 at 7:55 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 11, 2016 at 06:54:43PM -0500, Mike Marshall wrote:
>
>> The readdir buffer isn't a shared buffer like the I/O buffer is.
>> The readdir buffer is preallocated when the client-core starts up,
>> though. The kernel module picks which readdir buffer slot the
>> client-core fills, but gets back a copy of that buffer - the
>> trailer. Unless the kernel module isn't managing the buffer slots
>> properly, the client-core shouldn't have more than one upcall
>> on hand that specifies any particular buffer slot. A "kill -9"
>> on an ls (or whatever) might lead to such mismanagement, but since
>> readdir decoding happens on a discrete copy of the buffer slot
>> that was filled by the client-core, it doesn't seem to me like it
>> could be overwritten during a decode...
>>
>> I believe there's nothing in userspace that guarantees that
>> readdirs are replied to in the same order they are received...
>
> So... why the hell does readdir need to be aware of those slots? After
> rereading this code I have to agree - it *does* copy that stuff, and no
> sharing is happening at all. But the only thing buf_index is used for
> is "which buffer do we memcpy() to before passing it to writev()?" in
> the daemon, and it looks like it would've done just as well using a
> plain and simple malloc() in
>
> 	/* get a buffer for xfer of dirents */
> 	vfs_request->out_downcall.trailer_buf =
> 		PINT_dev_get_mapped_buffer(BM_READDIR, s_io_desc,
> 			vfs_request->in_upcall.req.readdir.buf_index);
>
> in copy_dirents_to_downcall()...
>
> Why is that even part of the protocol? Were you planning to do
> zero-copy there at some point, but hadn't gotten around to it yet? If
> you do (and decide to do it via shared buffers rather than e.g.
> splice), please add a cancel for readdir before going there...

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html