I believed there was some value in preallocating the readdir buffer, much
the same way it is valuable to preallocate slab buffers and then obtain
and release them with kmem_cache_alloc and kmem_cache_free. By default
there are five readdir slots. Requests have to wait for an available slot
if all five slots are in use. I've made the waiting kick in a few times
when running my "stupid human" test, and it seems to work right. If the
ability to keep getting more and more buffers were unbounded, I guess it
could run away like some student's fork bomb... I'm pretty sure from
talking to others that the wait-for-a-slot code hardly ever kicks in on
the production cluster on campus.

Martin said he saw commit messages in the old CVS repository indicating
that the original developers did indeed plan to make the readdir buffer
a shared buffer.

-Mike

On Thu, Feb 11, 2016 at 7:55 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 11, 2016 at 06:54:43PM -0500, Mike Marshall wrote:
>
>> The readdir buffer isn't a shared buffer like the I/O buffer is.
>> The readdir buffer is preallocated when the client-core starts up,
>> though. The kernel module picks which readdir buffer slot the
>> client-core fills, but gets back a copy of that buffer - the
>> trailer. Unless the kernel module isn't managing the buffer slots
>> properly, the client-core shouldn't have more than one upcall
>> on hand that specifies any particular buffer slot. A "kill -9"
>> on an ls (or whatever) might lead to such mismanagement, but since
>> readdir decoding happens on a discrete copy of the buffer slot
>> that was filled by the client-core, it doesn't seem to me like it
>> could be overwritten during a decode...
>>
>> I believe there's nothing in userspace that guarantees that
>> readdirs are replied to in the same order they are received...
>
> So... why the hell does readdir need to be aware of those slots? After
> rereading this code I have to agree - it *does* copy that stuff, and no
> sharing is happening at all. But the only thing buf_index is used for
> is "which buffer do we memcpy() to before passing it to writev()?" in
> the daemon, and it looks like it would've done just as well using a
> plain and simple malloc() in
>
> 	/* get a buffer for xfer of dirents */
> 	vfs_request->out_downcall.trailer_buf =
> 		PINT_dev_get_mapped_buffer(BM_READDIR, s_io_desc,
> 			vfs_request->in_upcall.req.readdir.buf_index);
>
> in copy_dirents_to_downcall()...
>
> Why is that even part of the protocol? Were you planning to do
> zero-copy there at some point, but hadn't gotten around to it yet? If
> you do (and decide to do it via shared buffers rather than e.g.
> splice), please add a cancel for readdir before going there...

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html