> Sure, the kernel won't take that (the op with the matching tag is gone
> already), but the data is stored into shared memory *before* the writev()
> on the control device that would pass the response to the kernel, so it
> still gets overwritten. Right under decoding readdir()...

The readdir buffer isn't a shared buffer like the IO buffer is. The readdir
buffer is preallocated when the client-core starts up, though. The kernel
module picks which readdir buffer slot the client-core fills, but gets back
a copy of that buffer - the trailer.

Unless the kernel module isn't managing the buffer slots properly, the
client-core shouldn't have more than one upcall on hand that specifies any
particular buffer slot. The "kill -9" on an ls (or whatever) might lead to
such mismanagement, but since readdir decoding happens on a discrete copy
of the buffer slot that was filled by the client-core, it doesn't seem to
me like it could be overwritten during a decode...

I believe there's nothing in userspace that guarantees that readdirs are
replied to in the same order they are received...

-Mike

On Wed, Feb 10, 2016 at 11:44 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Feb 09, 2016 at 11:13:28PM +0000, Al Viro wrote:
>> On Tue, Feb 09, 2016 at 10:40:50PM +0000, Al Viro wrote:
>>
>> > And the version in orangefs-2.9.3.tar.gz (your Frankenstein module?) is
>> > vulnerable to the same race. 2.8.1 isn't - it ignores signals on the
>> > cancel, but that means waiting for the cancel to be processed (or timed
>> > out) on any interrupted read() before we return to userland. We can
>> > return to that behaviour, of course, but I suspect that offloading it
>> > to something async (along with freeing the slot used by the original
>> > operation) would be better from a QoI point of view.
>>
>> That breakage had been introduced between 2.8.5 and 2.8.6 (at some point
>> during the spring of 2012). AFAICS, all versions starting with 2.8.6 are
>> vulnerable...
>
> BTW, what about a kill -9 delivered to a readdir in progress? There's no
> cancel for those (and AFAICS the daemon will reject a cancel on anything
> other than FILE_IO), so what's to stop another thread from picking the
> same readdir slot and getting (daemon-side) two of them spewing into the
> same area of shared memory? Is it simply that daemon-side the shared
> memory on readdir is touched only upon request completion in the
> completely serialized process_vfs_requests()? That doesn't seem to be
> enough - suppose the second readdir request completes (daemon-side)
> first, its results get packed into a shared memory slot and it is
> reported to the kernel, which proceeds to repack and copy that data to
> userland. In the meanwhile, the daemon completes the _earlier_ readdir
> and proceeds to pack its results into the same slot of shared memory.
> Sure, the kernel won't take that (the op with the matching tag is gone
> already), but the data is stored into shared memory *before* the writev()
> on the control device that would pass the response to the kernel, so it
> still gets overwritten. Right under decoding readdir()...
>
> Or is there something in the daemon that would guarantee that readdir
> responses happen in the same order in which it had picked the requests?
> I'm not familiar enough with that beast (and the overall control flow in
> there is, er, not the most transparent I've seen), so I might be missing
> something, but I don't see anything obvious that would guarantee such
> ordering.
>
> Please, clarify.
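
To make the mechanism Mike describes concrete, here is a minimal C sketch
of decoding from a discrete copy of a buffer slot. This is not the actual
OrangeFS kernel code; the names (struct readdir_slot, decode_readdir_reply,
decode_dirents, SLOT_SIZE) are all hypothetical. The point is only that the
decoder snapshots the slot before working on it, so a later write into the
slot cannot corrupt a decode already in flight:

#include <stdlib.h>
#include <string.h>

#define SLOT_SIZE (128 * 1024)	/* hypothetical per-slot size */

/* One preallocated readdir buffer slot, filled by the client-core. */
struct readdir_slot {
	char data[SLOT_SIZE];
};

/*
 * Decode a completed readdir upcall from a private snapshot of the
 * slot (the "trailer"): a second response landing in the same slot
 * can clobber the slot itself, but not the copy being decoded.
 */
static int decode_readdir_reply(struct readdir_slot *slot, size_t len)
{
	char *trailer;

	if (len > SLOT_SIZE)
		return -1;

	trailer = malloc(len);
	if (!trailer)
		return -1;

	memcpy(trailer, slot->data, len);	/* snapshot first ... */

	/* ... then decode at leisure from the private copy, e.g.
	 * decode_dirents(trailer, len);  (hypothetical helper)      */

	free(trailer);
	return 0;
}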
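
And a toy model of the daemon-side ordering hazard Al raises, again with
invented names (complete_readdir, shared_slot) rather than the real
client-core code. A lock serializes the daemon's own completions, but
nothing orders them against the kernel's consumption of an earlier
response for the same slot, which is exactly the window Al points at:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define SLOT_SIZE 64	/* toy size for the model */

static char shared_slot[SLOT_SIZE];	/* one shared-memory readdir slot */
static pthread_mutex_t slot_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Daemon-side completion path for one readdir targeting shared_slot.
 */
static void complete_readdir(const char *results, unsigned long tag)
{
	pthread_mutex_lock(&slot_mutex);

	/* Step 1: the results land in shared memory *before* ...    */
	strncpy(shared_slot, results, SLOT_SIZE - 1);

	/* Step 2: ... the response is passed to the kernel.  If the
	 * kernel already accepted a later response for this slot and
	 * is still working with its contents, step 1 above has just
	 * overwritten them.                                          */
	printf("writev: readdir response, tag %lu -> kernel\n", tag);

	pthread_mutex_unlock(&slot_mutex);
}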