On Tue, Feb 11, 2014 at 12:31:39AM +0000, Dilger, Andreas wrote:
> The off-stack storage of per-thread info is used to avoid stack overflow
> because of the 4kB stack limitation, and to avoid kmalloc/kfree of the
> same data structures for every call into the callpath.  Other code in
> the kernel handles similar problems by having a separate thread do the
> lower level IO, but that adds context switches to the IO path.  I'm not
> a big fan of this, but that is what we live with due to limited stack
> size.

<wry> Reducing the depth of call chains also might help. </wry>

And yes, I understand the motivation; it makes keeping track of what's
going on really unpleasant, though...

> > 4) cl_io_init(), which calls
> >
> > 5) cl_io_init0(), which possibly calls a bunch of instances of
> > ->coo_io_init(), which may or may not return 0; I _hope_ that in this
> > case that's what'll happen and we get back to ll_file_io_generic(), where

Umm...  So am I right about the thing returning 0 in this case and in the
case of ->aio_read()?

> > 5') cl_io_start(), which calls a bunch (how large in that case?) of
> >
> > 6') ->cio_start() instances.  Hopefully that'll be vvp_io_write_start()

Again, is that assumption correct?  I'm not interested in criticizing the
overall design, etc.; what I am interested in is what that code ends up
doing wrt carrying iov/iovlen/iocb/position through the entire thing.

> > which will pass the value of io->u.ci_rw.crw_pos (picked via an
> > overlapping field of a union) to generic_file_aio_write().  Which, BTW,
> > updates ->ki_pos.  Then we return into cl_io_loop(), where
> >
> > 4'') we call cl_io_end(), cl_io_unlock() and
> >
> > 5'') cl_io_rw_advance(), which increments ....crw_pos, hopefully in sync
> > with what we have in iocb->ki_pos.  And calls a bunch of methods.  And
> > we return into cl_io_loop(), where
> >
> > 3''') we copy ....crw_pos into iocb->ki_pos.  WTF do we need that?
> > Hadn't generic_file_aio_write() been good enough?

Is it safe to assume that

  * these two calls of generic_file_aio_{read,write} will always get pos
    equal to iocb->ki_pos,
  * iocb/iov/iovlen will be unchanged since the moment they'd been passed
    to ll_file_aio_{read,write}(),
  * the ->ki_pos assignment after the call of cl_io_loop() is a no-op for
    the IO_NORMAL case, and
  * these calls of generic_file_aio_{read,write} will always have
    ll_file_aio_{read,write} in the call chain and can't be reached
    otherwise?

What I want to do is turn the ->aio_read()/->aio_write() signature into
kiocb x iov_iter -> ssize_t, with all instances, including
generic_file_aio_{read,write}(), converted to that - pos delivered via
iocb and iov/nrsegs via iov_iter.  Supplying those in the places where the
instances are called via the method is trivial, and for such calls we
definitely have pos == iocb->ki_pos.  Most of the places calling those
instances directly (not via the method) are in other instances of the same
methods and pass the arguments unchanged.  These two in lustre are, in
fact, the only real exceptions.

IF the hypotheses above are correct, we can convert those two - just store
an iov_iter reference instead of iov/nrsegs in vvp_io_args and in ccc_io
and that'll do the trick (rough sketch below).  Probably put an iov_iter
alongside the vti_local_iovec as well, but I really wonder if that's worth
the trouble - it might be better to use do_sync_read/do_sync_write and be
done with it.  Sure, that will slightly increase the stack footprint, but
if you are _that_ tight on stack...  Anyway, that's a separate story - we
can deal with that later.
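To make that concrete, here is roughly what I mean - illustration only; the
struct below is a stand-in with made-up names, not the actual vvp_io_args
or ccc_io layout:

	/*
	 * Sketch, not a patch: the converted method signature and the
	 * kind of lustre-side per-call state it would allow.
	 */
	#include <linux/types.h>	/* ssize_t */

	struct kiocb;
	struct iov_iter;

	/* ->aio_read()/->aio_write() become kiocb x iov_iter -> ssize_t */
	typedef ssize_t (*rw_iter_fn)(struct kiocb *iocb, struct iov_iter *iter);

	/* stand-in for vvp_io_args/ccc_io: one iter instead of iov + nrsegs */
	struct vvp_io_args_sketch {
		struct kiocb	*via_iocb;	/* pos lives in via_iocb->ki_pos */
		struct iov_iter	*via_iter;	/* replaces the iov/nrsegs pair */
	};

The point being that once the position lives only in iocb->ki_pos and the
segments only in the iter, there is nothing left for those two call sites
to carry out of sync.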
PS: I'm serious about the call graph, ideally along with the stack
footprints of all functions.  At least it would give some idea of how much
the various paths contribute...  Of course, it's a real bitch to automate
with the amount of indirect calls you've got there - the tricky part is
finding the set of functions that might be called at a given location ;-/