Pete Wyckoff wrote: > Lots of iser.h is original code from OSC, please put back copyright > headers on that file too. Thanks for keeping us on all the other > files. I'll do this, sure! > You should add a patch 4/3 to delete the old RDMA code, to avoid > confusion with having two in the tree. I did not send this iscsi_rdma.c removing patch, as it was just the entire file marked with "-". Because this was a RFC only, i allowed myself just to remove a line from the Makefile. This patch will be necessary of course, when we get to the real patches. > And also excise all the > hooks that old RDMA code put into what used to be generic iscsi > code. Transport abstraction changes too. I left them because i was not sure which direction the changes will take. When we finalize the design, these hooks will go away or change appropriately. > Since you have copied everything, there should be no need to test > for ->rdma. If you mean testing target->rdma, then it is a tricky bit. As of today there is no way of marking a target as "iser" or "pure iscsi" towards the rest of the world. This situation stems from the iWARP origins of iSER spec, which implied that the RDMA-capabilities are discovered during the login phase (and not by marking targets by a management means whatsoever). As you probably remember, the spec stipulated starting connections in TCP, performing login, and if the target agrees to support iser, then one can proceed to the full-featured phase, else one can fall back to TCP or disconnect. With IB this is not an option. We have to export all targets (both tcp and iser) thru SendTargets/iSNS just as iscsi targets. The initiator's only way to know whether it's iser or iscsi/tcp is to try to login over iser (a word from an oracle will also do). The code managing sessions is common, and needs to stay this way at least to a certain degree even if we separate transports. Thus when a new session is being instantiated, we have to ask if the target is RDMA or not, and if the login request came over a wrong transport, we should reject. That's where target->rdma checks come from. > I'm still unhappy how you duplicated all the text handling. Seems > like there should have been a way to abstract the buffer handling > more cleanly. We have already agreed that i'll start with submitting patches that aim at a common set of text processing functions. > iser_task_add_out_rdma_buf - also a good idea This reminds me of the need to support multiple buffers per command (scatter-gather) to be able to support properly all existing operational modes of iscsi in iser. This subject was raised in context of the new SG code. Tomo, perhaps we should start looking at the implications of this? > sched_rdma_rd, iosubmit, etc. I like how you broke up all the > parts involved in handling requests; we only had maybe half as > many phases. One benefit of breaking the processing into phases is queuing all the tasks waiting for a certain event and processing them all in batches. This allows issuing one RDMA request consisting of a few linked operations in the same direction, like a few RDMA-Writes and Sends (from target), or few RDMA-Reads (to target). This requests aggregation saves user-kernel switches and interrupts. This also means that we can tolerate a short accumulation of ready tasks on such a queue, if the ongoing activity is likely to produce more tasks for this queue, so that a greater degree of request aggregation is achieved. > But this schedule() interface worries me. It > looks like it only gets called when the fds are quiet. That > seems like a bad way to manage load. As soon as you accept a > request, you want to push it all the way through. We ran into > this problem with iscsi_rdma. If the fds are not quiet then tgt is probably busy with processing i/o (mainly) completions, which will queue more tasks ready for completion (RDMA-W and Send), so that all these completions will be aggregated. We could also limit the number of fd-related events effectively processed during one epoll_wait() sweep so that a better sharing between fd/schedule events is achieved. > Too bad you broke AHS and bidir. That worked in iscsi_rdma. > I worry you'll never get around to fixing it. AHS was broken mainly because all the troubles with the text functions. Bidir is another story, but the code mainly lacks plugging bidir commands into the buffer allocation sequence. I believe that we'll be able to get around these :) > Performance graphs would be useful to allay my concerns about your > use of schedule(), and make sure the fine-grained approach is valid. Well, i'll produce some, when we get to the real patches. > Interesting how you changed MAX_POLL_WC to 32. That wants to be > tunable more dynamically. I agree. Need to add a parameter to --iser commands set > Testing? > - long client runs, see if memory leaks and/or performance drops done, quite a lot > - lots of short-lived clients, ditto done, need to test more > - lots of clients that die at random areas of communication > (using netfilter to block particular client IPs makes this easy) done, by periodically switching an IB port off and back on - this method tests scenarios when *all* clients disconnect and reconnect at once, so in a sense it is "more" stress, but perhaps misses some other subtle scenarios. > I'm not going to be working on tgt, not because of your changes, > just because my interest lies elsewhere these days. Will happily > look at graphs or patches if that can help, though. Thanx for the insightful comments! Hope you'll be able to continue reviewing in the future. -- To unsubscribe from this list: send the line "unsubscribe stgt" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html