On Fri, 2015-01-09 at 19:28 +0100, Hannes Reinecke wrote:
[...]
> > I think you are assuming we are leaving the iscsi code as it is today.
> >
> > For the non-MCS mq session per CPU design, we would be allocating and
> > binding the session and its resources to specific CPUs. They would only
> > be accessed by the threads on that one CPU, so we get our
> > serialization/synchronization from that. That is why we are saying we
> > do not need something like atomic_t/spin_locks for the sequence number
> > handling for this type of implementation.
> >
> Wouldn't that need to be coordinated with the networking layer?
> Doesn't it do the same thing, matching TX/RX queues to CPUs?
> If so, wouldn't we decrease bandwidth by restricting things to one CPU?

So this is actually one of the fascinating questions on multi-queue.

Long ago, when I worked for the NCR OS group and we were bringing up the
first SMP systems, we actually found that the SCSI stack went faster
when bound to a single CPU. The problem in those days was lock
granularity and contention, so single CPU binding eliminated that
overhead.

However, nowadays with modern multi-tiered caching and huge latencies
for cache line bouncing, we're approaching the point where the fineness
of our lock granularity is hurting performance, so it's worth re-asking
the question of whether just dumping all the lock latency by single CPU
binding is a worthwhile exercise.

James
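
To make the "no atomic_t/spin_locks for the sequence number" point from the quoted text concrete, here is a rough user-space sketch. It is not the iscsi code. `struct mq_session`, `session_next_cmdsn()` and `run_sessions()` are hypothetical names invented for illustration, a pthread stands in for "the threads on that one CPU", and the padding assumes 64-byte cache lines. Each session is touched by exactly one thread, so its counter can be a plain integer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-CPU session. Exactly one thread owns it, so cmdsn is a
 * plain integer: no atomic_t, no spinlock. The pad keeps the hot fields of
 * adjacent sessions 64 bytes apart (assumed cache line size) so that
 * neighbouring owners do not bounce a shared line. */
struct mq_session {
	uint32_t cmdsn;      /* next sequence number for this session */
	uint32_t iterations; /* how many commands the owner will issue */
	char pad[64 - 2 * sizeof(uint32_t)];
};

/* Safe without locking only because a single thread calls it per session. */
static uint32_t session_next_cmdsn(struct mq_session *s)
{
	return s->cmdsn++;
}

static void *session_worker(void *arg)
{
	struct mq_session *s = arg;

	assert(s != NULL);
	for (uint32_t i = 0; i < s->iterations; i++)
		(void)session_next_cmdsn(s);
	return NULL;
}

/* Start one worker per session, let each issue per_session commands, and
 * return the total number of sequence numbers handed out (0 on failure). */
uint64_t run_sessions(unsigned int nr_sessions, uint32_t per_session)
{
	pthread_t *tids = calloc(nr_sessions, sizeof(*tids));
	struct mq_session *sessions = calloc(nr_sessions, sizeof(*sessions));
	unsigned int started = 0;
	uint64_t total = 0;

	if (!tids || !sessions)
		goto out;

	for (unsigned int i = 0; i < nr_sessions; i++) {
		sessions[i].iterations = per_session;
		if (pthread_create(&tids[i], NULL, session_worker, &sessions[i]))
			break;
		started++;
	}

	/* Reading cmdsn after pthread_join() is safe: the owner has exited. */
	for (unsigned int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += sessions[i].cmdsn;
	}
	if (started != nr_sessions)
		total = 0;
out:
	free(tids);
	free(sessions);
	return total;
}
```

Calling `run_sessions(4, 1000)` starts four workers and should hand out 4000 sequence numbers in total with no lost updates, even though nothing in the hot path is atomic. The trade-off raised in the thread still stands: this only works while every access to a session really does come from its one owner.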