FUJITA Tomonori wrote:
What's the general opinion on this? Duplicate code vs. more kernel code?
I can see that you're already starting to clean up the code that you
ported. Does that mean the duplicate code isn't an issue to you? When we
fix bugs in the initiator they're not going to make it into your tree
unless you're diligent about watching the list.
It's hard to convince the kernel maintainers to merge something into
mainline that can be implemented in user space. I failed twice
(with two iSCSI target implementations).
Tomonori and "the kernel maintainers",
In fact, almost all of the kernel can be done in user space, including
all the drivers, networking, and I/O management with the block/SCSI
initiator subsystem and disk cache manager. But does that mean the
current kernel is bad and all of the above should be (re)done in user
space instead? I think not. Linux isn't a microkernel for very pragmatic
reasons: simplicity and performance.
1. Simplicity.
For a SCSI target, especially one with a hardware target card, data
come from the kernel and are eventually served by the kernel, which does
the actual I/O or gets/puts the data from/to the cache. Dividing request
processing between user and kernel space creates unnecessary interface
layer(s) and effectively makes request processing a distributed job,
with all the complexity and reliability problems that implies. For
example, what currently happens in STGT if the user space part suddenly
dies? Will the kernel part gracefully recover from it? How much effort
will be needed to implement that?
Another example is the code duplication mentioned above. Is it good?
What will it bring? Or do you care only about the amount of kernel code
and not about the overall amount of code? If so, you should (re)read
what Linus Torvalds thinks about that:
http://lkml.org/lkml/2007/4/24/364 (I don't consider myself an
authority on this question).
I agree that some of the processing, which can be clearly separated, can
and should be done in user space. A good example of this approach is
connection negotiation and management the way it's done in open-iscsi.
But I don't agree that this idea should be taken to the extreme. It
might look good, but it's impractical: it will only make things more
complicated and harder to maintain.
2. Performance.
Modern SCSI transports, e.g. Infiniband, have link latency as low as
1(!) microsecond. For comparison, the inter-thread context switch time
on a modern system is about the same, and syscall time is about 0.1
microsecond. So just ten empty syscalls or one context switch add as
much latency as the link itself. Even 1Gbps Ethernet has less than 100
microseconds of round-trip latency.
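
The arithmetic above can be written out as a small sketch. The three
constants are the figures quoted in this message; the function names and
the (2 syscalls, 2 switches) request path are hypothetical and only
illustrate how the costs add up.

```python
# Figures quoted in the message, in microseconds.
LINK_LATENCY_US = 1.0    # Infiniband link latency
CONTEXT_SWITCH_US = 1.0  # inter-thread context switch ("about the same")
SYSCALL_US = 0.1         # one empty syscall


def added_latency_us(syscalls: int, context_switches: int) -> float:
    """Software latency added to one request by the given crossings."""
    return syscalls * SYSCALL_US + context_switches * CONTEXT_SWITCH_US


def overhead_vs_link(syscalls: int, context_switches: int) -> float:
    """Added latency expressed as a multiple of the link latency."""
    return added_latency_us(syscalls, context_switches) / LINK_LATENCY_US


if __name__ == "__main__":
    # The last entry is a hypothetical user space round trip, not a
    # measurement of any real target implementation.
    for syscalls, switches in [(10, 0), (0, 1), (2, 2)]:
        extra = added_latency_us(syscalls, switches)
        ratio = overhead_vs_link(syscalls, switches)
        print(f"{syscalls} syscalls + {switches} switches: "
              f"{extra:.1f} us added ({ratio:.1f}x link latency)")
```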
You most likely know that the QLogic target driver for SCST allows
commands to be executed either directly from soft IRQ or from the
corresponding thread. There is a steady 5% difference in IOPS between
those modes on 512-byte reads on nullio using a 4Gbps link. So a single
additional inter-kernel-thread context switch costs 5% of IOPS.
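
As a rough cross-check of that 5% figure, here is a sketch relating a
fixed per-command delay to the IOPS loss. It assumes commands are
processed strictly one after another and that the extra context switch
costs about 1 microsecond (the estimate given earlier in this message).
Neither assumption is confirmed for the measurement itself, so the
per-command time it derives is only indicative.

```python
def iops_loss_fraction(base_time_us: float, extra_us: float) -> float:
    """Fraction of IOPS lost when each command takes extra_us longer.

    Assumes serialized commands, so IOPS = 1 / per-command time.
    """
    return extra_us / (base_time_us + extra_us)


def implied_base_time_us(loss_fraction: float, extra_us: float) -> float:
    """Per-command time that would make extra_us cost loss_fraction of IOPS."""
    return extra_us * (1.0 - loss_fraction) / loss_fraction


if __name__ == "__main__":
    # Assumption: one context switch is about 1 us, observed loss is 5%.
    base = implied_base_time_us(0.05, 1.0)
    print(f"implied per-command time: {base:.1f} us")
    print(f"loss check: {iops_loss_fraction(base, 1.0):.3f}")
```

Under these assumptions the per-command time comes out on the order of
tens of microseconds, which shows how visible a single extra switch
becomes at that scale.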
Another source of additional latency, unavoidable with the user space
approach, is copying data to/from the cache. With the fully kernel space
approach, the cache can be used directly, so no extra copy is needed.
So, by putting code in user space you have to accept the extra latency
it adds. Many, if not most, real-life workloads are more or less latency
bound, not throughput bound, so you shouldn't be surprised that a single
stream "dd if=/dev/sdX of=/dev/null" on the initiator gives values that
are too low. Such a "benchmark" is no less important and practical than
all the multithreaded, latency-insensitive benchmarks that people like
running.
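
To see why a single stream is latency bound, consider a simplified model
in which only one request is in flight at a time (no readahead, no
queuing). This is an assumption made for illustration; a real dd run may
behave differently. The block size and latencies below are hypothetical
numbers, not measurements.

```python
def single_stream_throughput(block_bytes: int, latency_us: float) -> float:
    """Bytes per second when exactly one request is outstanding at a time."""
    requests_per_second = 1_000_000 / latency_us
    return block_bytes * requests_per_second


if __name__ == "__main__":
    block = 512  # bytes per request (hypothetical)
    # Hypothetical per-request round-trip latencies in microseconds.
    for latency_us in (100.0, 110.0, 150.0):
        mb_per_s = single_stream_throughput(block, latency_us) / 1_000_000
        print(f"{latency_us:6.1f} us per request -> {mb_per_s:.2f} MB/s")
```

In this model throughput is inversely proportional to per-request
latency, so every microsecond added on the request path lowers the
single-stream result directly, regardless of link bandwidth.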
You may object that the backstorage's latency is a lot more than 1
microsecond, but that is true only if data are read/written from/to the
actual backstorage media, not from a cache, even the backstorage
device's own cache. Nothing prevents a target from having 8 or even 64GB
of cache, so most accesses, even random ones, could be served from it.
This is especially important for sync writes.
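
The effect of the cache can be sketched with a simple weighted average.
All numbers here (cache and media latencies, hit ratios, the 2
microsecond software overhead) are hypothetical and chosen only to show
the trend; the function names are illustrative as well.

```python
def effective_latency_us(hit_ratio: float, cache_us: float,
                         media_us: float) -> float:
    """Average storage latency for a given cache hit ratio."""
    return hit_ratio * cache_us + (1.0 - hit_ratio) * media_us


def overhead_share(extra_us: float, storage_us: float) -> float:
    """Share of total request time spent in extra software overhead."""
    return extra_us / (extra_us + storage_us)


if __name__ == "__main__":
    cache_us, media_us, extra_us = 10.0, 5000.0, 2.0  # hypothetical
    for hit_ratio in (0.0, 0.9, 1.0):
        storage = effective_latency_us(hit_ratio, cache_us, media_us)
        share = overhead_share(extra_us, storage)
        print(f"hit ratio {hit_ratio:.1f}: storage {storage:7.1f} us, "
              f"overhead share {share:.2%}")
```

The closer the hit ratio gets to 1, the larger the share of each request
taken by fixed per-request overhead, which is why the media latency
argument stops applying once most accesses are served from cache.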
Thus, I believe that a partial user space, partial kernel space
approach to building SCSI targets is a move in the wrong direction,
because it brings practically nothing but costs a lot.
Vlad