Re: Open-FCoE on linux-scsi

FUJITA Tomonori wrote:
>> What's the general opinion on this? Duplicate code vs. more kernel code?
>> I can see that you're already starting to clean up the code that you
>> ported. Does that mean the duplicate code isn't an issue to you? When we
>> fix bugs in the initiator they're not going to make it into your tree
>> unless you're diligent about watching the list.
>
> It's hard to convince the kernel maintainers to merge something into
> mainline that can be implemented in user space. I failed twice
> (with two iSCSI target implementations).

Tomonori and "the kernel maintainers",

In fact, almost all of the kernel could be done in user space, including all the drivers, networking, I/O management with the block/SCSI initiator subsystem, and the disk cache manager. But does that mean the current kernel is bad and all of the above should be (re)done in user space instead? I think not. Linux isn't a microkernel for very pragmatic reasons: simplicity and performance.

1. Simplicity.

For a SCSI target, especially with a hardware target card, data come from the kernel and are eventually served by the kernel, which does the actual I/O or gets/puts data from/to the cache. Dividing the request-processing job between user and kernel space creates unnecessary interface layer(s) and effectively turns request processing into a distributed job, with all the complexity and reliability problems that implies. As an example, what currently happens in STGT if the user-space part suddenly dies? Will the kernel part gracefully recover from it? How much effort would be needed to implement that?

Another example is the code duplication mentioned above. Is it good? What will it bring? Or do you care only about the amount of kernel code, and not about the overall amount of code? If so, you should (re)read what Linus Torvalds thinks about that: http://lkml.org/lkml/2007/4/24/364 (I don't consider myself an authority on this question.)

I agree that some of the processing, where it can be clearly separated, can and should be done in user space. A good example of this approach is connection negotiation and management as done in open-iscsi. But I don't agree that this idea should be taken to the absolute. It might look good, but it's impractical; it will only make things more complicated and harder to maintain.

2. Performance.

Modern SCSI transports, e.g. InfiniBand, have link latencies as low as 1(!) microsecond. For comparison, the inter-thread context switch time on a modern system is about the same, and a syscall takes about 0.1 microsecond. So ten empty syscalls, or one context switch, add as much latency as the link itself. Even 1Gbps Ethernet has less than 100 microseconds of round-trip latency.
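The orders of magnitude above are easy to sanity-check. Here is a rough micro-benchmark of per-syscall overhead (a sketch only: absolute numbers vary with CPU, kernel version, and mitigations, and Python interpreter overhead inflates the result well above the bare ~0.1 µs figure quoted; the point is that it is comparable to, not negligible against, a 1 µs link):

```python
import os
import time

# Time a batch of os.getpid() calls -- one of the cheapest syscalls
# available -- and report the average cost per call in nanoseconds.
N = 200_000
start = time.perf_counter()
for _ in range(N):
    os.getpid()
elapsed = time.perf_counter() - start

per_call_ns = elapsed / N * 1e9
print(f"~{per_call_ns:.0f} ns per getpid() call (includes interpreter overhead)")
```

A context-switch benchmark (e.g. two threads ping-ponging on a pipe) would show a cost roughly an order of magnitude higher per switch, which is what makes the 1 µs link-vs-switch comparison in the text meaningful.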

You most likely know that the QLogic target driver for SCST allows commands to be executed either directly from soft IRQ context or from a corresponding thread. There is a steady 5% difference in IOPS between those modes for 512-byte reads on nullio over a 4Gbps link. So a single additional inter-kernel-thread context switch costs 5% of IOPS.

Another source of latency, unavoidable with the user-space approach, is the data copy to/from the cache. With a fully kernel-space approach, the cache can be used directly, so no extra copy is needed.
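The extra copy can be illustrated with ordinary file descriptors. A read()+write() loop bounces every chunk through a user-space buffer (page cache to user buffer to kernel again), while sendfile() lets the kernel move the data without surfacing it to user space. This is only an analogy for the target data path, not SCST or STGT code; names and sizes below are illustrative:

```python
import os
import tempfile

def copy_via_userspace(src_fd, dst_fd, bufsize=64 * 1024):
    # page cache -> user buffer -> kernel: one extra copy per chunk
    while True:
        chunk = os.read(src_fd, bufsize)
        if not chunk:
            break
        os.write(dst_fd, chunk)

def copy_in_kernel(src_fd, dst_fd, length):
    # sendfile() moves data inside the kernel; no user-space copy
    offset = 0
    while offset < length:
        sent = os.sendfile(dst_fd, src_fd, offset, length - offset)
        if sent == 0:
            break
        offset += sent

data = os.urandom(256 * 1024)
with tempfile.TemporaryDirectory() as d:
    src, a, b = (os.path.join(d, n) for n in ("src", "a", "b"))
    with open(src, "wb") as f:
        f.write(data)
    with open(src, "rb") as s, open(a, "wb") as out:
        copy_via_userspace(s.fileno(), out.fileno())
    with open(src, "rb") as s, open(b, "wb") as out:
        copy_in_kernel(s.fileno(), out.fileno(), len(data))
    with open(a, "rb") as fa, open(b, "rb") as fb:
        ok = fa.read() == data == fb.read()
    print("both copies identical:", ok)
```

Both paths produce identical data; the difference is purely in how many times each byte crosses the user/kernel boundary, which is exactly the cost a split user/kernel target pays on every request.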

So, by putting code in user space, you accept the extra latency it adds. Many, if not most, real-life workloads are more or less latency-bound, not throughput-bound, so you shouldn't be surprised that a single-stream "dd if=/dev/sdX of=/dev/null" on the initiator gives low numbers. Such a "benchmark" is no less important or practical than all the multithreaded, latency-insensitive benchmarks people like to run.

You may object that the backstorage latency is far more than 1 microsecond, but that is only true when data are read/written from/to the actual backstorage media, not from a cache, including the backstorage device's own cache. Nothing prevents a target from having 8 or even 64GB of cache, so most accesses, even random ones, could be served from it. This is especially important for synchronous writes.

Thus, I believe that a partially user-space, partially kernel-space approach to building SCSI targets is a move in the wrong direction: it brings practically nothing but costs a lot.

Vlad
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
