Mike Christie wrote:
> Vladislav Bolkhovitin wrote:
>> Mike Christie wrote:
>>> Vladislav Bolkhovitin wrote:
>>>> Are you sure that there are no SCSI arrays, available now or in
>>>> the near future (e.g. iSCSI ones), with response time/latency so
>>>> small that having 5 (five) or more context switches per command,
>>>> some of which include map/unmap operations, would increase the
>>>> latency too much? I mean, consider the NFS server, which was
>>>> originally a user space daemon, and many people didn't want it in
>>>> the kernel. Eventually, it's in. I don't see any fundamental
>>>> difference between an NFS server and a SCSI target server.
>>> Isn't the reason the NFS server is still in the kernel because of
>>> some locking difficulties?
>> Might be. But from what I remember, the major reason was
>> performance. After googling a bit I found many acknowledgments of
>> that.
> I do not think we are going to get anywhere with this type of thread
> :( We should try to compare at least one of the userspace *nbd
> implementations with the unh target in scst. I see some that just do
> basic socket ops (not even a sendfile-type hook) for the network
> part, then just async or normal read/writes. I do not want to
> compare FC to nbd, but maybe comparing software iscsi to userspace
> nbd is a little more fair. I think ata over ethernet has a userspace
> target too. Are the unh target's defaults OK for performance
> testing, or could you send some off list, so we can at least test
> those?
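>
> For reference, the core of the data path in those implementations is
> roughly the loop below. This is a hand-wavy sketch, not any
> particular *nbd codebase; the real NBD wire format, error handling
> and short read/write handling are all omitted:
>
> #include <unistd.h>
> #include <sys/types.h>
>
> struct req {
>         int    is_write; /* 1 = write, 0 = read           */
>         off_t  offset;   /* byte offset into backing file */
>         size_t len;      /* transfer length               */
> };
>
> /* One connection: pull a request off the socket, service it with
>  * plain pread()/pwrite() against the backing store, push the data
>  * back. No sendfile(), no AIO - just basic socket ops. */
> static void serve(int sock, int backing_fd)
> {
>         static char buf[128 * 1024];
>         struct req r;
>
>         while (read(sock, &r, sizeof(r)) == sizeof(r)) {
>                 if (r.len > sizeof(buf))
>                         break;  /* sketch: no buffer splitting */
>                 if (r.is_write) {
>                         read(sock, buf, r.len);  /* data in  */
>                         pwrite(backing_fd, buf, r.len, r.offset);
>                 } else {
>                         pread(backing_fd, buf, r.len, r.offset);
>                         write(sock, buf, r.len); /* data out */
>                 }
>         }
> }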
Agree that we need to have some numbers. But currently it is
impossible to measure them correctly without very considerable
effort. For instance, a comparison of nbd with iscsi measures not
only the user space/kernel space difference, but also many additional
parts, like the different implementation architectures. For a correct
comparison we need a target driver (for scst or stgt) whose commands
could be processed in both user space and kernel space. Additionally,
because we are discussing not only user vs kernel implementations,
but also SIRQ vs thread implementations, the target needs to be a
hardware one.
Right now, without big effort, we can only compare SIRQ vs thread
implementations over FC, because the QLA target driver and scst
support both modes of SCSI command execution; see the
DEBUG_WORK_IN_THREAD symbol. We did some comparisons a while ago and,
if I recall correctly, on small blocks (especially 16K and smaller)
the performance drop was quite visible, because ~40000+ cs/sec are
not very good for the system's health :). You can easily repeat those
experiments using scst, the qlogic driver and the disk_perf or
tape_perf dev handler.
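
If you want to watch the context switch rate during such a run,
vmstat shows it in the "cs" column, or you can sample the system-wide
counter yourself. A minimal sketch (my quick illustration, not part
of scst): it reads the cumulative "ctxt" line from /proc/stat once a
second and prints the delta:

#include <stdio.h>
#include <unistd.h>

/* Return the cumulative context switch count from /proc/stat. */
static unsigned long long read_ctxt(void)
{
        char line[256];
        unsigned long long v = 0;
        FILE *f = fopen("/proc/stat", "r");

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "ctxt %llu", &v) == 1)
                        break;
        fclose(f);
        return v;
}

int main(void)
{
        unsigned long long prev = read_ctxt(), cur;

        for (;;) {
                sleep(1);
                cur = read_ctxt();
                printf("%llu cs/sec\n", cur - prev);
                prev = cur;
        }
}
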
But since FC has quite big latencies, this comparison will not fully
suit our needs. We need some low latency link, probably one of the
hardware iSCSI cards, like the Qlogic 4100. But this is not in the
near future.
Vlad