Steve Wise wrote:
Hi Jeff,
Mike Christie will not merge this code until he has an explicit
acknowledgement from netdev.
As you mentioned, the port stealing approach we've taken has its issues.
We consequently analyzed your suggestion to use a different IP/MAC
address for iSCSI and it raises other tough issues (separate ARP and
DHCP management, unavailability of common networking tools).
On these grounds, we believe our current approach is the most tolerable.
If the stack provided a TCP port allocation service, we would gladly
use it to resolve the current concerns.
The cxgb3i driver is up and running here; its merge is pending this
decision.
Cheers,
Divy
Hey Dave/Jeff,
I think we need some guidance here on how to proceed. Is the approach
currently being reviewed ACKable? Or is it DOA? If it's DOA, then what
approach do you recommend? I believe Jeff favors a separate IP address.
But Dave, what do you think? Let's get some agreement on a high-level
design here.
Possible solutions seen to date include:
1) reserving a socket to allocate the port. This has been NAK'd in the
past and I assume is still a no go.
2) creating a 4-tuple allocation service so the host stack, the RDMA
stack, and the iSCSI stack can share the same TCP 4-tuple space. This
also has been NAK'd in the past and I assume is still a no go.
3) the iSCSI device allocates its own local ephemeral ports (port
stealing) and uses the host's IP address for the iSCSI offload device.
This is the current proposal and you can review the thread for the pros
and cons. IMO it is the least objectionable (and I think we really
should be doing #2).
4) the iSCSI device will manage its own IP address, thus ensuring 4-tuple
uniqueness.
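Option 2 is the easiest to picture concretely. What follows is a minimal sketch, not any existing kernel API: `tuple_reserve()`, `tuple_alloc_lport()` and the fixed-size table are hypothetical names invented for illustration. The idea is a single registry that the host stack, the RDMA stack and the iSCSI stack would all consult, so a local port is refused only when the full (local address, local port, remote address, remote port) combination is already held. It assumes IPv4 and ignores locking and connection teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared 4-tuple registry (option #2): the host TCP stack,
 * the RDMA stack and the iSCSI offload driver would all register every
 * connection here, so no two users can ever hold the same 4-tuple. */
#define MAX_CONNS 64

struct four_tuple {
    uint32_t laddr, raddr;  /* IPv4 addresses, host byte order */
    uint16_t lport, rport;  /* TCP ports */
};

static struct four_tuple conn_table[MAX_CONNS];
static int nconns;

static bool tuple_equal(const struct four_tuple *a, const struct four_tuple *b)
{
    return a->laddr == b->laddr && a->raddr == b->raddr &&
           a->lport == b->lport && a->rport == b->rport;
}

/* Reserve an exact 4-tuple; fails if any user already holds it
 * or the table is full. */
bool tuple_reserve(const struct four_tuple *t)
{
    if (nconns == MAX_CONNS)
        return false;
    for (int i = 0; i < nconns; i++)
        if (tuple_equal(&conn_table[i], t))
            return false;
    conn_table[nconns++] = *t;
    return true;
}

/* Choose a local port in [lo, hi] that makes the 4-tuple unique and
 * reserve it on the caller's behalf. Returns 0 if none is free. */
uint16_t tuple_alloc_lport(uint32_t laddr, uint32_t raddr, uint16_t rport,
                           uint16_t lo, uint16_t hi)
{
    assert(lo != 0 && lo <= hi);
    /* uint32_t loop counter so hi == 65535 cannot wrap around */
    for (uint32_t p = lo; p <= hi; p++) {
        struct four_tuple t = { laddr, raddr, (uint16_t)p, rport };
        if (tuple_reserve(&t))
            return (uint16_t)p;
    }
    return 0;
}
```

With the range limited to 40000-40001, a host-stack connection and an offloaded iSCSI connection to the same target on port 3260 would get 40000 and 40001 respectively, a third request would get 0, and a connection to a different remote port could still reuse 40000.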
Conceptually, it is a nasty business for the OS kernel to be forced to
co-manage an IP address in conjunction with a remote, independent entity.
Hardware designers make the mistake of assuming that firmware management
of a TCP port ("port stealing") successfully provides the illusion to
the OS that that port is simply inactive, and the OS happily continues
internetworking its merry way through life.
This is certainly not true, because of current netfilter and userland
application behavior, which often depends on being able to allocate
(bind) random TCP ports. Successfully allocating a TCP port within
the OS that then behaves differently from all other TCP ports (because it
is the magic iSCSI port) creates a cascading functional disconnect. On
that magic iSCSI port, strange errors will be returned instead of proper
behavior, which in turn cascade through new (and inevitably
under-utilized) error-handling paths in the app.
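The failure mode described above can be modelled in a few lines. This is a toy simulation under stated assumptions, not kernel or firmware code: `fw_steal()`, `kernel_alloc_ephemeral()` and `port_is_hijacked()` are hypothetical names, the ephemeral range is shrunk to eight ports, and the kernel allocator scans linearly. The point is that the host allocator consults only its own bookkeeping, so a port the firmware has quietly claimed looks free and can be handed to an ordinary application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of "port stealing": the offload firmware keeps its own list of
 * local TCP ports and never tells the host kernel about them. */
#define EPH_LO 32768
#define EPH_HI 32775          /* tiny range so the collision shows up quickly */
#define NPORTS (EPH_HI - EPH_LO + 1)

static bool kernel_used[NPORTS];   /* what the host stack believes is in use */
static bool fw_stolen[NPORTS];     /* what the firmware silently owns */

/* Firmware claims a port behind the kernel's back. */
void fw_steal(uint16_t port)
{
    assert(port >= EPH_LO && port <= EPH_HI);
    fw_stolen[port - EPH_LO] = true;
}

/* Kernel-side ephemeral allocation (as for a bind() to port 0): it consults
 * only its own table, so a stolen port looks free. Returns 0 when full. */
uint16_t kernel_alloc_ephemeral(void)
{
    for (int i = 0; i < NPORTS; i++) {
        if (!kernel_used[i]) {
            kernel_used[i] = true;
            return (uint16_t)(EPH_LO + i);
        }
    }
    return 0;
}

/* True if traffic for this port would be diverted to the offload device
 * instead of the application the kernel gave it to. */
bool port_is_hijacked(uint16_t port)
{
    return port >= EPH_LO && port <= EPH_HI && fw_stolen[port - EPH_LO];
}
```

If the firmware steals 32769, the first application gets 32768 and behaves normally, while the second gets 32769 and silently loses its traffic. Because real stacks typically start the ephemeral search at a randomized offset (RFC 6056 describes such schemes), a collision like this surfaces only occasionally, which is what makes it hard to diagnose.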
So, of course, one must work around problems like this, which leads to
one of two broad choices:
1) implement co-management (sharing) of IP address/port space, between
the OS kernel and a remote entity.
2) come up with a solution in hardware that does not require the OS to
co-manage the data it has so far been managing exclusively in software.
It should be obvious that we prefer path #2.
Trudging down path #1 means:
* one must give the user the ability to manage shared IP addresses IN A
NON-HARDWARE-SPECIFIC manner. Currently most vendors of "TCP port
stealing" solutions seem to expect each user to learn a vendor-specific
method of identifying and managing the "magic port".
Excuse my language, but, what a fucking security and management
nightmare in a cross-vendor environment. It is already a pain, with
some [unnamed system/chipset vendors] management stealing TCP ports --
and admins only discover this fact when applications behave strangely on
new hardware.
But... it's tough to notice, because stumbling upon the magic TCP port
won't happen often unless the server is heavily loaded. Thus you have a
security/application problem once in a blue moon, due to this magic TCP
port mentioned in some obscure online documentation nobody has read.
* however, giving the user the ability to co-manage IP addresses means
hacking up the kernel TCP code and userland tools for this new concept,
something that I think DaveM would rightly be a bit reluctant to do?
You are essentially adding a bunch of special case code whenever TCP
ports are used:
    if (port in list of "magic" TCP ports with special,
        hardware-specific behavior)
            ...
    else
            do what we've been doing for decades
ISTR Roland(?) pointing out code that already does a bit of this in the
IB space... but the point is
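For concreteness, the special-case branch in the pseudocode above might look like this in C. Everything here is hypothetical: no `magic_ports` table or `tcp_port_path()` helper exists in Linux, and the port numbers are arbitrary placeholders for whatever a vendor tool would register. The sketch only shows that each consumer of TCP ports would need such a lookup before doing anything else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list of ports claimed by offload hardware; the values are
 * placeholders, not real defaults of any product. */
static const uint16_t magic_ports[] = { 40000, 40001 };

enum port_path {
    PORT_PATH_NORMAL,   /* do what we've been doing for decades */
    PORT_PATH_MAGIC     /* special, hardware-specific behavior */
};

static bool is_magic_port(uint16_t port)
{
    for (size_t i = 0; i < sizeof(magic_ports) / sizeof(magic_ports[0]); i++)
        if (magic_ports[i] == port)
            return true;
    return false;
}

/* The branch every TCP port consumer (bind, netfilter, diagnostics...)
 * would have to grow. */
enum port_path tcp_port_path(uint16_t port)
{
    assert(port != 0);  /* 0 means "pick one for me", never a real port */
    return is_magic_port(port) ? PORT_PATH_MAGIC : PORT_PATH_NORMAL;
}
```

The lookup itself is trivial; the cost lies in the number of places that would have to call it and in keeping the list consistent with each vendor's firmware.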
Finally, this shared IP address/port co-management thing has several
problems listed on the TOE page, http://www.linuxfoundation.org/en/Net:TOE,
such as:
* security updates for TCP problems mean that a single IP address can be
PARTIALLY SECURE, because the kernel TCP stack and the hardware's
firmware are inevitably updated separately (even if distributed
and compiled together). Yay, we are introducing a wonderful new
security problem here.
* from a security, network scanner and packet classifier point of view,
a single IP address no longer behaves like Linux. It behaves like
Linux... sometimes, depending on whether it is a magic TCP port or not.
Talk about security audit hell.
This should be plenty, so I'm stopping now. But looking down the TOE
wiki page I could easily come up with more reasons why "IP address
remote co-management" is more complicated and costly than you think.
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html