Re: wip-addr

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
...
> I looked over the wip-addr branch a bit.  I have two basic 
> questions/concerns:
> 
> 1) In commit
> 	https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a 
> we switch the type field from sa_family to transport_type.  This seems 
> like the way to go, but we need to deal with the fact that lots of 
> clusters out there are IPv6 and have AF_INET6 filled in here.  Probably 
> both types should be interpreted to mean "existing/legacy IP messenger" or 
> whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
> 
> I think encode needs to make sure it fills in that value for the type when 
> encoding the legacy entity_addr_t encoding, but could use a/the single 
> valid value for the new encoding.  And any get_transport_type() accessor 
> should also return the single valid value.

With cohortfs I had the luxury of ignoring existing installations.

I think I concluded at the time that probably most real systems
had '0' stored here.
Yes, if there are things that have "junk" here, they'll need to be handled
appropriately - I think it can be mostly fixed by just having decode map
transport_type "AF_INET6" (which is probably 10) to 0.  AF_INET is 2,
so maybe that too.  People switching *back* to older code after using
new code won't be so happy - would probably need a patch for older code
if that's a concern.

If there's code out there that thinks get_transport_type() returns an address
family, that will need fixing.  something like,
	get_address_family(){ return in_addr->ss_family; }
but I didn't find any code that actually needed this.

> 
> 2) In the later commit
> 	https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672 
> you introduce a grammar for printing/parsing the addrs.  This also makes 
> sense since e.g. xio uses an IP to identify an endpoint.  I think we 
> should identify these based on the *protocol* and not the implementation, 
> though... whether we use SimpleMessenger or AsyncMessenger is not a 
> property of the address.  Maybe "tcp://" makes more sense here?  Or 
> perhaps no prefix at all (a bare IP address), so that this looks the same 
> as it did before in the case where the default protocol(s) are in use.
> 
> I assume the xio protocol (whether it is rdma or tcp) is closely tied to 
> libxio itself.. is that right?  If so, using xio in that prefix makes 
> sense.  I'd include xio somewhere in the rdma prefix though (xrdma:// and 
> xtcp://)?

Code base I had didn't have async messenger as a transport type; if that
needs to be addressable independently of simple messenger it will need a
prefix.  I had a prefixless address decode as simple messenger - I
can make this more obvious than "unsigned type = 0;".

I had these prefixes,
	sm
	rdma
	xtcp
but I can easily pick others.  I figured rdma didn't need an x since
there didn't seem to be much chance anybody would want a non-xio version
of rdma.

> 
> What do you think?
> 
> Logistically, I think the steps for getting this ready for merge are:
> 
> 1) Separate out the preliminary patches that pass a feature to the addr 
> encoding.. without any of the other cohortfs patches that are currently on 
> this branch.  Once this builds we can merge it separately from the rest...
> 
> 2) The entity_addrvec_t type.
> 
> 3) The type -> transport_type switch.

There's a lot of cohortfs glop that still needs to go away in this branch.

I hadn't thought of structuring the patch that way.  I can do it,
but the encoding feature will be tedious to split out.
There's a bunch of cascading stuff after entity_addrvec_t which
could go into a series of further patches, monmap, and so forth,
not all of which exist yet.

> 
> 4) We should make the new entity_addr_t encoding encode sockaddr piece 
> more compactly instead of eating up a full 80-byte sockaddr_storage even 
> for the ~8-byte IPv4 sockaddr_in.  Maybe just need to encode an 
> explicit length for the sockaddr_* piece?

Yes, this could be made more memory efficient.  It's mainly a matter
of making sure the constructor gets the proper amount of memory.
There's a C hack to do this, but I remember seeing a c++ technique
that was a bit less hackish.

A sockaddr_in is actually 16 bytes of which the last 8 are dummy
padding (sin_zero).  (And there there are some systems that
actually check to be sure they're 0.)

BSD4.4 derived systems have a length coded into sockaddr too - but on
linux it's usually possible to get by without storing the length, and
I don't think ceph will need to do so.

...

					-Marcus Watts
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux