James Antill wrote:
>
Furthermore, I absolutely don't want to return the same mirror at the
top of the list _for everyone_ in a given country.
Hash MM's "primary" IP address to select one of the various available
mirrors, assuming they're returned in a consistent order?
If you are going to return a list of N mirrors, make N copies of that
list, rotating one position for each. Knock the last octet off the
source IP and hash the remaining part with some consistent algorithm
that will give you N values and use that to choose the copy of the list
you send.
Which is much harder than it sounds given that MM can't actually "make
N copies" of each list of IPs it might send out. But...
If you can get the list in a fixed order, you just have to replace the
code that randomizes it with something that isn't 'worst-possible-case'
for a site with a caching proxy. You could get some improvement simply
by setting cache control headers on the list for some reasonable time -
but then it is much harder to correct a mistake.
Everything is as distributed and robust as before, but you
don't defeat attempts to save your bandwidth with caching proxies.
This is _only_ true if you are getting asked for the list from every
single IP address, or if the subset of IP addresses asking for it
happens to be as random/distributed as what MM does now.
That's up to the hashing algorithm. I'm not an expert, but someone
should be able to pick one that can take the first 3 octets of an IP
address as input and give an essentially random distribution. For brute
force you could convert the address to ASCII, MD5 it, then take the
result modulo the number of list items as the starting point. There's
probably something much more efficient, but that should give you
randomness. I'd
drop the last octet so clustered proxies in the same class C subnet or
behind NAT gateways with multiple public addresses would get the same list.
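
A minimal sketch of the brute-force approach described above, assuming
IPv4 dotted-quad addresses and a mirror list that always arrives in the
same order. The function name rotate_for_client is made up for
illustration and is not part of MirrorManager:

```python
import hashlib

def rotate_for_client(mirrors, client_ip):
    """Rotate a fixed-order mirror list by an offset derived from the client's /24."""
    if not mirrors:
        return []
    # Drop the last octet so every address in the same /24 gets the same key.
    prefix = ".".join(client_ip.split(".")[:3])
    # MD5 is used only to spread prefixes evenly, not for security.
    digest = hashlib.md5(prefix.encode("ascii")).digest()
    # Reduce the digest to a starting position in the list.
    start = int.from_bytes(digest, "big") % len(mirrors)
    # Every client still receives the whole list, just starting elsewhere.
    return mirrors[start:] + mirrors[:start]
```

Two proxies at 192.0.2.10 and 192.0.2.200 would get identical lists, so
a shared cache sees requests for the same first mirror, while clients in
other /24s start at other positions.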
You might argue that it'll probably be "random/distributed enough", but I
find it much easier to believe that the above will solve your problem
and you didn't get much further than that in your analysis.
It isn't 'my' problem. It's everyone's problem that the mirrors have
to send many times the number of copies that they would if you stopped
going out of your way to defeat existing caching infrastructure. And I
intentionally left the choice of hashing algorithm up to someone who is
more familiar with their nature. Personally, I don't think it can get
any worse than it is so I'm probably not qualified for the analysis
you'd like. As long as you keep giving the whole list, the clients will
find something that works even if it isn't optimal. Or maybe yum could
look for proxy headers on the response and (optionally) randomize by
itself if there are none.
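
The client-side idea could look roughly like the sketch below. It
assumes the proxy adds a Via header to the response, which many proxies
do but not all; order_mirrors is a hypothetical helper, not an existing
yum function:

```python
import random

def order_mirrors(mirrors, response_headers, rng=random):
    """Keep the server's order behind a proxy; otherwise shuffle locally."""
    # Header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in response_headers}
    if "via" in names:
        # Assumption: a Via header means a proxy handled the response,
        # so keeping the order lets its cache be reused.
        return list(mirrors)
    # No sign of a proxy: spread load by randomizing on the client.
    shuffled = list(mirrors)
    rng.shuffle(shuffled)
    return shuffled
```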
--
Les Mikesell
lesmikesell@xxxxxxxxx
--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list