On Mon, Jun 25, 2007 at 11:57:21PM +0200, Henrik Nordstrom wrote:
> Mon 2007-06-25 at 15:02 -0500, Dave Dykstra wrote:
...
> > I considered that, but wouldn't multicasted ICP queries tend to get many
> > hundreds of replies (on average, half the total number of squids)?
>
> Right.. so not so good when there are very, very many Squids..
>
> You could modify the ICP code to only respond to HITs on multicast
> queries. This would cut down the number of responses considerably..

That's a good start.  There would need to be a timeout in order to
determine when to go to the origin server (or better, to a master local
squid).  (I've put a rough sketch of the multicast ICP configuration in
a P.S. at the end.)

> Another option is to build a hierarchy, grouping the Squids in smaller
> clusters, with only a selected few managing the wider connections.
>
> It's hard to get this fully dynamic, however.  Some configuration will
> be needed to build the hierarchy.
>
> You'll probably have to extend Squid a bit to get what you want running
> smoothly, with multicast ICP being one possible component to discover
> the nearby nodes and the exchanges between those, but I am not familiar
> with your network topology or how the cluster nodes are connected
> together, so it's just a guess.

I think it is very important, for this to be widely accepted, to keep the
static configuration of all the nodes the same and to have any kind of
hierarchy dynamically discovered.  Maybe it would work to keep track of
the fastest respondents to occasional multicast queries, and of the
transfer rates of the data fetched from them.  Those that are the fastest
would get queried first with unicast ICP (perhaps in parallel), and if
none of them have a hit then a multicast query would be done.  Also,
nodes that are heavily loaded need not reply to multicast queries.

> This kind of setup could also benefit a lot from intra-array CARP.  Once
> the cluster members are known, CARP can be used to route the requests in
> a quite efficient manner if the network is reasonably flat.
>
> If the network is more WAN-like, with significantly different levels of
> connectivity between the nodes, then a more grouped layout may be
> needed, building a hierarchy on top of the network topology.

I expect the network to be very much a LAN, but still to have
significantly different levels of throughput.  For example, a cluster I
know about is planned to have full non-blocking gigabit connectivity
between the 40 nodes on each rack, but only a single gigabit link between
each of the 50 rack switches and a central switch (and each node will be
dual quad-core, for a total of 16,000 cores).  I think all of the nodes
will be on a single IP subnet, with the switches automatically sorting
out the packet routing (although I'm not sure about that).  So I don't
think you could call that reasonably flat, nor do I think it would help
to have a single master for each object that all the nodes would get the
object from (as I understand CARP would do).  Some objects could be
pretty large, say 50MB, and sending such an object from one node to all
the 1999 others would be much too slow, especially if there are several
such objects hosted on nodes in the same rack.  Bittorrent gets around
the large-object problem by splitting objects up into fixed-sized chunks
and loading them out of order, but that's not an option with http.
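Just to put rough numbers on "much too slow" above (back of the envelope,
assuming a node's own rack is reachable through the non-blocking rack
switch and everything else has to cross that one gigabit uplink):

    49 other racks x 40 nodes       = 1960 off-rack copies
    1960 copies x 50 MB             = 98 GB  ~= 784 gigabits
    784 gigabits / 1 gigabit/sec    ~= 13 minutes, and that's the best case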
> Is there some kind of cluster node management, keeping track of what
> nodes exist and mapping this to the network topology?  Or does
> everything need to be discovered on the fly by each node?

I want this to work on a wide diversity of clusters administered by
different people at different universities & labs, so in general it has
to be discovered on the fly.  It so happens that the cluster I was
describing above is special purpose, where every node starts the same
program at the same time, and we're planning on statically configuring
it into a fixed hierarchy of cache_peer parents (probably with each
parent serving about four children).  The networking topology of other
clusters will vary, but I expect that most of them will have similar
limitations in order to keep down the cost of the networking hardware.

> Regards
> Henrik

- Dave
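P.S. To make the ICP side a bit more concrete, here is roughly what I
understand a standard multicast ICP setup looks like in squid.conf (the
group address and hostnames are just placeholders, and I haven't tried
this; the respond-only-on-HITs behavior would still need the code change
you suggested):

    # every node joins the group and answers ICP queries arriving on it
    mcast_groups 239.128.16.128

    # every node sends its ICP queries to the group instead of to each
    # peer individually, and accepts replies from known group members
    cache_peer 239.128.16.128 multicast 3128 3130 ttl=16
    cache_peer node001.example.org sibling 3128 3130 multicast-responder
    cache_peer node002.example.org sibling 3128 3130 multicast-responder

    # fixed limit (msec) on how long to wait for ICP replies before
    # concluding that no peer holds the object
    icp_query_timeout 200

Having to list every potential responder as a multicast-responder sibling
is exactly the kind of per-cluster static configuration I'd like to
avoid, though.

For the statically configured special-purpose cluster, I'm picturing each
child node pointing at its assigned parent with something like this
(hostname made up):

    # forward misses to the designated parent for this group of children
    cache_peer parent007.example.org parent 3128 3130 no-query default
    # never go directly to the origin server; always go through the parent
    never_direct allow all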