Trying again, having got no response.  Any reaction to my questions?

- Dave

On Tue, Jun 12, 2007 at 11:42:42AM -0500, Dave Dykstra wrote:
> On Tue, Jun 12, 2007 at 12:19:26AM +0200, Henrik Nordstrom wrote:
> > On Mon 2007-06-11 at 15:17 -0500, Dave Dykstra wrote:
> >
> > > of jobs.  It quickly becomes impractical to distribute all the data from
> > > just a few nodes running squid, so I am thinking about running squid on
> > > every node, especially as the number of CPU cores per node increases.
> > > The problem then is how to determine which peer to get data from.
> >
> > Multicast ICP sounds like it could be a reasonable option there.
> >
> > Regards
> > Henrik
>
> I considered that, but wouldn't multicasted ICP queries tend to get many
> hundreds of replies (on average, half the total number of squids)?  It
> would only use the first response it got back, but it doesn't seem very
> efficient of network or compute resources to throw away all the others.
> Do you know of other people who have used multicast ICP for this type of
> application?
>
> The multicast TTL could help a little but probably not much.  I expect
> the servers are usually organized in smaller groups, with better network
> connectivity within each group, but it isn't practical to ask the system
> administrators to tell us which servers are in which group so everything
> has to be automatic.  They're very likely all on the same large subnet
> with the switches sorting out the routing, so it isn't clear that
> anything at squid's level would be able to tell how far away servers are
> other than by small differences in response time, or more likely
> throughput of large transfers.  I also don't think we can really expect
> to know the names of all the peers in order to list them in
> "multicast-responder".
>
> - Dave