On Sat, Aug 11, 2007, Michel Santos wrote:

> > * don't write everything cachable to disk! only write stuff that has
> >   a good chance of being read again;
>
> there is a "good chance" of being hit by a car when sleeping in the
> middle of a highway, just as there is a chance of not being hit at all
> :) :)
> well, that was my knowledge about chances, but there are not so many
> options here: either you are a hell of a foreseer, or you create an
> algorithm, kind of inverting the usage of the actual or other cache
> policies, applying them before caching the objects instead of
> controlling the replacement and aging

No, you run two separate LRUs, glued to each other. One LRU is for new
objects that are coming in; the other LRU is for objects which have been
accessed more than once.

The interesting trace to do on a production cache (remember, I don't have
access to production ISP caches and haven't for quite a while) is to
calculate the chance of seeing a subsequent request HIT after a certain
period of time. You want to know what % of the objects being requested
are never going to be seen again, where "never" can be some time length
(say, a day; you can vary it in your experiment.)

The idea here is that a large % of your requests are once-off and won't
be seen again; what you want to do is keep those in memory in case they
are seen again, and not write them to disk. You only want to write stuff
to disk that has a higher chance of being requested later. It may also
make your memory cache more efficient, as you're not so worried about
pushing hot objects out of the way to make room for transient bursts of
new, never-seen-again objects. As an example, ZFS does this for its
memory cache management, to prevent a "find /" type workload killing the
hot page cache set.

So you're not predicting the future. :) This is all mostly conjecture on
my part, but the papers from the early 2000s list these techniques as
giving noticeable returns.
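To make the two-LRU idea concrete, here's a rough sketch in Python (my
own made-up names and sizes, nothing to do with the actual Squid code):
only objects that come back for a second request get promoted out of the
"probation" LRU and become candidates for a disk write; the once-off
objects just age out of memory.

    # Rough sketch only: a segmented LRU with a probation segment for
    # objects seen once and a protected segment for objects seen again.
    # Class name and segment sizes are made up for illustration.
    from collections import OrderedDict

    class SegmentedLRU:
        def __init__(self, probation_size, protected_size):
            self.probation = OrderedDict()   # seen once, memory only
            self.protected = OrderedDict()   # seen more than once
            self.probation_size = probation_size
            self.protected_size = protected_size

        def access(self, key, value=None):
            """Record a request; return True if the object is worth
            writing to disk (it has now been seen more than once)."""
            if key in self.protected:
                self.protected.move_to_end(key)      # refresh recency
                return True
            if key in self.probation:
                # Second hit: promote into the protected segment.
                self.protected[key] = self.probation.pop(key)
                if len(self.protected) > self.protected_size:
                    # Demote the coldest protected object back to
                    # probation instead of dropping it outright.
                    old_key, old_val = self.protected.popitem(last=False)
                    self.probation[old_key] = old_val
                    if len(self.probation) > self.probation_size:
                        self.probation.popitem(last=False)
                return True
            # First time we've seen this object: memory only for now.
            self.probation[key] = value
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)   # drop a once-off
            return False

Run a trace through access() and count how often it returns False for an
object that never shows up again: that's roughly the fraction of disk
writes this scheme would save over writing everything cachable to disk.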
> interesting project because caching is not so hot anymore, bandwidth is
> cheap in comparison to 10 years ago and the big thing today is P2P, so
> it is probably hard to find a sponsor with good money. The most wanted
> features are proxying and acls, not the cache, so I guess even if there
> are geeks like us who simply like the challenge of getting a bit more
> out of it, most people do not know what this is about and do not feel
> nor see the difference between ufs and coss or whatever. To be
> realistic, I understand that nobody cares about diskd, as nobody really
> cares about coss, because it would only be for you or for me and some
> others; and so Henrik works on aufs because he likes it, but in the end
> it is also only for him and some others. And this sum of "some" does
> not have money to put into coss/aufs/diskd. And probably it is not
> worth it: when the principal users have an 8Mb/s ADSL for 40 bucks, why
> should they spend money on squid's fs development?

A few reasons:

* I want to do P2P caching; who wants to pony up the money for open
  source P2P caching, and why haven't any of the universities done it
  yet?

* Bandwidth is still not free. If Squid can save you 30% of your HTTP
  traffic and your HTTP traffic is (say) 50% of 100mbit, that's 30% of
  50mbit, so around 15mbit. That 15mbit might cost you $500 a month in
  America, sure, but over a year? Commodity hardware -can- and -will-
  effectively cache 100mbit for under a couple of thousand dollars with
  the right software, and if done right the administration will be low
  to nonexistent. You'd pay for the cache inside 6 months without having
  to try; if the lifespan is longer than 12 months then it's free money.

* Now take the above to where it's $300 a megabit (Australia), or even
  more in developing nations ..

* .. or, how about offices which want to provide access control and
  filtering of their 10-100mbit internet link ..

* .. etc.

There are plenty of examples where web caching is still important - and
I'm only touching on forward caching; let's not even talk about how
important reverse caching is in content delivery these days! - and
there's still a lot of room for Squid to grow. Doubling the request rate
and tripling the disk throughput for small objects is entirely within
reason, even on just one CPU. Don't get me started on the possibilities
of media/P2P caching on something large like a Sun Thumper, with 20-plus
SATA disks in a box.

Would you like Squid to handle 100mbit+ of HTTP traffic on a desktop PC
with a couple of SATA disks? Would you like Squid to handle 500-800mbit
of HTTP traffic on a ~$5k server with some SAS disks? This stuff is
possible on today's hardware. We know how to do it; it's just a question
of writing the right software.

Adrian