Alex Rousskov wrote:
> On 02/18/2013 04:01 PM, Linda W wrote:
>> Has anyone looked at their average cached object size lately?
>>
>> At one point, I assume due to measurements, squid set a default of
>> 13KB/item.
>>
>> About 6 or so years ago, I checked mine out:
>>   (cd /var/cache/squid;
>>    cachedirs=( $(printf "%02X " {0..63}) )
>>    echo $[$(du -sk|cut -f1)/$(find ${cachedirs[@]} -type f |wc -l)]
>>   )
>> --- got 47K, or over 3x the default.
>>
>> Did it again recently: 310K/item average.
>>
>> Is the average size of web items going up, or are these figures
>> peculiar to my users' browsing habits (or auto-update programs from
>> Windows going through the cache, etc.)?
>
> According to stats collected by Google in May 2010, the mean size of a
> GET response was about 7KB:
> https://developers.google.com/speed/articles/web-metrics
>
> Note that the median GET response size was less than 3KB. I doubt
> things have changed that much since then.
---
I'm pretty sure that Google's stats would NOT be representative of the
net as a whole. Google doesn't serve content -- the service indexes
content -- and indices of content are going to be significantly smaller
than the content being indexed, especially when pictures or other
non-text files are included.

> Google stats are biased because they are collected by Googlebot.
> However, if you look at fresh HTTP archive stats, they seem to give a
> picture closer to 2010 Google stats than to yours:
> http://httparchive.org/trends.php#bytesTotal&reqTotal
>
> (I assume you need to divide bytesTotal by reqTotal to get a mean
> response size of about 14KB.)
---
That's how I'd read that data. But I'll betcha they don't have any
download sites on their top list. Add in 'downloads.suse.org' and see
how the numbers tally. Have 2-3 users pull downloads from it in a day
and see if the content is being cached... it IS cacheable. Some of it,
like some of the ISO images, gets into the gigabytes, though even I cut
off caching above 1G. My max in-memory object size is 512MB, and my max
on-disk object size is 1G. If you use the default squid settings, the
maximum object sizes are 512KB in memory and 4M on disk, which would
not cache most of the stuff on download sites -- so when a new release
of some software distribution or package comes out, those transfers
won't be included in the averages.

I would say it is hard to get an accurate reading of actual transfer
sizes if you have the cut-offs set at the defaults. If you go to an
image site like deviantart or animepaper.net, or most wallpaper sites,
you'll find that the average image sizes easily exceed the in-memory
limit, and hi-res images can easily exceed the default max disk-cache
object size. Losing all the stats for the larger files would bias any
"average GET" or cache-file-size figure. Seriously -- stats that cut
off anything > 4M are going to be strongly biased.
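
For reference, the per-object limits I'm talking about are set with two
squid.conf directives. A minimal sketch with my values (the values are
mine, not recommendations; the defaults in the comments are the stock
ones mentioned above):

  # squid.conf -- per-object cache limits (my values, not the defaults)
  maximum_object_size_in_memory 512 MB   # stock default: 512 KB
  maximum_object_size 1 GB               # stock default: 4 MB

Anything larger than maximum_object_size is never written to the disk
cache at all, which is exactly why the large downloads vanish from the
averages.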
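
And for anyone who wants to repeat the measurement, here's a cleaned-up
sketch of the script quoted at the top. It assumes the stock ufs/aufs
layout with 64 first-level dirs (00..3F) under /var/cache/squid, and it
restricts du to the cache dirs so swap.state doesn't inflate the total:

  #!/bin/bash
  # Average cached object size (KB) = total KB in cache dirs / file count.
  # Adjust the path and dir count to match your cache_dir line.
  cd /var/cache/squid || exit 1
  cachedirs=( $(printf "%02X " {0..63}) )
  total_kb=$(du -sk "${cachedirs[@]}" | awk '{sum += $1} END {print sum}')
  nfiles=$(find "${cachedirs[@]}" -type f | wc -l)
  echo "$(( total_kb / nfiles )) KB/object average"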