On Tuesday 11 July 2006 04:41, Adrian Chadd wrote:
> The disk access method has a rather huge influence on throughput.

Absolutely! But not in the first place.

I almost agree with everything you said in your email, but you forgot that
it does not matter having the heavy-duty funcar in the garage as long as we
do not have the gasoline to feed it. And the other way around, it does not
matter having "the gas" but not the car which can use it. So before we go
down to all these "special things" we need a machine which can handle them.

My favorite is diskd, since it works as long as we do not have any crash,
and it is fast. The kqueue thing for BSD systems is not working well for
me. As for the other things, which belong more to Linux, I am not the right
person to talk to, since I am not using it and have no experience with it.

You compare UFS, and sure, it is not competitive on performance, but even
here there is a question: where is the edge at which you feel it is slower
or the other is faster? IMO it does not matter for small systems.

Then you say COSS is faster. Hmm, theoretically it should be, but in my
case I never could make it faster than my diskd. Might be my fault
somewhere, or might be that I use FreeBSD? I don't know, perhaps it is only
because I understand shared memory tuning well? Sometimes I think COSS
could have an advantage on IDE/SATA disks but not on SCSI?

Thanks,
Hans

> There's been plenty of papers written over the last ten years relating
> to disk access patterns and a number of them are specific to web
> caching.
>
> It's a well-known fact that using the unix filesystem in a
> one-file-per-object method is generally inefficient - there's >1 operation
> per create/write, open/read and unlink. If a web cache is populated with a
> 'normal' webcache distribution then you'll find the majority of cache
> objects (~95% in my live caches at work) are under 64k in size. Many (~50%
> I think, I'd have to go through my notes) are under 32k in size.
>
> So it boils down to a few things:
>
> * arranging the disk writes in a way to cut back on the amount of seeking
>   during disk writes
> * arranging the disk writes in a way to cut back on the amount of seeking
>   during disk reads
> * handling replacement policies in an efficient way - eg, you don't want
>   to have high levels of fragmentation as time goes on as this may impact
>   on your ability to batch disk writes
> * disk throughput is a function of how you lay out the disk writes and
>   how you queue disk reads - and disks are smokingly fast when you're
>   able to do big reads and writes with minimal seeks.
>
> Now, the Squid UFS method of laying out things is inefficient because:
>
> * Read/Write/Unlink operations involve more than one disk IO in some cases,
> * Modern UNIX FSes have this habit of using synchronous journalling
>   of metadata - which also slows things down unless you're careful
>   (BSD Softupdates UFS doesn't do this as a specific counter-example)
> * There's no way to optimise disk reads/write patterns by influencing the
>   on-disk layout - as an example, UNIX FSes tend to 'group' files in a
>   directory close together on the disk (same cylinder group in the case
>   of BSD FFS) but squid doesn't put files from the same site - or even
>   downloaded from the same client at any given time, in the same directory.
> * Sites are split up between disks - which may have an influence on
>   scheduling reads from hits (ie, if you look at a webpage and it has
>   40 objects that are spread across 5 disks, that one client is going
>   to issue disk requests to all /five/ disks rather than the more
>   optimal idea of stuffing those objects sequentially on the one disk
>   and reading them all at once.)
> * it all boils down to too much disk seeking!
>
> Now, just as a random data point. I'm able to pull 3 megabytes a second of
> random-read hits (~200 hits a second) from a single COSS disk.
> The disk isn't running anywhere near capacity even with this inefficient
> read pattern. This is from a SATA disk with no tagged-queueing.
>
> The main problem with COSS (besides the bugs :) is that the write rate
> is a function of both the data you're storing from server replies
> (cachable data) and the "hits" which result in objects being relocated.
> Higher request rate == high write rate (storing read objects on disk),
> high hit rate == higher write rate (storing read and relocated objects
> on disk.)
>
> This one disk system smokes a similar setup AUFS/DISKD on XFS and EXT3.
> No, I don't have exact figures - I'm doing this for fun rather than
> a graduate/honours project with a paper in mind. But even Duane's
> COSS polygraph results from a few years ago show COSS is quite noticeably
> faster than AUFS/DISKD.
>
> The papers I read from 1998-2002 were talking about obtaining random
> reads from disks at a read rate of ~500 objects a second (being <64k in
> size.) That's per disk. In 1998. :)
>
> So, it's getting done in my spare time. And it'll turn a Squid server
> into something comparable to the commercial caches from 2001 :)
> (Ie, ~2400 req/sec with the polygraph workloads with whatever offered
> hit rate closely matched.) I can only imagine what they're able to
> achieve today with such tightly-optimised codebases.
>
>
>
> Adrian
>
> On Tue, Jul 11, 2006, H wrote:
> > Hi
> > I am not so sure if the particular data access method is what makes the
> > difference. Most real cases are bound by disk or other hardware
> > limitations. Even if often discussed, IDE/ATA disks do not come close to
> > SCSI disk throughput in multi user environments. Standard PCs often
> > have exactly this limit of 2-5MB/s that Rick mentions, and you can do
> > what you want, there is nothing more. I believe that squid, when coming
> > to the limit, simply does not cache anymore and goes direct, which means
> > the cache server certainly runs uselessly at the edge and is not caching.
> > With good hardware, not necessarily server MBs, you can get 30MB/s as you
> > say but I am not sure how much of this 30MB/s is cache data, do you get
> > 5% or less from disk?
> > We have some high bandwidth networks where we use squid on the main
> > server as a non-caching server, and then several parents where the
> > cache-to-disk process is done. The main server seems to be bound only by
> > the OS pps limit (no disk access) and we get up to 90MB/s through it.
> > The parent caches are queried by content type or object size. Of course
> > the connection between these servers is GBit full duplex. This way we
> > get up to 20% less bandwidth utilization. Some time ago we got up to 40%
> > but since emule and other P2P became very popular things are not so good
> > anymore.
> > What we use are FreeBSD servers, 6.1-Stable version, with squid14 as
> > transparent proxy on AMD64 dual-opterons on the main servers and AMD64-X2
> > machines on the parent caches, all with SCSI-320 and lots of very good
> > memory. Main server 16GB up and the parents 4GB. The best experience and
> > performance for standard hardware I got with Epox MB and AMD-X2 4400 or
> > 4800. I run more than one squid process on each SMP server.
> >
> > Hans

-- 
HM

This message was scanned by the e-mail system and can be considered safe.
Service provided by Datacenter Matik https://datacenter.matik.com.br