On 24/07/2015 6:08 a.m., Marcus Kool wrote:
> The strace output shows this loop:
>
> Squid reads 16K-1 bytes from FD 13 webserver
> Squid writes 4 times 4K to FD 17 /var/cache/squid3/00/00/00000000
> Squid writes 4 times 4K to FD 12 browser
>
> But this loop does not explain the 100% CPU usage...
>
> Does Squid do a buffer reshuffle when it reads 16K-1 and writes 16K ?

Yes, several (3 for UFS / AUFS, or 5 for diskd):

 TCP buffer -> FD 13 read buffer

 FD 13 read buffer -> 4x mem_node (4KB each)
   ** walk the length of the in-memory part of the object to find where to
      attach the mem_node (once per node?). This has been a big CPU hog in
      the past (Squid-2 did it twice per node insertion). A rough sketch of
      this walk follows the quoted messages below.

 4x mem_node -> SHM memory buffer
   - diskd only; AUFS uses the mem_node directly

 SHM memory buffer -> FD 17 disk write latency
   - happens with both diskd (single threaded) and AUFS (x64 threads)
   - wait latency until the completion event is seen by Squid

 4x mem_node -> write() copy to FD 12 TCP buffer (OS dependent)

If you are doing any kind of ICAP processing you can add +3 copies per
service processing the transaction body.

> I did the download test with Squid 3.4.12 AUFS on an idle system with a
> 500 mbit connection and 1 CPU with 4 cores @ 3.7 GHz.
> The first download used 35% of 1 CPU core with a steady download speed
> of 62 MB/sec.
> The second (cached) download used 50% of 1 CPU core with a steady
> download speed of 87 MB/sec.
> I never looked at Squid CPU usage and do not know what is reasonable,
> but it feels high.
>
> With respect to the 100% CPU issue of Jens, one factor is that Squid
> runs in a virtual machine.
> Squid in a virtual machine cannot be compared with a wget test, since
> Squid allocates a lot of memory that the host must manage.
> This is a possible explanation for the fact that you see the performance
> going down and up.
> Can you do the same test on the host (i.e. not inside a VM)?
>
> Marcus
>
> On 07/23/2015 10:39 AM, Jens Offenbach wrote:
>> I have attached strace to Squid and waited until the download rate had
>> decreased to 500 KB/sec.
>> I used "cache_dir aufs /var/cache/squid3 88894 16 256
>> max-size=10737418240".
>> Here is the download link:
>> http://w1.wikisend.com/node-fs/download/6a004a416f65b4cdf7f8eff4ff961199/squid.strace
>>
>> I hope it can help you.
>>
>> *Sent:* Thursday, 23 July 2015, 13:29
>> *From:* "Marcus Kool" <marcus.kool@xxxxxxxxxxxxxxx>
>> *To:* "Jens Offenbach" <wolle5050@xxxxxx>, "Eliezer Croitoru"
>> <eliezer@xxxxxxxxxxxx>, "Amos Jeffries" <squid3@xxxxxxxxxxxxx>,
>> squid-users@xxxxxxxxxxxxxxxxxxxxx
>> *Subject:* Re: Squid3: 100 % CPU load during object caching
>>
>> I am not sure if it is relevant, maybe it is:
>>
>> I am developing an ICAP daemon, and after the ICAP server sends a
>> "100 continue" Squid sends the object to the ICAP server in small
>> chunks of varying sizes: 4095, 5813, 1448, 4344, 1448, 1448, 2896, etc.
>> Note that the interval between receiving the chunks is 1/1000th of a
>> second.
>> It seems that Squid forwards the object to the ICAP server every time
>> it receives one or a few TCP packets.
>>
>> I have a suspicion that in the scenario of 100% CPU, large #write
>> calls and low throughput, a similar thing is happening:
>> Squid physically stores a small part of the object many times, i.e.
>> every time one or a few TCP packets arrive.
>>
>> Amos, is there a debug setting that can confirm/reject this suspicion?
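
As an illustration of the mem_node walk mentioned above, here is a minimal
standalone sketch. It is not Squid's code; the names MemNode and
appendWalking are invented for the example. It only shows why appending by
walking from the list head makes the per-node cost grow with the size of the
in-memory object:

    // Minimal sketch, not Squid code; type and function names are invented.
    #include <algorithm>
    #include <cstddef>
    #include <cstring>

    static const std::size_t NODE_SIZE = 4096;  // 4KB payload, matching the mem_node size above

    struct MemNode {
        char data[NODE_SIZE];
        std::size_t used = 0;
        MemNode *next = nullptr;
    };

    // Append by walking from the head every time, as described above.
    // With N nodes already in memory, each 16KB read costs ~4 walks over
    // N nodes, so the total work over a large download grows as O(N^2).
    // Keeping a cached tail pointer instead would make each append O(1).
    void appendWalking(MemNode *&head, const char *buf, std::size_t len)
    {
        while (len > 0) {
            MemNode *tail = head;
            while (tail && tail->next)      // the expensive walk
                tail = tail->next;
            if (!tail || tail->used == NODE_SIZE) {
                MemNode *node = new MemNode();
                if (tail)
                    tail->next = node;
                else
                    head = node;
                tail = node;
            }
            const std::size_t chunk = std::min(len, NODE_SIZE - tail->used);
            std::memcpy(tail->data + tail->used, buf, chunk);
            tail->used += chunk;
            buf += chunk;
            len -= chunk;
        }
    }

If a profile of the slow download showed most of the CPU time inside that
inner walk for large cached objects, it would match the "big CPU hog"
behaviour described above.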
After a bit more thought, and Marcus' feedback: store.cc, the mem_node
operations, fd.cc, and comm.cc are probably all worth watching.

"debug_options ALL,9" will get you everything Squid has to offer, of course
(a narrower selection is sketched at the end of this message). But be aware
that the debugging itself adds a horribly large amount of overhead for each
line logged. At the highest levels it may noticeably distort the high-speed
core routines you are trying to measure, by skewing latency towards the
paths with more debugs() statements.

Amos
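
As a narrower alternative to ALL,9, debug_options accepts a list of
"section,level" pairs, so the store and comm code can be raised to level 9
while everything else stays at level 1. The section numbers below are
assumptions from memory of doc/debug-sections.txt in the Squid source
(20 Storage Manager, 19 Store Memory Primitives, 5 Socket Functions,
51 File Descriptor Functions) and should be double-checked against the
version in use:

    # squid.conf sketch; verify the section numbers for your Squid version
    debug_options ALL,1 20,9 19,9 5,9 51,9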