Hey again, it took me some time... The number of clients is sometimes irrelevant compared to other factors. Take, for example, a network with 30k clients/users who only access a basic email service. So it is possible that over some period of time your service will see this kind of load average.
You should keep monitoring the service with a couple of tools to identify what the load looks like around the clock. Also try to dump the basic info page that your attached data came from. Every 5 minutes would be OK as a start.
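As a rough sketch of the 5-minute dump: assuming the info page is the cache manager `mgr:info` report and that `squidclient` is installed and can reach the proxy, a small Python script could append each snapshot to a log file. The function names and the log file name here are made up for illustration.

```python
import subprocess
import time
from datetime import datetime, timezone


def snapshot(command):
    """Run the command once and return its output under a UTC timestamp header."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return f"=== {stamp} ===\n{result.stdout}\n"


def dump_snapshots(command, log_path, interval_seconds=300, count=None):
    """Append a snapshot every interval_seconds; count=None means run until stopped."""
    taken = 0
    while count is None or taken < count:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(snapshot(command))
        taken += 1
        # Sleep only if another snapshot is still to come.
        if count is None or taken < count:
            time.sleep(interval_seconds)
```

Calling `dump_snapshots(["squidclient", "mgr:info"], "squid-info.log")` would then collect one snapshot every 5 minutes until you stop it, and the timestamps make it easy to line the numbers up with what top showed at the same moment.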
And since you asked somewhere in the thread "why" squid is so unique when it comes to CPU load average, I think you deserve a detailed response. I am not *the* expert, but you probably already know that there are a couple of ways to build network service software. I do not have the example I wanted at hand right now, but I once wrote example Ruby code for an endless queue "loop" program which consumes the CPU in an instant. Queue-based event handling software is usually not as well understood as simple "select"-based loops. The design should ensure that the software never consumes CPU cycles when it does not need them, but in most cases you will not see it in a *wait* state. A couple of times I tried to understand how squid works, and only after writing a couple of models in a couple of languages did I sort of understand the basic concept. Eventually I got a really good description from Amos which confirmed my assumptions.
Most network services these days are built on some event-driven engine/code with threading in it. It has been the most widely used idea in recent years (I don't know since when). Most of these event-driven approaches are efficient but lack a couple of key points, and in most cases, since the developers are not novices, they build the software well and cover the special "cases". Squid, however, is an old piece of gold which uses a queue instead of only events. Since most event-driven services use some kind of "select", which puts the software into a kind of *wait* mode, you will probably catch these services in *wait* mode in top from time to time. If these services are constantly working/running, you probably won't catch them in *wait* mode in top.
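To make the difference visible, here is a minimal Python sketch (not squid's code, and not the Ruby example mentioned earlier) that watches the same idle socket in two ways: once by polling in a tight loop, and once by letting `select` block with a timeout. The function names and the 0.3 second duration are arbitrary choices for illustration.

```python
import select
import socket
import time


def busy_poll(sock, duration):
    """Poll with a zero timeout in a tight loop: never sleeps, so it burns CPU."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        select.select([sock], [], [], 0)  # returns immediately, ready or not
    return time.process_time() - cpu_start


def blocking_wait(sock, duration):
    """Let select() sleep in the kernel until data arrives or the timeout expires."""
    cpu_start = time.process_time()
    select.select([sock], [], [], duration)
    return time.process_time() - cpu_start


# Nothing is ever written to `writer`, so `reader` stays idle the whole time.
reader, writer = socket.socketpair()
busy_cpu = busy_poll(reader, 0.3)
idle_cpu = blocking_wait(reader, 0.3)
reader.close()
writer.close()

print(f"busy polling : {busy_cpu:.3f} s of CPU time")
print(f"blocking wait: {idle_cpu:.3f} s of CPU time")
```

Both calls take about the same wall-clock time and do no useful work, but only the first one should show up as CPU usage. That is the kind of gap between "what the service is doing" and "what top shows" that I am trying to describe.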
As for the relationship between high CPU and disk IO specifically, I assume that when a service relies on a queue rather than on event-based IO, it can be confusing to understand exactly why the CPU is being used. This is because most event-based disk/IO programs may use file/IO "splice" for reads or writes. That pushes most of the IO work into kernel-land rather than user-land. The kernel is probably the best at handling some IO operations efficiently (CPU-wise).
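A small Python sketch of the user-land versus kernel-land idea, under a few assumptions: it runs on Linux, and it uses `os.sendfile` (the sendfile(2) call) as a stand-in for splice because it is simpler to show with two regular files. The function names and the chunk size are made up for illustration, and this says nothing about which calls squid itself uses.

```python
import os


def copy_userland(src_path, dst_path, chunk_size=64 * 1024):
    """Classic loop: every chunk is read into the process and written back out."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)


def copy_in_kernel(src_path, dst_path):
    """Ask the kernel to move the bytes itself; the data never enters user space."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:  # source ended earlier than expected
                break
            offset += sent
            remaining -= sent
```

Both functions produce the same file. The difference is where the copying work happens, and that is what changes how the CPU time is split between user and system in top.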
The above is far from complete, but I think it is enough to understand that top will sometimes not reflect what you expect or assume, and then you need some insight into what is going on.
Maybe I can describe event-driven code compared to a queue by comparing an ambulance or emergency service to a supermarket or restaurant queue. Unless there is a special event, the driver and the medic of the ambulance are idle, while in a restaurant things get busier and busier as it fills up. If you "top" them both, you will see a mostly idle (wait) process on the one hand and a continuously growing load on the other.
If the restaurant were designed to be event-driven, it would look somewhat like the emergency service: mostly idle, but very busy once triggered.
Again, it's not 100% accurate, so don't hold me to it, and maybe others here can give a couple of better examples or descriptions than I do.
If you have specific questions about anything related to squid, just ask.

Eliezer

* It is possible that some look-ups cause the issues you described. The first thing to do is to limit the cache_dir sizes and to try to calculate, based on a couple of weeks of analysis, a reasonable amount of cache for this machine (not related to the storage media).

On 24/02/2016 21:44, Heiler Bemerguy wrote:
Hi Eliezer, thanks for your reply.

As you've suggested, I removed all cache_dirs to verify if the rest was stable/fast and raised cache_mem to 10GB. I didn't disable access logs because we really need it.. And it is super fast, I can't even notice it using only ONE core.. (and it isn't running as smp)

%Cpu0 : 0,7 us, 1,0 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu1 : 8,8 us, 5,6 sy, 0,0 ni, 76,1 id, 0,0 wa, 0,0 hi, 9,5 si, 0,0 st
%Cpu2 : 8,7 us, 4,0 sy, 0,0 ni, 83,3 id, 0,0 wa, 0,0 hi, 4,0 si, 0,0 st
%Cpu3 : 5,4 us, 3,4 sy, 0,0 ni, 86,2 id, 0,0 wa, 0,0 hi, 5,0 si, 0,0 st
%Cpu4 : 7,8 us, 5,1 sy, 0,0 ni, 73,5 id, 6,8 wa, 0,0 hi, 6,8 si, 0,0 st
%Cpu5 : 1,0 us, 1,0 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st

  PID USER   PR  NI  VIRT   RES  SHR  S  %CPU  %MEM  TIME+     COMMAND
11604 proxy  20   0  11,6g  11g  5232 S  48,4  72,2  72:31.24  squid

Start Time:   Wed, 24 Feb 2016 15:38:59 GMT
Current Time: Wed, 24 Feb 2016 19:18:30 GMT
Connection information for squid:
    Number of clients accessing cache: 1433
    Number of HTTP requests received: 2532800
    Average HTTP requests per minute since start: 11538.5
    Select loop called: 68763019 times, 0.192 ms avg
    Storage Mem size: 9874500 KB
    Storage Mem capacity: 94.2% used, 5.8% free

I don't think I had a bottleneck on I/O itself, maybe the hash/search of cache indexes was too much for a single thread?

Best Regards,
_______________________________________________ squid-users mailing list squid-users@xxxxxxxxxxxxxxxxxxxxx http://lists.squid-cache.org/listinfo/squid-users