
Re: very poor performance of rock cache ipc


 



On 2023-10-14 12:04, Julian Taylor wrote:
On 14.10.23 17:40, Alex Rousskov wrote:
On 2023-10-13 16:01, Julian Taylor wrote:

When using squid for caching with the rock cache_dir setting, the performance is pretty poor with multiple workers. The reason for this is the very high number of system calls involved in the IPC between the disker and the workers.

Please allow me to rephrase your conclusion to better match (expected) reality and avoid misunderstanding:

By design, a mostly idle SMP Squid should use a lot more system calls per disk cache hit than a busy SMP Squid would:

* Mostly idle Squid: Every disk I/O may require a few IPC messages.
* Busy Squid: Bugs notwithstanding, disk I/Os require no IPC messages.


In your single-request test, you are observing the expected effects described in the first bullet. That does not imply those effects are "good" or "desirable" in your use case, of course. It only means that SMP Squid was not optimized for that use case; the SMP rock design explicitly targeted the opposite use case (i.e., a busy Squid).

The reproducer uses a single request, but the very same thing can be observed on a very busy squid

If a busy Squid sends lots of IPC messages between worker and disker, then either there is a Squid bug we do not know about OR that disker is just not as busy as one might expect it to be.

In Squid v6+, you can observe disker queues using the mgr:store_queues cache manager report. In your environment, do those queues always have lots of requests when Squid is busy? Feel free to share (a pointer to) a representative sample of those reports from your busy Squid.
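If it helps, the report should be retrievable with the usual cache manager tooling; for example, the squidclient binary shipped with Squid should fetch it (assuming it can reach the proxy on its configured http_port):

    squidclient mgr:store_queues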

N.B. Besides worker-disker IPC messages, there are also worker-worker cache synchronization IPC messages. They also have the same "do not send IPC messages if the queue has some pending items already" optimization.


and the workaround improves both the single-request case and the actual heavily loaded production squid in the same way.

FWIW, I do not think that observation contradicts anything I have said.


The hardware involved has a 10G card and no SSDs, but lots of RAM, so it has a very high page cache hit rate. The squid is very busy, so much so that it is overloaded by system CPU usage in the default configuration with the rock cache. Network and disk bandwidth are barely ever utilized beyond 10%, with all 8 CPUs busy on system load.

The above facts suggest that the disk is just not used much OR there is a bug somewhere. Slower (for any reason, including CPU overload) IPC messages should lead to longer queues and the disappearance of "your queue is no longer empty!" IPC messages.


The only way to get squid to utilize the machine is to increase the I/O size via the request buffer change, or not to use the rock cache at all. The UFS cache works OK in comparison, but it requires multiple independent squid instances as it does not support SMP.

Increasing the I/O size to 32KiB as I mentioned does allow the squid workers to utilize a good 60% of the hardware's network and disk capabilities.

Please note that I am not disputing this observation. Unfortunately, it does not help me guess where the actual/core problem or bottleneck is. Hopefully, the mgr:store_queues cache manager report will shed some light.


Roughly speaking, here, "busy" means "there are always some messages in the disk I/O queue [maintained by Squid in shared memory]".

You may wonder how it is possible that an increase in I/O work results in a decrease (and, hopefully, the elimination) of related IPC messages. Roughly speaking, a worker must send an IPC "you have a new I/O request" message only when its worker->disker queue is empty. If the queue is not empty, then there is no reason to send an IPC message to wake up the disker, because the disker will see the new request when dequeuing the previous one. The same goes for the opposite direction: disker->worker...

This is probably true if you have slow disks and are actually I/O bound, but with fast disks or a high page cache hit rate you essentially see this IPC ping-pong and very little actual work being done.

AFAICT, "too slow" IPC messages should result in non-empty queues and, hence, no IPC messages at all. For this logic to work, it does not matter whether the system is I/O bound or not, whether disks are "slow" or not.
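To make that mechanism concrete, here is a minimal sketch of the "notify only on the empty-to-non-empty transition" idea described above. It is not Squid's code (the real implementation uses lock-free queues in shared memory); the class and names below are made up for illustration:

    // Sketch of the wakeup-suppression idea: the sender pushes work onto a
    // shared queue and sends a "you have work" IPC message only when the
    // queue was empty before the push; a busy receiver keeps draining the
    // queue without ever needing another wakeup.
    #include <deque>
    #include <optional>

    struct IoRequest { /* offset, length, buffer, ... */ };

    class WorkQueue {
    public:
        // Returns true when the receiver must be woken up with an IPC message.
        bool push(const IoRequest &req) {
            const bool wasEmpty = items_.empty();
            items_.push_back(req);
            return wasEmpty; // wake the peer only on the empty -> non-empty edge
        }

        std::optional<IoRequest> pop() {
            if (items_.empty())
                return std::nullopt;
            IoRequest req = items_.front();
            items_.pop_front();
            return req;
        }

    private:
        std::deque<IoRequest> items_; // real Squid uses a ring buffer in shared memory
    };

    // Worker side:  if (queue.push(request)) sendWakeupIpc(diskerKid);
    // Disker side:  after one wakeup, keep pop()-ing until std::nullopt;
    //               no further IPC is needed while requests keep arriving.

Under sustained load the queue rarely drains, so the wakeup branch is rarely taken; with a single in-flight request it is taken for every I/O, which matches what the reproducer shows.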


 > Is it necessary to have these read chunks so small

It is not. Disk I/O size should be at least the system I/O page size, but it can be larger. The optimal I/O size is probably very dependent on traffic patterns. IIRC, Squid I/O size is at most one Squid page (SM_PAGE_SIZE or 4KB).
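As a rough illustration of how much that size matters for per-call overhead, assuming a hypothetical 1 MiB disk cache hit served in fixed-size chunks:

    // Back-of-the-envelope syscall counts for serving one (hypothetical) 1 MiB
    // disk cache hit in fixed-size chunks.
    #include <cstdio>

    int main() {
        const long objectSize = 1024 * 1024;  // example object: 1 MiB
        const long squidPage  = 4 * 1024;     // SM_PAGE_SIZE, the current I/O size
        const long biggerIo   = 32 * 1024;    // the 32 KiB size tried in this thread

        std::printf("4 KiB reads:  %ld\n", objectSize / squidPage); // 256
        std::printf("32 KiB reads: %ld\n", objectSize / biggerIo);  // 32
        return 0;
    }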

FWIW, I suspect there are significant inefficiencies in disk I/O related request alignment: The code does not attempt to read from and write to disk page boundaries, probably resulting in multiple low-level disk I/Os per one Squid 4KB I/O in some (many?) cases. With modern non-rotational storage these effects are probably less pronounced, but they probably still exist.

The kernel drivers will mostly handle this for you if multiple requests are available, but it is also almost irrelevant with current hardware: disks are typically so fast that software overhead makes it hard to utilize modern large disk arrays properly.

I doubt that doing twice as many low-level disk I/Os (due to wrong alignment) is irrelevant, but we do not need to agree on that to make progress: clearly, excessive low-level disk I/Os are not the bottleneck in your current environment.
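For reference, a minimal sketch of the page-alignment arithmetic being discussed; 4096 is assumed to be the relevant page size and the helper name is made up:

    // Compute the page-aligned byte range that an arbitrary (offset, length)
    // request touches; 4096 is assumed to be the relevant page size.
    #include <cstdint>

    constexpr std::uint64_t PageSize = 4096;

    struct AlignedRange {
        std::uint64_t offset; // rounded down to a page boundary
        std::uint64_t length; // rounded up to whole pages
    };

    AlignedRange alignToPages(std::uint64_t offset, std::uint64_t length) {
        const std::uint64_t start = offset & ~(PageSize - 1);
        const std::uint64_t end =
            (offset + length + PageSize - 1) & ~(PageSize - 1);
        return AlignedRange{start, end - start};
    }

    // Example: a 4 KiB read at offset 6144 straddles two pages, so
    // alignToPages(6144, 4096) yields {4096, 8192}; laying the slot out at a
    // page boundary instead would keep each 4 KiB I/O within a single page.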


you probably need to look at other approaches like io_uring to get rid of the classical read/write system call overhead dominating your performance.

Yes, but those things are complementary (i.e. not mutually exclusive).
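For completeness, a minimal sketch of the io_uring direction using liburing, just to illustrate how a batch of reads can be submitted with a single syscall; the file path and sizes are illustrative, and this is not a proposal for Squid's actual I/O layer:

    // Queue several reads and submit them with a single io_uring_submit()
    // syscall instead of one pread() per chunk. Requires liburing.
    #include <liburing.h>
    #include <fcntl.h>
    #include <cstdio>
    #include <vector>

    int main() {
        io_uring ring;
        if (io_uring_queue_init(64, &ring, 0) < 0)
            return 1;

        const int fd = open("/var/spool/squid/rock", O_RDONLY); // example path
        if (fd < 0)
            return 1;

        constexpr unsigned chunk = 32 * 1024; // e.g. 8 x 32 KiB reads
        std::vector<std::vector<char>> buffers(8, std::vector<char>(chunk));
        for (unsigned i = 0; i < buffers.size(); ++i) {
            io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, buffers[i].data(), chunk, off_t(i) * chunk);
        }
        io_uring_submit(&ring); // one syscall for all queued reads

        for (size_t done = 0; done < buffers.size(); ++done) {
            io_uring_cqe *cqe = nullptr;
            if (io_uring_wait_cqe(&ring, &cqe) < 0)
                break;
            std::printf("read completed: %d bytes\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        return 0;
    }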


Cheers,

Alex.

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.squid-cache.org/listinfo/squid-users



