
Re: What would be the maximum ufs\aufs cache_dir objects?


On 18/07/17 05:34, Eliezer Croitoru wrote:
So basically, from what I understand, the limit of the AUFS\UFS cache_dir is:
16,777,215 objects.
So for a very loaded system it might be pretty "small".

I have asked since:
I have seen the mongodb eCAP adapter that stores chunks, and I didn't like it.
On the other hand, I wrote a cache_dir in GoLang which I am using for the Windows Updates caching proxy, and for now it is surpassing the AUFS\UFS limits.

Based on the success of the Windows Updates Cache proxy, which strives to cache only public objects, I was thinking about writing something similar for more global usage.
The basic constraint on what would be cached is: only if the object has Cache-Control "public".

You would end up with only a small sub-set of HTTP ever being cached.

CC:public's main reason for existence is to re-enable cacheability of responses that contain security credentials - which is prevented by default as a security fail-safe.

I know a fair number of servers still send it when they should not. But that is declining as content gets absorbed by CDNs, which take more care with their bandwidth expenditure.
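[For illustration only, a rough Go sketch (not Squid code) of detecting an explicit Cache-Control "public" directive on a response, i.e. the proposed "public only" constraint. The comma-splitting is naive - it ignores quoted directive values - but is enough to show the idea.]

  package main

  import (
      "fmt"
      "net/http"
      "strings"
  )

  // isExplicitlyPublic reports whether the response headers carry an
  // explicit Cache-Control "public" directive.
  func isExplicitlyPublic(h http.Header) bool {
      for _, d := range strings.Split(h.Get("Cache-Control"), ",") {
          if strings.EqualFold(strings.TrimSpace(d), "public") {
              return true
          }
      }
      return false
  }

  func main() {
      h := http.Header{}
      h.Set("Cache-Control", "public, max-age=3600")
      fmt.Println(isExplicitlyPublic(h)) // true
  }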



The first step would be an ICAP service (respmod) which will log requests and responses and decide which GET results are worthy of a later fetch.
Squid currently does things on-the-fly, while the transaction is being fetched by the client.

What things are you speaking about here?

How do you define "later"? Is that 1 nanosecond or 64 years?
And what makes a 1 nanosecond difference in request timing for a 6GB object any less costly than 1 second?
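[As a sketch of the sort of "worthy of a later fetch" decision the proposed respmod logger might make - every name, field and threshold below is hypothetical; this is not an ICAP/eCAP API:]

  package main

  import (
      "fmt"
      "strings"
  )

  // observation is a hypothetical summary of one logged transaction.
  type observation struct {
      method       string
      status       int
      cacheControl string
      bytes        int64
  }

  // shouldScheduleFetch decides whether a completed GET looks worth
  // fetching into the cache later.
  func shouldScheduleFetch(o observation) bool {
      if o.method != "GET" || o.status != 200 {
          return false
      }
      // The proposed constraint: only explicitly public responses qualify.
      if !strings.Contains(strings.ToLower(o.cacheControl), "public") {
          return false
      }
      // Arbitrary cut-off: tiny objects are cheaper to refetch than to track.
      return o.bytes >= 64*1024
  }

  func main() {
      o := observation{method: "GET", status: 200,
          cacheControl: "public, max-age=86400", bytes: 5 << 20}
      fmt.Println(shouldScheduleFetch(o)) // true
  }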

Most of what Squid does, and the timing of it, has good reasons behind it. Not saying change is bad, but to make real improvements instead of re-inventing some long-lost wheel design, one has to know those reasons to avoid them becoming problems. e.g. the often-laughed-at square wheel is a real and useful design for some circumstances, and its lesser brethren, cogwheels and the like, are an age-proven design in rail history for places where roundness actively inhibits movement.


For an effective cache I believe we can compromise on another approach, which relies on statistics.
The first rule is: not everything is worth caching!!!
Then, after understanding and configuring this, we can move on to fetching only *public* objects once they get a high number of repeated downloads.
This is actually how the Google cache and other similar cache systems work.
They first let traffic reach the "DB" or "DATASTORE" if it is the first time an object has been seen.
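[A rough Go sketch of that threshold idea: count downloads per URL and trigger a background fetch once a URL crosses N hits. Illustrative only - a real version needs expiry, bounded memory and a work queue rather than a bare map.]

  package main

  import (
      "fmt"
      "sync"
  )

  type hitCounter struct {
      mu        sync.Mutex
      hits      map[string]int
      threshold int
  }

  func newHitCounter(n int) *hitCounter {
      return &hitCounter{hits: make(map[string]int), threshold: n}
  }

  // Record notes one client download and reports whether the URL has just
  // crossed the threshold and should now be fetched into the cache.
  func (c *hitCounter) Record(url string) bool {
      c.mu.Lock()
      defer c.mu.Unlock()
      c.hits[url]++
      return c.hits[url] == c.threshold
  }

  func main() {
      c := newHitCounter(3)
      for i := 0; i < 3; i++ {
          if c.Record("http://example.com/big.iso") {
              fmt.Println("threshold reached; schedule a background fetch")
          }
      }
  }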

FYI: that is the model Squid is trying to move away from - because it slows down traffic processing. As far as I'm aware G has a farm of servers to throw at any task - unlike most sysadmins trying to stand up a cache.


Then, once an object passes a specific threshold, it is fetched by the cache system without any connection to the transactions which the clients consume.

Introducing the slow-loris attack.

It has several variants:
1) A client sends a request, very, very, ... very slowly. Many thousands of bots all do this at once, or build up over time. -> An unwary server gets crushed under the weight of open TCP sockets, and its normal clients get pushed out into DoS.

2) A client sends a request, then ACKs delivery, very, very, ... very slowly.
-> An unwary server gets crushed under the weight of open TCP sockets, and its normal clients get pushed out into DoS. AND it suffers for each byte of bandwidth it spent fetching content for that client.

3) Both of the above.

The slower a server is at detecting this attack, the more damage can be done. This is magnified by whatever amount of resource expenditure the server goes to before detection can kick in - RAM, disk I/O, CPU time, TCP sockets, and, most relevant here, upstream bandwidth.

Also, Loris and clients on old tech like 6K modems or worse are indistinguishable.
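[To make the attack variants above concrete, here is a generic Go illustration of the usual defence - bounding how long any one client may take at each stage. This is not how Squid handles it; it is just a sketch, and all the timeout values are made up.]

  package main

  import (
      "net/http"
      "time"
  )

  func main() {
      srv := &http.Server{
          Addr:              ":8080",
          ReadHeaderTimeout: 10 * time.Second, // limits variant 1 (slow request)
          ReadTimeout:       30 * time.Second,
          WriteTimeout:      60 * time.Second, // limits variant 2 (slow reads/ACKs)
          IdleTimeout:       2 * time.Minute,
          Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
              w.Write([]byte("ok\n"))
          }),
      }
      srv.ListenAndServe() // error handling omitted in this sketch
  }

[Note that the same timeouts also cut off genuinely slow clients - which is exactly the indistinguishability problem mentioned above.]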

To help resolve this problem Squid does the _opposite_ to what you propose above. It makes the client delivery and the server fetch align to avoid mistakes detecting these attacks and disconnecting legitimate clients. The read_ahead_gap directive configures the threshold amount of server fetch which can be done at full server-connection speed before slowing down to client speed. The various I/O timeouts can be tuned to what a sysadmin knows about their clients' expected I/O capabilities.
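[For reference, these knobs live in squid.conf. The values below are purely illustrative - the right numbers depend on knowing your own clients and links, and the exact semantics of each directive are in your Squid version's documentation.]

  # How far the server-side fetch may run ahead of the slowest client
  # still reading that response:
  read_ahead_gap 64 KB

  # Bounds on how long slow (or malicious) peers can hold resources open:
  request_timeout 2 minutes
  read_timeout 5 minutes
  client_lifetime 1 day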


It might not be the most effective caching "method" for specific very loaded systems, or for specific big files and *very* high-cost up-stream connections, but for many it will be fine.
And the actual logic and implementation can be one of a couple of algorithms, with LRU as the default and a couple of others as options.
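[For what "LRU as the default" could look like, a minimal keys-only LRU index sketch in Go - illustrative only, not Squid's replacement policy code:]

  package main

  import (
      "container/list"
      "fmt"
  )

  type lru struct {
      max   int
      order *list.List // front = most recently used
      items map[string]*list.Element
  }

  func newLRU(max int) *lru {
      return &lru{max: max, order: list.New(), items: make(map[string]*list.Element)}
  }

  // Touch records a hit on key, evicting the least recently used key
  // when the index is full. It returns the evicted key, if any.
  func (c *lru) Touch(key string) (evicted string, ok bool) {
      if e, found := c.items[key]; found {
          c.order.MoveToFront(e)
          return "", false
      }
      if c.order.Len() >= c.max {
          oldest := c.order.Back()
          c.order.Remove(oldest)
          evicted = oldest.Value.(string)
          delete(c.items, evicted)
          ok = true
      }
      c.items[key] = c.order.PushFront(key)
      return evicted, ok
  }

  func main() {
      c := newLRU(2)
      c.Touch("a")
      c.Touch("b")
      if key, ok := c.Touch("c"); ok {
          fmt.Println("evicted:", key) // evicted: a
      }
  }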

I believe that this logic will be good for specific systems and will remove all sorts of weird store\cache_dir limitations.

Which weird limitations are you referring to?

The limits you started this thread about are caused directly by the size of a specific integer representation and the mathematical properties inherent in a hashing algorithm.

Those types of limit can be eliminated or changed in the relevant code without redesigning how HTTP protocol caching behaves.
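[For illustration: the 16,777,215 figure from the start of this thread is exactly 2^24 - 1, which is what you get if the per-cache_dir object/file number is packed into 24 bits. That packing is my assumption here - the store code is the authority on the real cause.]

  package main

  import "fmt"

  func main() {
      const fileNoBits = 24               // assumed width of the on-disk file number
      maxObjects := (1 << fileNoBits) - 1 // = 16777215, the limit quoted above
      fmt.Println(maxObjects)
  }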


Amos
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users



