Re: Content Adaptation with HTTPs

Amos Jeffries wrote:
On 20/08/17 16:05, Christopher Ahrens wrote:

The current solution doesn't work for me since it only supports a very
limited number of clients.  I am working with a charity that provides
internet services to those with impaired vision. The intention of my
project was to set up a semi-public proxy for recipients of the charity
(e.g., we would install DD-WRT-like routers within their homes that
would create a tunnel into our network) so that they could browse the
internet using off-the-shelf systems.  We recently received a large
number of tablets from a corporate donor. The tablets themselves will
work for our recipients, but unfortunately the internet at large does
not.

FYI: If you can get the adaptation part to be small enough, a non-caching
Squid should be able to run on those WRT-like devices with less than
32 MB of RAM. So the tunnel may not be necessary, just a way to update
the software and its config.

Part of it is to pre-shrink the size of the pages to prevent saturating the tunnel. A lot of our recipients have low-cost internet connections (usually between 1 and 5 Mbps). From my personal experience, the transformations are probably cutting about 75-80% of the excess garbage from websites.

We're also looking at possibly building tiny x86- or ARM-based boxes that can be deployed in their homes to do caching and further reduce the load on their internet connections. The biggest complaint we get is that pictures and text take so long to load, especially since a lot of the pictures are the same from page to page (I am having a very hard time arguing with them...)

We can get a lot of hardware from local companies, but not so much in the way of software or services.



We've looked into commercial systems in the past, but we cannot afford
the cost of commercial systems, especially since we are unsure about
the exact licensing that would be needed for our endeavor.  We have
also been burnt in the past with commercial software where the project
either goes dead, begins to require insanely expensive appliances, or
the license price shoots sky-high.

Would it be possible to use a setup of Squid <-> Privoxy <-> Squid to
execute this?  I figure we'd build an internal instance that will
handle the client<->proxy part, Privoxy handles the content
modification, then a second Squid instance to handle the web
server<->proxy part.

Squid will only send SSL-Bump'ed HTTPS traffic over encrypted
connections. So that is only possible if privoxy accepts TLS connections
from Squid. In which case you probably do not need the second Squid, as
privoxy would also be doing the HTTPS to-server part easily enough itself.


Unfortunately Privoxy doesn't do HTTPS. We looked into using it, but it can only do domain blocking for HTTPS, not content manipulation.




So it looks like the solution would be to find a developer to write an
eCAP adapter that cycles through regexes to replace/remove HTML/CSS
content.  So time to dig out my old C++ books and get to work...

If the existing ICAP/eCAP options are not suitable, then yes a custom
one would be needed.

It is not as easy as a few regex replacements though. Adaptors are
streamed the full on-wire HTTP message format with only minor
sanitization by Squid's parser. To alter the content you will have to
deal with data encodings, object ranges, and partially received objects.
And it is best to assume everything is of infinite length unless
explicitly told otherwise, so no buffer-then-adapt code.
eCAP is simpler than ICAP, but still has to deal with these HTTP features.
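To make the streaming constraint concrete, here is a minimal C++ sketch under stated assumptions. It is not the libecap API: all eCAP plumbing is left out, and it assumes the adapter already receives the body as a series of byte chunks. `shouldAdapt` and `StreamReplacer` are hypothetical names. `shouldAdapt` skips responses that are compressed, partial (a `Content-Range` header), or not HTML/CSS, assuming the caller has lowercased header names and values. `StreamReplacer` does a literal (not regex) replacement without buffering the whole body, holding back at most `pattern.size() - 1` bytes in case a match is split across two chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of libecap): decide from response headers
// whether the body is safe to rewrite as plain text. Assumes header names
// and values have already been lowercased by the caller.
bool shouldAdapt(const std::map<std::string, std::string>& headers) {
    auto get = [&headers](const std::string& name) {
        auto it = headers.find(name);
        return it == headers.end() ? std::string() : it->second;
    };
    const std::string type = get("content-type");
    const std::string encoding = get("content-encoding");
    // Only touch HTML and CSS; images, video, etc. pass through untouched.
    const bool textual = type.rfind("text/html", 0) == 0 ||
                         type.rfind("text/css", 0) == 0;
    // Compressed bodies (gzip, br, ...) would have to be decoded first.
    const bool identity = encoding.empty() || encoding == "identity";
    // A Content-Range header means this is only a slice of the object.
    const bool partial = headers.count("content-range") > 0;
    return textual && identity && !partial;
}

// Hypothetical streaming literal replacer: never holds the whole body,
// only a short tail that might be the start of a split match.
class StreamReplacer {
public:
    StreamReplacer(std::string pattern, std::string replacement)
        : pattern_(std::move(pattern)), replacement_(std::move(replacement)) {
        assert(!pattern_.empty());
    }

    // Feed one body chunk; returns the bytes that are safe to forward now.
    std::string feed(const std::string& chunk) {
        buffer_ += chunk;
        std::string out;
        std::size_t pos = 0;
        // Replace every complete, non-overlapping match in the buffer.
        for (std::size_t hit = buffer_.find(pattern_, pos);
             hit != std::string::npos; hit = buffer_.find(pattern_, pos)) {
            out.append(buffer_, pos, hit - pos);
            out += replacement_;
            pos = hit + pattern_.size();
        }
        // Hold back up to pattern.size() - 1 trailing bytes: a match could
        // start there and finish in the next chunk.
        const std::size_t keep =
            std::min(pattern_.size() - 1, buffer_.size() - pos);
        out.append(buffer_, pos, buffer_.size() - pos - keep);
        buffer_.erase(0, buffer_.size() - keep);
        return out;
    }

    // Call once at end of message to flush the held-back tail.
    std::string finish() {
        std::string out;
        out.swap(buffer_);
        return out;
    }

private:
    std::string pattern_;
    std::string replacement_;
    std::string buffer_;
};
```

For example, feeding `"abc ad-ban"` and then `"ner def"` with the pattern `"ad-banner"` and an empty replacement yields `"abc  def"` once `finish()` is called. Real regex replacement is harder: a pattern such as `<script.*?</script>` has no fixed maximum length, so the held-back tail can grow without bound, which is the buffer-then-adapt trap described above.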

Those are a big part of why available software is so sparse. The other
part being that HTTP traffic payloads are copyright content, so there
are legal issues with selling software for the purpose of altering
copyright content without the author's permission.


Yeah, I was a bit afraid that would be the case. I was planning on seeing how Greasemonkey and ABP handle data streams, since they seem to be able to handle streaming media, or digging into Privoxy to see how things are done there. I might find it easier to turn Privoxy into an ICAP/eCAP service by changing its input/output functions to use the ICAP/eCAP interface rather than TCP.

For now, I'm thinking that I'll just let HTTPS pass through without modification and let Privoxy handle HTTP. That seems to be the easiest way to do things.

Amos
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users
