Re: Stripping white space from HTML

Grant wrote:
>> As I understand it, the reason why you wish the whitespace to be reduced
>> is so you look at the source within your browser, and that you plan to
>> use mod_deflate later to reduce bandwidth (which is surely not too much
>> of a problem - it's probably equivalent to resampling a few of your
>> images by 5% here and there, or optimising your caching!), but you can't
>> reduce the whitespace inside your application logic (which is where the
>> problem should be fixed) because you don't have control over the code.
>>
>> You have 3 (1,2,4) really good (performance neutral) options not
>> mentioned so far,
>> 1) use a whitespace stripping http proxy you run on your LAN
>> 2) use mod security, removewhitespace in response body
>> 3) use a rewrite rule to a reg exp based whitespace server-side script
>> which serves each page of your application.
>> 4) similar to (3) use an autoprepend rule to serve your white space
>> laden pages through a reg exp based whitespace stripping script.
>>
>> I would probably go for 1,3 or 4, because they are so easy.
>>
>> (2) carries a performance hit, but use of mod security is highly
>> regarded and I would say is an essential part of protecting an
>> application such as yours - one for which you do not own and cannot
>> change the code.
>
> Thanks Matt.  3 and 4 sound interesting.  How could I configure
> something like that considering my <Location /> block:
>
> <Location />
>       SetHandler perl-script
>       PerlResponseHandler Interchange::Link
>       PerlOptions +GlobalRequest
>       PerlSetVar InterchangeServer /path/to/socket
> </Location>
>
> mod_security also sounds interesting.  It's pretty tricky to set up
> though?
>
> - Grant
>
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP Server
> Project.
> See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
>   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
> For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
>
>
I didn't catch whether you are using Apache later than 2.0.46. If so,
have you tried:

AddOutputFilter DEFLATE pl

As to (3):

Perl isn't my first language, but I can show you using PHP.

First, set up a rewrite (assuming you can use the rewrite engine on
your Apache server) that points any scripts with the whitespace
trouble at another script:

RewriteEngine On
# placeholder pattern: replace with a regex matching your own IP address(es)
RewriteCond %{REMOTE_ADDR} ^some_ip_addresses$
RewriteCond %{REQUEST_URI} ^(/?path/to/offending/scripts/[^/]*)$
RewriteRule .* /white_space_remover.php?path=%1 [QSA,L]

This rule rewrites every request for a script you have identified as a
problem. You could rewrite all requests, but the chances are it's one
part of your app that's offending, so modify the path accordingly. The
rule also fires only when the request comes from your IP; comment out
the REMOTE_ADDR condition if you want the stripping applied for all
users. The path in the browser will stay at the original offending
script URL, but white_space_remover.php receives the requested URL as a
variable, along with the query string (if any), and uses it to produce
the output of the original request, which it then strips before sending
it to the user.
The same thing can be achieved using the auto_prepend_file directive in
PHP, if you have mod_php on your server. Instead of a rewrite you would
use
php_value auto_prepend_file white_space_remover.php

The script buffers the markup into a variable, runs a quick string
replacement over it to remove the blank lines and whitespace, and then
(optionally gzips it and) outputs it to the client user-agent.

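<?php
// The script that was originally requested is $_SERVER['REQUEST_URI'].
// Buffers nest: remove_whitespace runs first, then ob_gzhandler compresses.
ob_start("ob_gzhandler");
ob_start("remove_whitespace");
header("Content-Type: text/html; charset=UTF-8");
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600) . " GMT");

function remove_whitespace($buffer) {
    // Deletes line breaks, tabs and runs of spaces outright, so words
    // separated only by a line break end up joined together.
    $buffer = str_replace(array("\r\n", "\r", "\n", "\t", '  ', '    ',
        '    '), '', $buffer);
    return $buffer;
}
// REQUEST_URI is a URL path (plus query string), not a file path, so map
// it onto the document root. The included script might also need
// $_SERVER['QUERY_STRING'].
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
require_once($_SERVER['DOCUMENT_ROOT'] . $path);
exit;
?>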

I am sure you can see how to modify the script so that it does not gzip
the data, since of course it makes more sense to use mod_deflate in
Apache. But the whitespace stripping should work, and it all depends on
your rewrite routing the request to this script, which then uses the
original requested URI, along with the query string, to serve the
request.
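
If it helps to see the stripping step on its own, here is a rough
sketch in Python rather than PHP. It is not part of the setup above and
is only an illustration: the function name strip_html_whitespace and
the list of elements left untouched (pre, textarea, script, style) are
my own assumptions. It collapses each run of whitespace to a single
space instead of deleting it, so words separated only by a line break
are not joined.

```python
import re

# Elements whose contents are whitespace-sensitive (assumed list); they are
# copied through unchanged.
_PRESERVE = re.compile(
    r"<(pre|textarea|script|style)\b[^>]*>.*?</\1\s*>",
    re.DOTALL | re.IGNORECASE,
)


def _collapse(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", text)


def strip_html_whitespace(html):
    """Collapse whitespace everywhere except inside preserved elements."""
    out = []
    pos = 0
    for match in _PRESERVE.finditer(html):
        out.append(_collapse(html[pos:match.start()]))  # markup before the element
        out.append(match.group(0))                      # the element, untouched
        pos = match.end()
    out.append(_collapse(html[pos:]))                   # whatever follows the last one
    return "".join(out)


if __name__ == "__main__":
    page = "<body>\n\n  <p>Hello\n   world</p>\n<pre>  keep\n  this </pre>\n</body>\n"
    print(strip_html_whitespace(page))
```

The same idea carries over to the PHP callback by swapping the
str_replace call for a regular-expression replacement.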


Or, using method (1), you could install Fiddler on a Windows box
somewhere and write a custom handler in C# which strips whitespace
from the response body. Your browser could then be set up (with
FoxyProxy) to proxy requests via the Windows box so that you see the
whitespace-less version.
Hope all that rough-and-ready, untested code helps!
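
Whichever proxy you pick for (1), the handler has the same small job:
leave anything that is not uncompressed HTML alone, strip the rest, and
correct Content-Length afterwards. A minimal sketch of that logic in
Python follows. This is not Fiddler's API; the function name
filter_response and the dict-of-headers shape are hypothetical, and for
brevity it does not protect <pre> blocks.

```python
import re


def filter_response(headers, body):
    """Return (headers, body) with whitespace collapsed for uncompressed HTML.

    headers: dict mapping response header names to values (hypothetical shape)
    body:    response body as bytes
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Only touch HTML; images, CSS, downloads etc. pass straight through.
    if not lowered.get("content-type", "").lower().startswith("text/html"):
        return headers, body

    # A gzip/deflate body is binary data, so a regex over it would corrupt it.
    if lowered.get("content-encoding", "identity").lower() != "identity":
        return headers, body

    stripped = re.sub(rb"\s+", b" ", body)

    # The body length changed, so the old Content-Length must be replaced.
    new_headers = {
        name: value
        for name, value in headers.items()
        if name.lower() != "content-length"
    }
    new_headers["Content-Length"] = str(len(stripped))
    return new_headers, stripped
```

Skipping compressed responses means the proxy should sit in front of
the uncompressed output, or ask the server not to compress, for the
stripping to have any effect.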



-- 
Matthew Farey



