Re: [users@httpd] Does Module exists for manipulating html text?

Joshua Slive wrote:

> First, on a site doing 1000 requests/sec, you're going to see a
> serious hit no matter what if you choose to do extensive processing on
> every request.  You are *much* better off with static, unprocessed
> pages, where apache can use sendfile to make things quicker.  But you
> are correct that mod_ext_filter is about the slowest way to do this.

Agreed entirely.  If using mod_ext_filter, you might also bear in mind
that its forking overhead is equivalent to mod_cgi, and look at the
discussion of this on the mod_cgid page.  This will tell you something
possibly-unexpected about your choice of MPMs (bearing in mind there
isn't a mod_ext_filter_d).

> So the first thing that I would do is to try to convince the clients
> to do the processing in advance, assuming that you are serving static
> files.

The best option, but I think he said they were dynamic all the way.
So the next thing to consider: is anything cacheable with mod_cache?

> A second option is to use mod_deflate to shrink the responses.  You'll
> probably find it is at least as effective as your technique, in terms
> of reducing the size of the stuff going over the network, and
> certainly much much faster.

Agreed again.  mod_deflate is itself a processing overhead, but
it's well worth it if data size is the critical issue.
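
To get a rough feel for the relative savings before touching the
server, something like the following sketch may help.  It compares a
crude whitespace-stripping pass against zlib deflate (the library
mod_deflate is built on) for one page.  The sample page and the helper
names (strip_markup_whitespace, deflate_size) are made up for
illustration; run it on your own generated markup for meaningful
numbers.

```python
import re
import zlib


def strip_markup_whitespace(html: str) -> str:
    """Crude markup reduction: collapse whitespace runs, drop whitespace between tags."""
    html = re.sub(r"\s+", " ", html)
    return re.sub(r">\s+<", "><", html).strip()


def deflate_size(data: bytes, level: int = 6) -> int:
    """Size in bytes of the zlib-compressed data at the given level."""
    return len(zlib.compress(data, level))


# Hypothetical sample page: a repetitive, indented table, as dynamic
# pages often produce.  Not taken from any real site.
ROW = (
    "    <tr>\n"
    '        <td class="name">Item %d</td>\n'
    '        <td class="price">%d.00</td>\n'
    "    </tr>\n"
)
PAGE = (
    "<html>\n<body>\n<table>\n"
    + "".join(ROW % (i, i) for i in range(200))
    + "</table>\n</body>\n</html>\n"
)

original = len(PAGE.encode())
stripped = len(strip_markup_whitespace(PAGE).encode())
deflated = deflate_size(PAGE.encode())

print("original bytes:           ", original)
print("after whitespace stripping:", stripped)
print("after deflate only:        ", deflated)
```

On repetitive markup like this, deflate alone should come out well
ahead of the stripping pass, though the exact ratio depends entirely
on the content.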

> Third, you should look at Nick's mod_publisher:
> http://apache.webthing.com/mod_publisher/

Yep.  That uses a streaming parser (libxml2 with parseChunk), so the
processing overhead is closer to mod_include/SSI than to mod_ext_filter,
and is probably the fastest you'll get for markup manipulation unless
you can optimise something heavily towards the particular markup
your system generates.
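
To illustrate what "streaming" buys you, here is a minimal sketch of
chunk-at-a-time markup processing.  It uses Python's standard
html.parser purely as a stand-in for the libxml2 push parser, so it is
not how mod_publisher is implemented, and the CommentStripper class
and its behaviour (dropping HTML comments) are hypothetical.  The
point is only that the parser is fed data in pieces, as a filter sees
it, and never needs the whole document in memory.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Re-emits markup as it is parsed, dropping comments (illustrative only)."""

    def __init__(self):
        # Keep entity and character references as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Reproduce the start tag exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: emit once, verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_comment(self, data):
        pass  # comments are dropped from the output


def process(chunks):
    """Feed the parser one chunk at a time, as a streaming filter would."""
    parser = CommentStripper()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return "".join(parser.out)


DOC = '<p class="a">Hi <!-- secret --> &amp; bye<br/></p>'

whole = process([DOC])
chunked = process([DOC[i:i + 7] for i in range(0, len(DOC), 7)])
print(whole)
print(chunked == whole)
```

Tags or comments that straddle a chunk boundary are buffered by the
parser until the rest arrives, which is the same property that lets a
push parser sit inside an output filter.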

> Last, you can write a custom apache filter to do the job.  Look at
> mod_case_filter in the experimental module directory of the apache
> source for an example.

The source code for some of my modules might be a better starting point.
I'd suggest mod_proxy_html.c, which is stable and in widespread use, and
is functionally a subset of mod_publisher.

I did some very similar work to this for a client operating a mobile
accelerator, a proxy service to compress web contents for slow client
devices.  By far the biggest savings were those due to mod_deflate,
and compression of images by reducing them to the client device's
display capabilities.  Reducing markup gave altogether smaller savings,
although we nevertheless aggressively pursued it.

-- 
Nick Kew

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx


