Meaning and interpretation of 206 code sizes in Apache logs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The VTP project (vterrain.org) recently had an interesting experience where bandwidth-monitoring (and billing) procedures reported an enormous bandwidth spike, resulting in a big hosting bill.

The billing situation was resolved amicably, but in the spirit of understanding what really happened, we've been investigating the cause and effect. It appears that the traffic came from (legitimate, non-malicious) Far East users, probably on the other side of a problematic Internet connection, most likely caused by the damage to an undersea cable in December. The result of the network unreliability was that they had difficulty downloading large data intact from the VTP server, and employed a common download-accelerator that is able to keep retrying the server and resuming a file-transfer partway through. All of this is very normal and common.

From the logs, it would appear that the download accelerator requests the file with a Range header (with a fairly large range). Then, Apache begins sending the data. Shortly, the connection fails (after the client has only received a little of the data) and Apache logs the request. The process repeats. Eventually, the client does receive the entire file, but not until after many many 206 (partial-content) entries are logged.


  According to the host (Hurricane Electric, HE.net):
http://www.he.net/faq/traffic_storage.html
"What Methods do you use to determine traffic usage?
We determine web traffic usage by extracting information from the access_log files generated by the HTTP daemon."

  Based on this, we come to some interesting contradictions:


>From http://vterrain.org/.status/web.html

Time  Requests  Bytes Sent
----- -------- ------------
00:00      893     18955972
01:00    22384 141295771628
02:00     4330   3881123738
03:00      626      7740512

Yup, in one hour, a naive counting of Apache's "bytes transferred" says
there was 141 GB of traffic.  About a month's worth of normal traffic - in
an hour.

Apache's log has endless amounts of this:

211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 10359197
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 42725451
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 15765857
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 37503695
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 21176362
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 5127115
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 10359197
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 42725451
211.162.235.226 - - [03/Feb/2007:01:08:12 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 37503695
211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 15765857
211.162.235.226 - - [03/Feb/2007:01:08:13 -0800] "GET /data/island_8k.zip
HTTP/1.1" 206 21176362

If those were real bytes transferred, it would be something like 400
MB/second.  This is to distant users happy to get K/second.


  So, this raises a couple of interesting questions.


  1. Is HE.net making a mistake in using access_log to count bandwidth?
   1a. What would be the right way to do it?
1b. Are they doing it this way because they're using some existing tool that does it this way? 1c. Are other hosts doing it this way, and therefore are mistakenly over-measuring and over-charging customers?

2. What exactly does the number after the 206 code in the access_log mean? Is it simply the range the client _requested_ via the Range header? In which case, it has no real relationship to how much data was _actually_ transferred?


Appreciate any insight from anyone. As I said before, we resolved the billing problem with HE.net without any problem, and have been very happy with their service. But, if this is a common mistake (bandwidth measurement via access_log) then the nature of the misunderstanding should be made more public so that others can ensure they aren't burned by it.

I apologize for any mistakes in terminology or assumptions in my message. I'm not an Apache guru and I don't play one on TV. I'm a 3D graphics programmer.


--
     Chris 'Xenon' Hanson | Xenon @ 3D Nature | http://www.3DNature.com/
 "I set the wheels in motion, turn up all the machines, activate the programs,
  and run behind the scenes. I set the clouds in motion, turn up light and sound,
  activate the window, and watch the world go 'round." -Prime Mover, Rush.

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
  "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx


[Index of Archives]     [Open SSH Users]     [Linux ACPI]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Squid]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux