Re: seeking advice on using details in access_log to distinguish slow requests from slow repsonses

Mike Dewhirst <miked@xxxxxxxxxxxxxxx> · Mon, 6 Jan 2025 18:20:50 +1100

At $WORK, we're trying to instrument our Apache 2.4.51 server to
diagnose latency issues.
from pathlib import Path
def filter_files(ips=["43.333.128.223", "SRC="" find=""):
    """Read a log file and output a txt file sans lines with ips."""
    root = Path.cwd()
    for log in root.glob("*.log*"):
        with open(log, "r") as infile, open(log.with_suffix(".txt"), "w") as outfile:
            for line in infile:
                if find and find in line:
                    outfile.write(line)
                elif not any(ip in line for ip in ips):
                    outfile.write(line)

if __name__ == "__main__":
    filter_files(find="133.170.190.204")

Among our goals is to instrument our logs to distinguish between
clients taking too long to complete requests, and our server from
taking too long to provide a response.

Further, we want those logs to be terse, as disk space and log
retention are important.

I've been exploring combinations of:

  from mod_logio:
    LogIOTrackTTFB 

  from mod_log_config:
    %D		The time taken to serve the request, in microseconds.
    %^FB	Delay in microseconds between when the request
		arrived and the first byte of the response headers
		are written.
   %{begin:%d/%b/%Y:%T}t.%{begin:usec_frac}t
		beginning of request processing, in microseconds.

When I compare the recorded timestamps with a packet capture, and
a purposely-slowed POST request, I see that both of these counters
seem to start from when the first packet of the request comes in,
not when the whole request has been received.

As such, the timers describe the whole request/response cycle, and
don't let us infer where the latency is.

We've also explored info-level logging into our error_log, which
does yield useful records:

  when a TLS session has completed setup.
  when the request was completely received.

These are both great, and coupled with the details we can get in
our access_log, we can better pinpoint latency.

But, info-level logging yields a firehouse of data that floods the
logs, hamper retention.

Does anyone have any suggestions as to how to introduce a timestamp
into the access log as to when the entire request was received?

Or how to expose the details we see in the error_log, but in a way
that's much more terse than all of info?

Thanks for any pointers!

-- 
We recommend signal.org

Signed email is an absolute defence against phishing. This email has
been signed with my private key. If you import my public key you can
automatically decrypt my signature and be sure it came from me. Your
email software can handle signing.

OpenPGP_signature.asc