Parsing through an Apache Log File?

Hello everyone!  I've been given the responsibility of coding an Apache access_log parser.  My task is to return the number of hits for certain file extensions on certain dates from specific IP addresses.

As of now I'm only going back 7 days in the log looking for this information, and I'm only looking for 5 file types (.doc, .pdf, .html, .php, and .flv).  I'm using the fgets() function so I can read the file line by line, do the matches that I need, and increment the counters as needed.  Right now I have 3 nested loops looking for everything, which doesn't seem to me to be the best way of doing this.  I've also found that a line may contain a file extension I want when that file is actually the source (the referrer) of a request for another file.  (See below for an example.)

Log file example:
I want the first line but not the second.  The second line is a request for a .css file that was loaded by the .html page, so .html only shows up in its referrer field; I don't want that line.  I do want the first line, where the requested file itself is .html and no other file is involved.

10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /home.html HTTP/1.1" 200 8220 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /styles/redesign.css HTTP/1.1" 200 2381 "http://wfmu.wfm.pvt/home.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
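
One possible way to tell these two lines apart is to look only at the requested path inside the quoted request, instead of searching the whole line. The sketch below is just an illustration and assumes the log is in the combined format shown above; parse_log_line() is a made-up helper name, and note that it returns the extension without the leading dot.

```php
<?php
// Hypothetical helper (not part of the original script): pull the IP, the
// date and the requested path out of one combined-format log line.
function parse_log_line($line) {
    // ip ident user [dd/Mon/yyyy:time zone] "METHOD path ..."
    $pattern = '#^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]+\] "\S+ (\S+)#';
    if (!preg_match($pattern, $line, $m)) {
        return null; // line is not in the expected format
    }
    // Drop any query string, then take the extension of the requested path
    // only, so a .html in the referrer field is never looked at.
    $path = (string) parse_url($m[3], PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return array('ip' => $m[1], 'date' => $m[2], 'path' => $path, 'ext' => $ext);
}
```

With the two example lines, this should report "html" for the first and "css" for the second, so only the first would match the list of wanted file types.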

At any rate, here's some of my pseudocode/code for what I'm trying to accomplish.  I know there has to be a better way to do this and I'm looking for suggestions!


//path to log file
$path = "./";
//name of log file
$log_filename = "access_log";

if (!file_exists($path.$log_filename)) {
	echo "file does not exist!";
	die;
}

//open log file
if (!$handle = fopen($path.$log_filename, "r")) {
	echo "error in loading file!";
	die;
}

//get the dates for the past 7 days and put them into an array for comparison against the log file
$dates = array();
$days = 7;
for ($i=1;$i<=$days;$i++) {
	$dates[] = date("d/M/Y", strtotime("-$i day"));
}

//get document types that need to match
$docs = array();
$docs[] = ".doc";
$docs[] = ".pdf";
$docs[] = ".html";
$docs[] = ".htm";
$docs[] = ".php";
$docs[] = ".flv";

$contents = "";
//fgets() returns false at end of file, so test its result directly
while (($line = fgets($handle)) !== false) {
	//look to see if the line has a date we are looking for
	foreach ($dates as $date) {
		//if date is in the line look for the doc type we want
		//(strpos() can return 0, so compare against false explicitly)
		if (strpos($line, $date) !== false) {
			//look to see if the line has the doc type we want
			foreach ($docs as $doc) {
				//if the line has the doc type we want then grab the region
				//and increment the counter for page hit
				//make sure to break out of the loops once found
				//need to add functionality for lines that have file extensions
				//that are not wanted but also have file extensions that are wanted
				if (strpos($line, $doc) !== false) {
					
					break;
				} //end if
			} //end foreach ($docs as $doc)
			break;
		} //end if
	} //end foreach ($dates as $date)
}


//close log file
fclose($handle);
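
For the counting itself, a rough alternative to the nested loops would be to flip the date and extension lists into lookup tables and check each line once. This is only a sketch under the same combined-format assumption: count_hits() and the shape of its result array are invented for illustration, and the extensions are passed without the leading dot.

```php
<?php
// Hypothetical helper: count hits per date, IP and extension in one pass.
function count_hits($lines, $dates, $exts) {
    // Flip the lists so membership checks are isset() lookups, not inner loops.
    $wanted_dates = array_flip($dates);
    $wanted_exts = array_flip($exts);
    $hits = array();
    // Captures: IP, dd/Mon/yyyy date, requested path up to any query string.
    $pattern = '#^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]+\] "\S+ ([^\s?"]+)#';
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines that are not in the expected format
        }
        list(, $ip, $date, $path) = $m;
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (!isset($wanted_dates[$date]) || !isset($wanted_exts[$ext])) {
            continue; // wrong day or unwanted file type
        }
        if (!isset($hits[$date][$ip][$ext])) {
            $hits[$date][$ip][$ext] = 0;
        }
        $hits[$date][$ip][$ext]++;
    }
    return $hits;
}
```

Here $lines can be any array of log lines, for example the result of file(), although for a large log it would probably be better to keep reading with fgets() and apply the same checks to each line as it comes in.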

Thanks!
Jay
