Variance Function

It's been 20+ years since I took a stats class...

I didn't enjoy that class, and doubt if I remember 1% of what was
covered.

Given an input Unix date like:
1132565360

And an array of Unix dates like:
array(3) {
  [0]=>
  int(1132565342)
  [1]=>
  int(1132565360)
  [2]=>
  int(1132565359)
}

I would like to return the input date *IF* it is "reasonable" in its
variance from the date values in the array.

E.g., the above would output: 1132565360
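
One way to make "reasonable" concrete, sketched in plain PHP, is to compare the input date with the median of the array rather than with the mean. A single wildly wrong value drags the mean (and the variance) far off, but barely moves the median. The helper names `median()` and `reasonable_date()` and the one-hour tolerance are hypothetical choices for illustration, not anything built into PHP.

```php
<?php
// Sketch: "reasonable" means within $tolerance seconds of the median.
// median() and reasonable_date() are hypothetical helpers; the one-hour
// default tolerance is an arbitrary choice to tune for your mail.

function median(array $values): float
{
    // Assumes a non-empty array.
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    // Odd count: the middle value; even count: mean of the two middle values.
    return $n % 2 === 1
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
}

function reasonable_date(int $input, array $dates, int $tolerance = 3600): ?int
{
    if (count($dates) === 0) {
        return null; // nothing to compare against
    }
    return abs($input - median($dates)) <= $tolerance ? $input : null;
}

$dates = [1132565342, 1132565360, 1132565359];
var_dump(reasonable_date(1132565360, $dates)); // int(1132565360)
var_dump(reasonable_date(0, $dates));          // NULL
```

Returning `null` leaves the "what now?" decision to the caller, which is exactly the open question in the next paragraph.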

If, however, the input date was "0" for the same array of dates, I'd
want to get, errr... Well, okay, the values in the array...

One of those might *ALSO* be wildly wrong. :-(

So I want the "most likely candidate" for a correct date out of all
this mess, where any of the date values might be wrong.
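
When any value, the input included, might be wrong, a simple consensus vote is one option (a swapped-in technique, not a PHP built-in). Each candidate counts how many candidates lie within a tolerance of it, and the best-supported candidate wins. The name `most_likely_date()` and the one-hour tolerance are hypothetical.

```php
<?php
// Sketch of a "consensus vote": each candidate date gets one vote from every
// candidate (itself included) within $tolerance seconds of it; the candidate
// with the most votes wins. Ties go to the first such candidate in list order.
// most_likely_date() and the one-hour default are hypothetical choices.
function most_likely_date(array $candidates, int $tolerance = 3600): ?int
{
    $best = null;
    $bestVotes = 0;
    foreach ($candidates as $candidate) {
        $votes = 0;
        foreach ($candidates as $other) {
            if (abs($candidate - $other) <= $tolerance) {
                $votes++;
            }
        }
        if ($votes > $bestVotes) {
            $best = $candidate;
            $bestVotes = $votes;
        }
    }
    return $best; // null only when $candidates is empty
}

// The input date 0 plus the three dates from the array above:
var_dump(most_likely_date([0, 1132565342, 1132565360, 1132565359])); // int(1132565342)
```

Any member of the winning cluster is "close enough"; this sketch simply returns the first one. If the best vote count is 1, nothing agrees with anything, which is probably a message to set aside for a human. The double loop is O(n²), harmless for the handful of dates in one message.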

Any ideas?

Is there a nice built-in "sort out this variance mess for me" function
in PHP? :-)

The somewhat obvious candidate, stats_variance(), is a bit
under-documented...

I'm not sure I'd even understand the numbers that came out of it, even
if I experimented with it and *THOUGHT* I understood the numbers
coming out of it.

And the sheer number of functions in the stats package is making my
head spin.

And I dunno if I could get the stats extension installed on the shared server anyway.
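
Without the stats extension, variance is easy to compute by hand: it is the mean of the squared deviations from the mean (divide by n - 1 instead of n for the sample variance). The sketch below also hints at why variance is a poor fit here: one 1970-style value (timestamp 0) inflates the standard deviation from seconds to hundreds of millions of seconds. `variance()` and `std_dev()` are hypothetical helper names.

```php
<?php
// Plain-PHP variance and standard deviation; no stats extension needed.
// Population variance divides by n; pass $sample = true to divide by n - 1.
function variance(array $values, bool $sample = false): float
{
    $n = count($values);
    if ($n === 0 || ($sample && $n < 2)) {
        throw new InvalidArgumentException('Not enough values');
    }
    $mean = array_sum($values) / $n;
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2; // squared deviation from the mean
    }
    return $sumSq / ($sample ? $n - 1 : $n);
}

function std_dev(array $values, bool $sample = false): float
{
    return sqrt(variance($values, $sample));
}

$good = [1132565342, 1132565360, 1132565359];
printf("%.1f seconds\n", std_dev($good));                    // 8.3 seconds
printf("%.0f seconds\n", std_dev(array_merge($good, [0]))); // roughly 490,000,000 seconds
```

That blow-up is the usual argument for median-based checks when the data may contain a few wild values.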

Maybe I should explain the "Big Picture", eh?

Okay, so the "Big Picture" is 14,000 emails in an Inbox that need to
be processed and tagged with their "date".
[And a whole lot more, but not relevant to this post...]

Seems simple enough, with that Date: header.

Except when it's not there. :-(

Okay, so take the Sent: header if there's no Date: header.

Okay.

No, wait...  Damn!

Some fools have their PC clock set to, like, 1970 or whatever.  So
let's be generous and assume their CMOS battery has died, and they
haven't had a chance to change it.  Fine.  Deal with it.

Okay, so *NOW* the algorithm is to do this:

Take the Date: header, or Sent: header if no Date: header -> $whatdate
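
A sketch of that first step, assuming the headers have already been parsed into an associative array keyed by header name (a hypothetical layout; adapt it to whatever parser you use). `strtotime()` returns `false` for text it cannot parse, so in this sketch an unparseable Date: also falls through to Sent:, a small extension of the rule above. `header_date()` is a hypothetical helper.

```php
<?php
// Sketch: take the Date: header, falling back to Sent:, as a Unix timestamp.
// Assumes headers were pre-parsed into ['Date' => ..., 'Sent' => ...]
// (hypothetical layout). header_date() is a hypothetical helper.
function header_date(array $headers): ?int
{
    foreach (['Date', 'Sent'] as $name) {
        if (!isset($headers[$name])) {
            continue; // header missing, try the next one
        }
        $ts = strtotime($headers[$name]);
        if ($ts !== false) {
            return $ts; // first header that parses wins
        }
    }
    return null; // neither header present or parseable
}

$whatdate = header_date(['Sent' => 'Mon, 21 Nov 2005 09:29:20 +0000']);
var_dump($whatdate); // int(1132565360)
```

A dead-CMOS date still parses fine (to a tiny timestamp), so this step cannot catch bad clocks on its own; that is what the comparison with the Received: dates is for.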

Parse the Received: headers for the MTA date-stamps -> $fromdates[]
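
In a Received: header the date-time comes after the last semicolon (RFC 2822/5322 trace fields), so one way to build `$fromdates` is to split there and hand the rest to `strtotime()`. `received_dates()` is a hypothetical helper, and the header strings below are made-up examples.

```php
<?php
// Sketch: pull the timestamp out of each Received: header value.
// The date-time follows the last ';'. Values with no ';' or an
// unparseable date are skipped. received_dates() is a hypothetical helper.
function received_dates(array $receivedHeaders): array
{
    $fromdates = [];
    foreach ($receivedHeaders as $header) {
        $pos = strrpos($header, ';');
        if ($pos === false) {
            continue; // no date part at all
        }
        // Drop comments such as "(CST)" before parsing.
        $stamp = trim(preg_replace('/\([^)]*\)/', '', substr($header, $pos + 1)));
        $ts = strtotime($stamp);
        if ($ts !== false) {
            $fromdates[] = $ts;
        }
    }
    return $fromdates;
}

$received = [
    'from mail.example.com by mx.example.net with ESMTP id abc123; Mon, 21 Nov 2005 03:29:20 -0600 (CST)',
    'from localhost by mail.example.com; Mon, 21 Nov 2005 09:29:02 +0000',
];
var_dump(received_dates($received)); // [1132565360, 1132565342]
```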

Compare the values in $fromdates array with $whatdate.

If the variance is "too high", then ignore the $whatdate, and take
the, errr, first?, average?, $fromdates[].

No, wait, maybe I should do a variance within the $fromdates in case
some stupid MTA server has a bad clock?
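
One hedged way to handle a bad MTA clock is to swap variance for robust statistics: the median and the median absolute deviation (MAD), which a single wild value cannot inflate. The sketch drops `$fromdates` entries far from the others, then keeps `$whatdate` only if it agrees with what remains, else falls back to the trusted MTA date nearest their median. `choose_date()`, the 3 × MAD rule and the one-hour floor are hypothetical choices, untested against real mail; it assumes PHP 7.4+ for arrow functions.

```php
<?php
// Sketch (PHP 7.4+): filter MTA dates with median/MAD, then check $whatdate.
// median()/choose_date() are hypothetical helpers; the 3 * MAD rule and the
// one-hour floor are arbitrary choices to tune against real mail.

function median(array $values): float
{
    // Assumes a non-empty array.
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 === 1
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
}

function choose_date(?int $whatdate, array $fromdates, int $tolerance = 3600): ?int
{
    if (count($fromdates) === 0) {
        return $whatdate; // nothing to check against
    }

    // Median absolute deviation: spread that one bad clock can't inflate.
    $center = median($fromdates);
    $mad = median(array_map(fn($d) => abs($d - $center), $fromdates));

    // Trust MTA dates within 3 * MAD of the median, but never reject one
    // within $tolerance (keeps normal seconds-apart hops when MAD is ~0).
    $limit = max(3 * $mad, $tolerance);
    $trusted = array_values(array_filter(
        $fromdates,
        fn($d) => abs($d - $center) <= $limit
    ));
    // $trusted is never empty: the one or two middle values always pass.
    $center = median($trusted);

    if ($whatdate !== null && abs($whatdate - $center) <= $tolerance) {
        return $whatdate; // the sender's own date looks plausible
    }

    // Otherwise fall back to the trusted MTA date closest to their median.
    usort($trusted, fn($a, $b) => abs($a - $center) <=> abs($b - $center));
    return $trusted[0];
}

$fromdates = [1132565342, 1132565360, 1132565359, 1000000000]; // last clock is off
var_dump(choose_date(1132565360, $fromdates)); // int(1132565360)
var_dump(choose_date(0, $fromdates));          // int(1132565359)
```

The MAD term lets genuinely slow deliveries (mail queued for hours) survive the filter. Since the Date: header is normally set before any relay stamps the message, comparing `$whatdate` with the earliest trusted date instead of the median may suit long-queued mail better.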

Any advice?

Anybody got a good "variance" function to do what I'm trying to do?

Am I on the entirely wrong path here?

Sheesh!

We may just ignore any obviously wrong dates, and process those by
hand...

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

