Search Postgresql Archives

Re: how to identify outliers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Rhys A.D. Stewart wrote:
Im asking how to get the ones that dont fall near the avg.... so for
example lets say i have the following distances:
10,11,12,11,10,9,9,10,11,12,10,11,99

then 99 would be an outlier. the avg would be like 16 or 17 i reckon
with the 99. so i want a way to find aan outlier, remove it and then
recalcuate the avg...and then i'd get a 'better' avg.....

i did some seraching about outliers and most of hits are about R or
spss or some other statistical package.....so looking for a way to do
it wholly in pgsql.


Rhys

On Tue, Oct 27, 2009 at 4:53 PM, Ben Chobot <bench@xxxxxxxxxxxxxxx> wrote:
Are you asking how to find the average and standard deviation? Or how to
compare the your data against some set values? Perhaps an example would be
appropriate; it's not very clear to me what you're asking.

Rhys A.D. Stewart wrote:
Hey all,
I have the following table:  data(pnum text, distance float8, route text).
I would like to remove the outliers in distance, i.e. lets say i get
the avg dist of pnum for each route and the std deviation of the
distance what is the best way to identify the outliers?


Rhys.

Oh, so you want to "cook" your data? I don't agree with that conceptually, but:


WITH base AS (
  SELECT random(1, 100)::int AS i
  FROM generate_series(1, 100) i
),
stats AS (
  SELECT avg(i) AS dist_avg, stddev(i) AS dist_dev FROM base
)
SELECT count(i), avg(i) AS new_avg, MIN(stats.dist_avg) AS old_avg,
  stddev(i) AS new_dev, MIN(stats.dist_dev) AS old_dev
FROM base, stats
WHERE base.i BETWEEN stats.dist_avg - dist_dev
  AND stats.dist_avg + dist_dev


Scott Bailey

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux