On 07/02/2017 10:33 AM, Jerry Regan wrote:
For reasons beyond my control, I am using Postgresql 9.4 on a MacBookPro
(development system). I use pgadminIII and psql for clients (I tried and
didn’t like the pgadmin4 UI; pg_dump, pg_restore also seem to be dumbed
down).
My question:
I have some performance test results in table cor_duration_report. One
column, c_entered_ion is of type timestamptz, another, c_scenario, is of
type text. I want to calculate the difference between succeeding
c_entered_ion rows to learn the rate at which entry events occur. In
cor_duration_report, c_entered_ion columns are NOT in ascending sort order.
For a first attempt, I created another table cor_temp_gap as:
CREATE TABLE cor_temp_gap
(
c_id serial NOT NULL,
c_entered_ion timestamp with time zone NOT NULL,
c_scenario text NOT NULL
)
WITH (
OIDS=FALSE
);
and loaded it with:
INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT
c_entered_ion, c_scenario from cor_duration_report order by
c_entered_ion;
The c_id column is loaded with the default value - the next sequence value.
I then generated my report with:
select count( gap ) as gaps, sum(gap) as sum,
mode() within group (order by gap) as mode,
percentile_disc(0.5) within group (order by gap) as median,
avg( gap::integer ) as mean,
min( gap ) as min,
max( gap ) as max
from ( select extract( epoch from ( f.c_entered_ion -
s.c_entered_ion)::interval) * 1000 as gap
from cor_temp_gap s, cor_temp_gap f
where s.c_scenario = '20170628tc04'
and s.c_id+1 = f.c_id ) vals;
This seems to give me the results I want:
gaps | sum | mode | median | mean
| min | max
------+---------+------+--------+-----------------------+-----+--------
307412 | 6872207 | 1 | 8 | 22.3550381897908995
| 0 | 10846
The min value of zero is accurate. The mode value of 1 is reasonable, as
is the median value of 8. Using a totally different method, the mean
value is accurate, as is gaps (there are 307,413 rows in the table).
I do know enough sql to believe my cor_temp_gap table could probably be
replace by a ‘WITH SELECT….’
I attempted this:
with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq')
as c_id, c_entered_ion, c_scenario
from cor_duration_report where c_scenario = '20170628tc04' order by
c_entered_ion )
select count( gap ) as gaps,
sum(gap::integer) as sum,
mode() within group (order by gap) as mode,
percentile_disc(0.5) within group (order by gap) as median,
avg( gap::integer ) as mean,
min( gap::integer ) as min,
max( gap::integer ) as max
from ( select extract( epoch from ( f.c_entered_ion -
s.c_entered_ion)::interval) * 1000 as gap
from cor_entry_time s, cor_entry_time f
where s.c_id+1 = f.c_id ) vals;
I used this site to reformat the above:
http://sqlformat.darold.net/
WITH cor_entry_time AS (
SELECT
nextval('cor_temp_select_c_id_seq') AS c_id,
c_entered_ion,
c_scenario
FROM
cor_duration_report
WHERE
c_scenario = '20170628tc04'
ORDER BY
c_entered_ion
)
SELECT
count(gap) AS gaps,
sum(gap::INTEGER) AS SUM,
MODE ()
WITHIN
GROUP (
ORDER BY
gap) AS MODE,
percentile_disc (0.5)
WITHIN
GROUP (
ORDER BY
gap) AS median,
avg(gap::INTEGER) AS mean,
min(gap::INTEGER) AS MIN,
max(gap::INTEGER) AS MAX
FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id) vals;
Still have not figured out everything that is going on above, but it
gave me a fighting chance:)
which returned:
gaps | sum | mode | median | mean
| min | max
--------+---------+------+--------+---------------------+----------+-------
307412 | 6867802 | 0 | 461 | 22.3407088857949592 |
-6871881 | 59791
The results should match but obviously they don’t. the ‘mode’, ‘median’,
‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’
columns are similar but are also different. Only ‘gaps’ is the same.
There should be no negative numbers at all, assuming my c_entered_ion
column is in ascending order. Wouldn’t the 'order by c_entered_ion’ in
the ‘WITH’ select do that?
I believe you are going to have to move the ORDER BY to outside the
WITH. If I am following correctly:
FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id
ORDER BY
s.c__entered_ion
) vals;
You can remove the:
ORDER BY
c_entered_ion
in the WITH.
To me, the differences between loading ‘cor_temp_gaps’ and my ‘WITH’
select should not cause the difference. The differences are in how
‘c_id’ is generated and the fact that selecting only ‘c_scenario =
‘20170628tc04’ has been moved from calculating the interval to the
‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without
moving that test and received the same results.
My suspicion is that in the failing approach, my sequence is being
assigned before the sort whereas when I load ‘cor_temp_gap’, and c_id
defaults to a sequence then c_id is generated AFTER c_entered_ion is put
in sort order.
If my suspicion is right, how do accomplish the same thing in the ‘WITH
SELECT’ case?
If it is wrong, what am I doing wrong?
Thanks for any insights you may be able to provide!
/s/jr
Consultant
Concerto GR
Mobile: 612.208.6601
Concerto - a composition for orchestra and a soloist
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general