Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Mon, 3 Jul 2017 16:21:46 -0700

On 07/02/2017 10:33 AM, Jerry Regan wrote:
For reasons beyond my control, I am using Postgresql 9.4 on a MacBookPro 
(development system). I use pgadminIII and psql for clients (I tried and 
didn’t like the pgadmin4 UI; pg_dump, pg_restore also seem to be dumbed 
down).

My question:
I have some performance test results in table cor_duration_report. One 
column, c_entered_ion is of type timestamptz, another, c_scenario, is of 
type text. I want to calculate the difference between succeeding 
c_entered_ion rows to learn the rate at which entry events occur. In 
cor_duration_report, c_entered_ion columns are NOT in ascending sort order.

For a first attempt, I created another table cor_temp_gap as:

    CREATE TABLE cor_temp_gap
    (
       c_id serial NOT NULL,
       c_entered_ion timestamp with time zone NOT NULL,
       c_scenario text NOT NULL
    )
    WITH (
       OIDS=FALSE
    );

and loaded it with:

    INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT
    c_entered_ion, c_scenario from cor_duration_report order by
    c_entered_ion;

The c_id column is loaded with the default value - the next sequence value.

I then generated my report with:

    select count( gap ) as gaps, sum(gap) as sum,
            mode() within group (order by gap) as mode,
            percentile_disc(0.5) within group (order by gap) as  median,
            avg( gap::integer ) as mean,
            min( gap ) as min,
            max( gap ) as max
    from ( select extract( epoch from ( f.c_entered_ion -
    s.c_entered_ion)::interval) * 1000 as gap
            from cor_temp_gap s, cor_temp_gap f
            where s.c_scenario = '20170628tc04'
            and s.c_id+1 = f.c_id ) vals;

This seems to give me the results I want:

      gaps     |   sum      | mode    | median |         mean           
                 | min |  max
    ------+---------+------+--------+-----------------------+-----+--------
      307412 | 6872207 |    1       |      8      | 22.3550381897908995
        |   0 | 10846

The min value of zero is accurate. The mode value of 1 is reasonable, as 
is the median value of 8. Using a totally different method, the mean 
value is accurate, as is gaps (there are 307,413 rows in the table).

I do know enough sql to believe my cor_temp_gap table could probably be 
replace by a ‘WITH SELECT….’

I attempted this:

    with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq')
    as c_id, c_entered_ion, c_scenario
    from cor_duration_report where c_scenario = '20170628tc04' order by
    c_entered_ion )
    select count( gap ) as gaps,
            sum(gap::integer) as sum,
            mode() within group (order by gap) as mode,
            percentile_disc(0.5) within group (order by gap) as  median,
            avg( gap::integer ) as mean,
            min( gap::integer ) as min,
            max( gap::integer ) as max
    from ( select extract( epoch from ( f.c_entered_ion -
    s.c_entered_ion)::interval) * 1000 as gap
            from cor_entry_time s, cor_entry_time f
            where s.c_id+1 = f.c_id ) vals;

I used this site to reformat the above:

http://sqlformat.darold.net/

WITH cor_entry_time AS (
    SELECT
        nextval('cor_temp_select_c_id_seq') AS c_id,
        c_entered_ion,
        c_scenario
    FROM
        cor_duration_report
    WHERE
        c_scenario = '20170628tc04'
    ORDER BY
        c_entered_ion
)
SELECT
    count(gap) AS gaps,
    sum(gap::INTEGER) AS SUM,
    MODE ()
    WITHIN
GROUP (
ORDER BY
    gap) AS MODE,
percentile_disc (0.5)
WITHIN
GROUP (
ORDER BY
    gap) AS median,
avg(gap::INTEGER) AS mean,
min(gap::INTEGER) AS MIN,
max(gap::INTEGER) AS MAX
FROM (
    SELECT
        extract(EPOCH
        FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
    cor_entry_time s,
    cor_entry_time f
WHERE
    s.c_id + 1 = f.c_id) vals;

Still have not figured out everything that is going on above, but it 
gave me a fighting chance:)

which returned:

       gaps    |   sum      | mode | median |        mean              
             |   min       |  max
    --------+---------+------+--------+---------------------+----------+-------
      307412 | 6867802 |    0     |    461   | 22.3407088857949592   |
    -6871881 | 59791

The results should match but obviously they don’t. the ‘mode’, ‘median’, 
‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’ 
columns are similar but are also different. Only ‘gaps’ is the same. 
There should be no negative numbers at all, assuming my c_entered_ion 
column is in ascending order. Wouldn’t the 'order by c_entered_ion’ in 
the ‘WITH’ select do that?

I believe you are going to have to move the ORDER BY to outside the 
WITH. If I am following correctly:

FROM (
    SELECT
        extract(EPOCH
        FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
    cor_entry_time s,
    cor_entry_time f
WHERE
    s.c_id + 1 = f.c_id
ORDER BY
    s.c__entered_ion
) vals;

You can remove the:

ORDER BY
        c_entered_ion

in the WITH.

To me, the differences between loading ‘cor_temp_gaps’ and my ‘WITH’ 
select should not cause the difference. The differences are in how 
‘c_id’ is generated and the fact that selecting only ‘c_scenario = 
‘20170628tc04’ has been moved from calculating the interval to the 
‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without 
moving that test and received the same results.

My suspicion is that in the failing approach, my sequence is being 
assigned before the sort whereas when I load ‘cor_temp_gap’, and c_id 
defaults to a sequence then c_id is generated AFTER c_entered_ion is put 
in sort order.

If my suspicion is right, how do accomplish the same thing in the ‘WITH 
SELECT’ case?

If it is wrong, what am I doing wrong?

Thanks for any insights you may be able to provide!

/s/jr
Consultant
Concerto GR
Mobile: 612.208.6601

Concerto - a composition for orchestra and a soloist

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general