Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

Jerry Regan <jerry.regan@xxxxxxxxxxxxxxxxxxxxxxxxxxx> · Mon, 3 Jul 2017 19:20:02 -0500

Adrian,

Thank you for your reply!

I apologize in advance for not being detailed below. Hard to do from my phone. 

I did have to move the 'ORDER BY', but not outside the 'WITH'. My first workaround parenthesized the select containing the 'ORDER BY', forcing it to be evaluated before the 'INSERT'. That worked.

But I never liked using a sequence for the c_id column. And using the sequence on my personal workstation was maybe safe, but given that sequences not are guaranteed to be without gaps, that was not very portable.

So I searched a bit and found I could use 'row_number()' instead. That approach allowed me to use the 'ORDER BY' required by 'row_number()'.

That worked and is far more portable to other postgresql instances.

I really do appreciate your response. It is also my nature to continue my research even after asking for help. However I find my answer, one validates the other. 

Thanks again!

/s/jr
Sent from my iPhone

> On Jul 3, 2017, at 18:21, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
> 
>> On 07/02/2017 10:33 AM, Jerry Regan wrote:
>> For reasons beyond my control, I am using Postgresql 9.4 on a MacBookPro (development system). I use pgadminIII and psql for clients (I tried and didn’t like the pgadmin4 UI; pg_dump, pg_restore also seem to be dumbed down).
>> My question:
>> I have some performance test results in table cor_duration_report. One column, c_entered_ion is of type timestamptz, another, c_scenario, is of type text. I want to calculate the difference between succeeding c_entered_ion rows to learn the rate at which entry events occur. In cor_duration_report, c_entered_ion columns are NOT in ascending sort order.
>> For a first attempt, I created another table cor_temp_gap as:
>>    CREATE TABLE cor_temp_gap
>>    (
>>       c_id serial NOT NULL,
>>       c_entered_ion timestamp with time zone NOT NULL,
>>       c_scenario text NOT NULL
>>    )
>>    WITH (
>>       OIDS=FALSE
>>    );
>> and loaded it with:
>>    INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT
>>    c_entered_ion, c_scenario from cor_duration_report order by
>>    c_entered_ion;
>> The c_id column is loaded with the default value - the next sequence value.
>> I then generated my report with:
>>    select count( gap ) as gaps, sum(gap) as sum,
>>            mode() within group (order by gap) as mode,
>>            percentile_disc(0.5) within group (order by gap) as  median,
>>            avg( gap::integer ) as mean,
>>            min( gap ) as min,
>>            max( gap ) as max
>>    from ( select extract( epoch from ( f.c_entered_ion -
>>    s.c_entered_ion)::interval) * 1000 as gap
>>            from cor_temp_gap s, cor_temp_gap f
>>            where s.c_scenario = '20170628tc04'
>>            and s.c_id+1 = f.c_id ) vals;
>> This seems to give me the results I want:
>>      gaps     |   sum      | mode    | median |         mean                            | min |  max
>>    ------+---------+------+--------+-----------------------+-----+--------
>>      307412 | 6872207 |    1       |      8      | 22.3550381897908995
>>        |   0 | 10846
>> The min value of zero is accurate. The mode value of 1 is reasonable, as is the median value of 8. Using a totally different method, the mean value is accurate, as is gaps (there are 307,413 rows in the table).
>> I do know enough sql to believe my cor_temp_gap table could probably be replace by a ‘WITH SELECT….’
>> I attempted this:
>>    with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq')
>>    as c_id, c_entered_ion, c_scenario
>>    from cor_duration_report where c_scenario = '20170628tc04' order by
>>    c_entered_ion )
>>    select count( gap ) as gaps,
>>            sum(gap::integer) as sum,
>>            mode() within group (order by gap) as mode,
>>            percentile_disc(0.5) within group (order by gap) as  median,
>>            avg( gap::integer ) as mean,
>>            min( gap::integer ) as min,
>>            max( gap::integer ) as max
>>    from ( select extract( epoch from ( f.c_entered_ion -
>>    s.c_entered_ion)::interval) * 1000 as gap
>>            from cor_entry_time s, cor_entry_time f
>>            where s.c_id+1 = f.c_id ) vals;
> 
> I used this site to reformat the above:
> 
> http://sqlformat.darold.net/
> 
> WITH cor_entry_time AS (
>    SELECT
>        nextval('cor_temp_select_c_id_seq') AS c_id,
>        c_entered_ion,
>        c_scenario
>    FROM
>        cor_duration_report
>    WHERE
>        c_scenario = '20170628tc04'
>    ORDER BY
>        c_entered_ion
> )
> SELECT
>    count(gap) AS gaps,
>    sum(gap::INTEGER) AS SUM,
>    MODE ()
>    WITHIN
> GROUP (
> ORDER BY
>    gap) AS MODE,
> percentile_disc (0.5)
> WITHIN
> GROUP (
> ORDER BY
>    gap) AS median,
> avg(gap::INTEGER) AS mean,
> min(gap::INTEGER) AS MIN,
> max(gap::INTEGER) AS MAX
> FROM (
>    SELECT
>        extract(EPOCH
>        FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
> FROM
>    cor_entry_time s,
>    cor_entry_time f
> WHERE
>    s.c_id + 1 = f.c_id) vals;
> 
> 
> Still have not figured out everything that is going on above, but it gave me a fighting chance:)
>    
> 
> 
>> which returned:
>>       gaps    |   sum      | mode | median |        mean                           |   min       |  max
>>    --------+---------+------+--------+---------------------+----------+-------
>>      307412 | 6867802 |    0     |    461   | 22.3407088857949592   |
>>    -6871881 | 59791
>> The results should match but obviously they don’t. the ‘mode’, ‘median’, ‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’ columns are similar but are also different. Only ‘gaps’ is the same. There should be no negative numbers at all, assuming my c_entered_ion column is in ascending order. Wouldn’t the 'order by c_entered_ion’ in the ‘WITH’ select do that?
> 
> I believe you are going to have to move the ORDER BY to outside the WITH. If I am following correctly:
> 
> FROM (
>    SELECT
>        extract(EPOCH
>        FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
> FROM
>    cor_entry_time s,
>    cor_entry_time f
> WHERE
>    s.c_id + 1 = f.c_id
> ORDER BY
>    s.c__entered_ion
> ) vals;
> 
> You can remove the:
> 
> ORDER BY
>        c_entered_ion
> 
> in the WITH.
> 
>> To me, the differences between loading ‘cor_temp_gaps’ and my ‘WITH’ select should not cause the difference. The differences are in how ‘c_id’ is generated and the fact that selecting only ‘c_scenario = ‘20170628tc04’ has been moved from calculating the interval to the ‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without moving that test and received the same results.
>> My suspicion is that in the failing approach, my sequence is being assigned before the sort whereas when I load ‘cor_temp_gap’, and c_id defaults to a sequence then c_id is generated AFTER c_entered_ion is put in sort order.
>> If my suspicion is right, how do accomplish the same thing in the ‘WITH SELECT’ case?
>> If it is wrong, what am I doing wrong?
>> Thanks for any insights you may be able to provide!
>> /s/jr
>> Consultant
>> Concerto GR
>> Mobile: 612.208.6601
>> Concerto - a composition for orchestra and a soloist
> 
> 
> -- 
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general