Search Postgresql Archives

Re: Count of records in a row

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ingenious!

I actually think, however, there was a subtle bug in, though I see you fixed it.

The line:
              - row_number() over () as d
needs to be:
              - row_number() over (order by i asc) as d

I discovered this when working your code into my application.  I got
very, very weird results - with one order of columns in the select, I
got the correct answer, but with another one I didn't.  After much
debugging, I realized that the original version ("- row_number over
()") wasn't defined! So, depending on how I wrote the select
statement, Postgres could pick different orders!

But I see your cleaned up version already fixed this!

On 10/25/13, Elliot <yields.falsehood@xxxxxxxxx> wrote:
> Glad I could help. It's easier to understand if you break apart the CTE.
> I'm also moving around the order by i to clean this up a little. Sorry
> for the formatting.
>
> Running this:
>     select i,
>            val,
>            row_number() over (partition by val order by i asc) as class_i,
>            row_number() over (order by i asc) as overall_i,
>            row_number() over (partition by val order by i asc)
>              - row_number() over () as d
>     from data
>
> Yields this:
> i    val    class_i    overall_i    d
> 1    A    1    1    0
> 2    A    2    2    0
> 3    A    3    3    0
> 4    B    1    4    -3
> 5    C    1    5    -4
> 6    A    4    6    -2
> 7    D    1    7    -6
> 8    A    5    8    -3
> 9    A    6    9    -3
> 10    D    2    10    -8
> 11    D    3    11    -8
> 12    B    2    12    -10
> 13    C    2    13    -11
> 14    C    3    14    -11
>
> class_i counts the row number within a class and overall_i counts the
> overall row number in the sequence. Here's just one class extracted to
> emphasize that:
>
> i    val    class_i    overall_i    d
> 1    A    1    1    0
> 2    A    2    2    0
> 3    A    3    3    0
> 6    A    4    6    -2
> 8    A    5    8    -3
> 9    A    6    9    -3
>
> Within a given consecutive run of a particular class the difference
> between class_i and overall_i will always be the same (because they're
> both increasing by the same amount) but that difference between runs
> will always be different (because each run starts the sequences at
> different offsets). "d" is the difference of the two. Because that value
> segments the runs, all that needs to be done is group by it and count
> the items in the group to get the length of the runs.
>
> The xxx column was junk left over from copying and pasting and
> verifying. Apologies :). This is a cleaned up version:
>
> with x
> as
> (
>    select i,
>           val,
>           row_number() over (partition by val order by i asc)
>             - row_number() over (order by i asc) as d
>    from data
> )
> select val,
>         count(*)
> from x
> group by d,
>           val
> order by min(i)
> ;
>
>


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux