Re: Seems like there is an issue with reltuples showing twice the number of rows

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Tue, 3 Apr 2018 13:02:44 +0200

On 04/03/2018 11:14 AM, Ranjith Ramachandra wrote:
> I am relying on reltuples on my web app to get fast row counts.
> 
> This was recommended by this article to get fast approx row
> counts: https://wiki.postgresql.org/wiki/Count_estimate
> 
> 
> However for some table I am getting twice as many values when I try to
> do this. I did some more research and came up with this query.
> 
> select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname =
> 'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';
> 
> 
> it returns
> 
> 
>  reltuples  | n_live_tup | n_dead_tup
> -------------+------------+------------
> 2.7209e+06 |    1360448 |    1360448
> 
> 
> If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d
> 
> and I run the same query again,
> 
>   reltuples  | n_live_tup | n_dead_tup
> -------------+------------+------------
>  1.36045e+06 |    1360448 |    1360448
> 
> But after some time the value goes back to being double the value. This
> is causing me a lot of problems since this inaccuracy does not make any
> sense to me.
> 

There was a difference between VACUUM and ANALYZE in handling recently
dead rows (essentially deleted rows that can't be removed yet), causing
similar changes to reltuples. Essentially if you do VACUUM and ANALYZE,
it may set reltuples to rather different estimates. That is fixed now
and should be in the next minor release.

Hard to say if this is a case of that, but judging by the number of dead
tuples chances are it is.

It's probably better to use n_live_tup instead, though. I'd say that's
closer to the "live tuples" definition.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services