Search Postgresql Archives

Re: Count of non-null values per table column

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 15 August 2015 at 02:32, David Nelson <dlnelson77808@xxxxxxxxxxx> wrote:
Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.<br><br>I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):<br><br>column_name | num_values<br>------------+-----------<br>col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5787<br>------------+-----------<br>col2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 17<br>------------+-----------<br>col3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 567<br>------------+-----------<br>col4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5787<br>------------+-----------<br>col5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 143<br>------------+-----------<br>col6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1<br>------------+-----------<br>...<br><br>Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David


I didn't quite catch a Postgres version number in that mess :)

I assume the tables are quite large if you don't want to just issue a: SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL) ... (assuming you're on a version new enough to support agg FILTER) 

On the other hand if you were happy with just an approximation then you could look at pg_stats;

create table abc(a int, b int,c int);
insert into abc values(1, 1, NULL),(1, NULL, NULL),(NULL, NULL, NULL);
analyze abc;

select attname, null_frac from pg_stats where tablename = 'abc';

 attname | null_frac
---------+-----------
 a       |  0.333333
 b       |  0.666667
 c       |         1

Keep in mind though that this is an *approximation* and possibly could be inaccurate. If you want exact do the COUNT(col).

If you're not on a version new enough for COUNT(col) FILTER WHERE ..., you could just SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
 

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux