On 15 August 2015 at 02:32, David Nelson <dlnelson77808@xxxxxxxxxxx> wrote:
Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.<br><br>I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):<br><br>column_name | num_values<br>------------+-----------<br>col1 | 5787<br>------------+-----------<br>col2 | 17<br>------------+-----------<br>col3 | 567<br>------------+-----------<br>col4 | 5787<br>------------+-----------<br>col5 | 143<br>------------+-----------<br>col6 | 1<br>------------+-----------<br>...<br><br>Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David
I didn't quite catch a Postgres version number in that mess :)
I assume the tables are quite large if you don't want to just issue a: SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL) ... (assuming you're on a version new enough to support agg FILTER)
On the other hand if you were happy with just an approximation then you could look at pg_stats;
create table abc(a int, b int,c int);
insert into abc values(1, 1, NULL),(1, NULL, NULL),(NULL, NULL, NULL);
analyze abc;
select attname, null_frac from pg_stats where tablename = 'abc';
attname | null_frac
---------+-----------
a | 0.333333
b | 0.666667
c | 1
Keep in mind though that this is an *approximation* and possibly could be inaccurate. If you want exact do the COUNT(col).
If you're not on a version new enough for COUNT(col) FILTER WHERE ..., you could just SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services