Hello,
I had the same issue before, and I used the PostgreSQL statistics to see whether the tables are in use or not. My findings on how to clean up your data are listed below.
One thing that I could not solve is how to check whether the schema design and semantics are good, e.g. table a references table b, table c references table b, and table c references table a. In some cases I find something like loops and circles; in other cases I find the same table referenced many times by other tables in the same schema.
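On the schema-design question, here is a minimal sketch (not something I have run against a real schema) of how the reference graph could be inspected once the foreign keys are exported. It assumes you have already pulled (referencing table, referenced table) pairs, for example from pg_constraint rows with contype = 'f'; the function names and the sample edges are made up for illustration. Note that the a/b/c example is not a circular reference in the strict sense: c simply reaches b both directly and through a, so only a true loop is reported as a cycle.

```python
from collections import Counter, defaultdict


def find_fk_cycle(edges):
    """Return one circular reference as a list of table names, or None.

    edges is an iterable of (referencing_table, referenced_table) pairs.
    """
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # nxt is already on the current path: we closed a loop
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for table in list(graph):
        if color[table] == WHITE:
            found = visit(table)
            if found:
                return found
    return None


def reference_counts(edges):
    """Count how many foreign keys point at each table."""
    return Counter(dst for _, dst in edges)


# Hypothetical sample: the a/b/c layout described above
sample = [("a", "b"), ("c", "b"), ("c", "a")]
print(find_fk_cycle(sample))                      # no true cycle here
print(reference_counts(sample).most_common(1))    # most referenced table
```

A table with an unusually high reference count is not necessarily a design problem, but it is a reasonable place to start reviewing.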
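1. Checking the number of rows read by sequential and index scans gives a good hint whether a table is in use or deprecated. The counters accumulate since the statistics were last reset, so they only cover that period. The following select statement retrieves the tables that might be deprecated.
SELECT relname FROM pg_stat_user_tables WHERE (coalesce(idx_tup_fetch, 0) + seq_tup_read) = 0; -- you can define a threshold here; coalesce is needed because idx_tup_fetch is NULL for tables without indexes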
2. Empty tables can be retrieved by checking the estimated number of live tuples, i.e.
SELECT relname FROM pg_stat_user_tables WHERE n_live_tup = 0;
3. Columns can be checked using the null fraction (null_frac) in pg_stats,
see http://www.postgresql.org/docs/8.3/static/view-pg-stats.html
4. Use pg_constraint to determine the tables that depend on the above tables.
5. Table duplicates, i.e. the same table name can be found in more than one schema:
SELECT n.nspname AS "Schema", c.relname AS "Name"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname IN (SELECT relname
                    FROM pg_catalog.pg_class
                    WHERE relkind IN ('r')
                    GROUP BY relname
                    HAVING count(relname) > 1)
ORDER BY 2, 1;
6. For views there are no statistics. An easy way is to parse the log file using regular expressions and shell scripting and compare the result with the list of views and tables. I did that and found many deprecated views.
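The log comparison in point 6 could look roughly like the sketch below. It is written in Python rather than shell, and it assumes that statement logging was switched on (for example log_statement = 'all') for the period you care about and that the view names were exported beforehand, e.g. from pg_views. The function name and the sample log lines are made up for illustration.

```python
import re


def unused_views(view_names, log_lines):
    """Return the view names that never appear in any logged statement."""
    names = {v.lower() for v in view_names}
    if not names:
        return []
    # Longest names first so that v_orders is not shadowed by v_order
    alternatives = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    seen = set()
    for line in log_lines:
        for match in pattern.finditer(line):
            seen.add(match.group(1).lower())
    return sorted(names - seen)


# Hypothetical view names and log lines
views = ["v_orders", "v_old_report", "v_order"]
log = [
    "LOG:  statement: SELECT * FROM v_orders WHERE id = 1",
    "LOG:  statement: select count(*) from V_ORDER",
]
print(unused_views(views, log))
```

Keep in mind that a view used only inside another view's definition never shows up in the log by name, so treat the result as a list of candidates to review, not a list to drop.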
7. For duplicate data, have a look at this query.
-- if you need to find exact duplicates, replace <col1>, ... <coln> with the table name
SELECT
<col1>, ... <coln>,
min(ctid) AS keep,
count(*)
FROM <table>
GROUP BY <col1>, ... <coln>
HAVING count(*) > 1;
-- The above code snippet can be combined with a delete statement to delete duplicate rows
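One possible shape for that delete statement is sketched below as a small Python helper that only builds the SQL text. It uses a self-join on ctid instead of the min(ctid) subquery, which is a swapped-in variant and not the exact query above: for each group of duplicates, every row except the one with the smallest ctid is deleted. IS NOT DISTINCT FROM is used so that NULLs compare as equal, the same way GROUP BY treats them. The helper names and the sample table are hypothetical; run the generated statement inside a transaction and check the row count before committing.

```python
def quote_ident(name):
    """Double-quote an identifier (this makes it case-sensitive in PostgreSQL)."""
    return '"' + name.replace('"', '""') + '"'


def build_dedupe_delete(table, columns):
    """Build a DELETE that keeps the lowest-ctid row of each duplicate group."""
    if not columns:
        raise ValueError("at least one column is required")
    t = quote_ident(table)
    match = " AND ".join(
        "a.{0} IS NOT DISTINCT FROM b.{0}".format(quote_ident(c))
        for c in columns
    )
    return "DELETE FROM {t} a USING {t} b WHERE a.ctid > b.ctid AND {m};".format(
        t=t, m=match
    )


# Hypothetical table and columns
print(build_dedupe_delete("orders", ["customer_id", "total"]))
```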
Have fun
From: Jason Long <mailing.lists@xxxxxxxxxxxxxxxx>
To: Guillaume Lelarge <guillaume@xxxxxxxxxxxx>
Cc: Craig Ringer <ringerc@xxxxxxxxxxxxx>; pgsql-general@xxxxxxxxxxxxxx
Sent: Friday, September 30, 2011 12:12 AM
Subject: Re: Identifying old/unused views and table
On Wed, 2011-09-28 at 08:52 +0200, Guillaume Lelarge wrote:
> On Wed, 2011-09-28 at 09:04 +0800, Craig Ringer wrote:
> > On 09/28/2011 04:51 AM, Jason Long wrote:
> > > I have an application with a couple hundred views and a couple hundred
> > > tables.
> > >
> > > Is there some way I can find out which views have been accessed in the
> > > last 6 months or so? Or some way to log this?
> > >
> > > I know there are views and tables that are no longer in use by my
> > > application and I am looking for a way to identify them.
> >
> > Look at the pg_catalog.pg_stat* tables
> >
>
> I fail to see how that gives him any answer on the views, and tables no
> longer used. AFAICT, there's no way to know for views (apart from
> logging all queries in the log). As for tables, still apart from the
> log, pg_stat_user_tables could give an answer if he was monitoring it at
> least the last six months.
>
>
Thanks for the replies. Views were my main problem. My application
could use some cleanup. Doing it manually is probably the best
approach. I was just looking for a jump start.
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general