I am wondering about the feasibility of having PG continue to work even if
non-essential indexes are gone or corrupt. I brought this basic concept
up at some point in the past, but now I have a different motivation, so
I want to strike up discussion about it again. This time around, I
simply don't want to back up indexes if I don't have to. Because
indexes contain essentially redundant data, losing one does not equate
to losing real data. Therefore, backing them up represents a lot of
overhead for very little benefit.
Here's the basic idea:
1) New field to pg_index (indvalid boolean).
2) Query planner skips indexes where indvalid = false.
3) Executor does not update indexes where indvalid = false.
4) Executor refuses insert or update to unique columns where indvalid =
false, throwing an error.
5) WAL roll forward marks indvalid = false if index file(s) are missing,
rather than panicking.
6) REINDEX accepts new syntax to rebuild only indexes with indvalid =
false, then marks them indvalid = true.
Close to 25% of the on-disk size of my database is index files. It
would save a significant amount of the system resources used during the
backup if I didn't have to archive the index files. In the unlikely
event that a restore/roll forward becomes necessary, I could simply
issue something like "REINDEX DATABASE foo INVALID;" to restore all the
missing indexes and return the database to full function. Prior to a
reindex, the database would perform poorly and refuse to do certain
inserts and updates, but the data would be available. Backup files
would be smaller, and the restore/roll forward would be faster.
No downsides jump out at me, and it seems to me that for a regular PG
code hacker this could actually be fairly simple to implement.
Any chance of something like this being done in the future?
-Glen