Hello. We are currently redesigning a medium/large office management web application. There are 75 tables in our existing PostgreSQL database, but that number is artificially low, due to some unfortunate design choices. The main culprits are two tables named "catalog" and "catalog_entries". They contain all those data sets that the previous designer deemed too small for a separate table, so now they are all stored together. The values in catalog_entries are typically used to populate dropdown select fields.

The catalog table (simplified):

    id           SERIAL   -- artificial record ID
    code         VARCHAR  -- unique text identifier
    description  VARCHAR  -- (info only, otherwise unused)

The catalog_entries table (simplified):

    id           SERIAL   -- artificial record ID
    catalog_id   INTEGER  -- parent catalog ID
    code         VARCHAR  -- unique (per catalog) text identifier
    rank         INTEGER  -- used for sorting the values
    text_short   VARCHAR  -- text for display (short version)
    text_long    TEXT     -- text for display (long version)

Some examples of what the tables contain:

    Catalog: department      Entries: it, sales, accounting, cases, ...
    Catalog: printers        Entries: ma_color, pa_color, pa_black, pdf, ...
    Catalog: invoice_status  Entries: open, locked, entered, booked, cancelled, ...
    Catalog: coverage        Entries: national, regional, international, obsolete

Records in other tables reference the values in catalog_entries by id. For example, the "invoices" table has a "status_id" column pointing to a record in catalog_entries. Of course, this leads to possible integrity issues: there is nothing to prevent an invoice record from referencing the "ma_color" value instead of "open" in its status_id field.

There are 64 "catalogs" (data sets), in addition to the 75 real tables. Now we have finally got the go-ahead to refactor this mess. Most of the old "catalogs" will probably end up as separate tables; others might be replaced with ENUMs or booleans, especially the ones with only 2-3 values (a sketch of the ENUM variant follows further below).

The reason I'm hesitating and asking for advice now is that after refactoring, we'll end up with ~60 new tables, all of them rather small and with practically identical columns. (Only five catalogs have more than 20 entries; about half have five entries or fewer; five catalogs have only two entries.)

So, my first main question would be: is it "normal" or desirable to have that many tiny tables? And is it a problem that many of the tables have the same (or similar) column definitions?

The second point is that we have redundant unique identifiers in catalog_entries (id and code). The code value is used by the application whenever we need to refer to one of the values. For example, for a query like "show all open invoices", we would either

    1) select the id from catalog_entries where the catalog_id refers to
       the "invoice_status" catalog and the code is "open", then
    2) use that id to filter "select * from invoices",

or do the same in one query using joins. This pattern occurs hundreds of times in the application code. From a programming viewpoint, having all-text ids would make things a lot simpler and cleaner (i.e., keep only the "code" column).
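Concretely, both variants look roughly like this (a sketch in plain SQL, using only the table and column names described above):

    -- 1) look up the id of the "open" status ...
    SELECT ce.id
      FROM catalog_entries ce
      JOIN catalog c ON c.id = ce.catalog_id
     WHERE c.code = 'invoice_status'
       AND ce.code = 'open';

    -- 2) ... then filter with it; :open_id stands for the value
    --    fetched in step 1 (an application-side bind parameter)
    SELECT * FROM invoices WHERE status_id = :open_id;

    -- or, equivalently, in one query using joins:
    SELECT i.*
      FROM invoices i
      JOIN catalog_entries ce ON ce.id = i.status_id
      JOIN catalog c ON c.id = ce.catalog_id
     WHERE c.code = 'invoice_status'
       AND ce.code = 'open';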
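What we have in mind instead would look something like this (again just a sketch; the table name "invoice_status" and the renamed "status" column are my placeholders, not settled names):

    -- one tiny table per former catalog, with "code" as the key
    CREATE TABLE invoice_status (
        code        varchar PRIMARY KEY,  -- 'open', 'locked', 'entered', ...
        rank        integer NOT NULL,     -- used for sorting the values
        text_short  varchar NOT NULL,     -- text for display (short version)
        text_long   text                  -- text for display (long version)
    );

    -- replaces the old integer status_id column on invoices
    ALTER TABLE invoices
        ADD COLUMN status varchar REFERENCES invoice_status (code);

    -- "show all open invoices" then collapses to:
    SELECT * FROM invoices WHERE status = 'open';

A side benefit: the foreign key would then actually enforce that an invoice references an invoice status, closing the "ma_color" loophole described above.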
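(For comparison, the ENUM option mentioned earlier would look something like this for one of the tiny sets, here the "coverage" catalog; the "customers" table is only a hypothetical place to use it:)

    CREATE TYPE coverage AS ENUM
        ('national', 'regional', 'international', 'obsolete');

    ALTER TABLE customers ADD COLUMN coverage coverage;

The caveat I'm aware of is that ENUM values can be appended later with ALTER TYPE ... ADD VALUE, but not easily removed or reordered.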
The "id" column was used (AFAIK) to reduce the storage size. Most of the data tables have fewer than 100k records, so the overhead wouldn't be too dramatic, but a few tables (~10) have more; one of them has 1.2m records. These tables can also refer to the old catalog_entries table from more than one column. Changing all these references from INT to VARCHAR would increase the DB size, would probably make scans less performant, and I'm not sure how the indexes on these columns would be affected.

To summarize, the second question is whether we should ditch the artificial numeric IDs and just use the "code" column as the primary key in the new tiny tables.

Thanks in advance for your advice.

crl

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general