Re: DB design advice: lots of small tables?

Jasen Betts <jasen@xxxxxxxxxx> · 16 Mar 2013 06:30:13 GMT

On 2013-03-15, lender <crlender@xxxxxxxxx> wrote:
> Hello.
>
> We are currently redesigning a medium/large office management web
> application. There are 75 tables in our existing PostgreSQL database,
> but that number is artificially low, due to some unfortunate design choices.
>
> The main culprits are two tables named "catalog" and "catalog_entries".
> They contain all those data sets that the previous designer deemed too
> small for a separate table, so now they are all stored together. The
> values in catalog_entries are typically used to populate dropdown select
> fields.

> So, my first main question would be: is it "normal" or desirable to have
> that many tiny tables? And is it a problem that many of the tables have
> the same (or a similar) column definitions?

Dunno about "normal", but certainly "Normal" (as in "-form").
No problem.
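
As a rough illustration of one thing separate small tables allow, the sketch below gives a lookup table its own foreign key, so an invoice cannot point at a status that does not exist. It uses Python's sqlite3 module as a stand-in for PostgreSQL; the table and column names (invoice_status, invoices, status_id) are assumptions for the example, not taken from the real application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical tiny lookup table, one per former "catalog".
    CREATE TABLE invoice_status (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE
    );
    -- Hypothetical data table referencing it.
    CREATE TABLE invoices (
        id        INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES invoice_status(id)
    );
    INSERT INTO invoice_status (id, code) VALUES (1, 'open'), (2, 'paid');
""")

# A valid status is accepted.
conn.execute("INSERT INTO invoices (id, status_id) VALUES (1, 1)")

# A status id that is not in invoice_status is rejected by the constraint.
try:
    conn.execute("INSERT INTO invoices (id, status_id) VALUES (2, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

invoice_count = conn.execute("SELECT count(*) FROM invoices").fetchone()[0]
```

With a single shared catalog_entries table, a plain foreign key on id could not by itself tell an invoice status apart from any other catalog's entry.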

> The second point is that we have redundant unique identifiers in
> catalog_entries (id and code). The code value is used by the application
> whenever we need to find one of the values. For example, for a query
> like "show all open invoices", we would either -
>
>   1) select the id from catalog_entries where catalog_id refers to the
>      "invoice_status" catalog and the code is "open"
>   2) use that id to filter select * from invoices
>
> - or do the same in one query using joins. This pattern occurs hundreds
> of times in the application code. From a programming viewpoint, having
> all-text ids would make things a lot simpler and cleaner (i.e., keep
> only the "code" column).
>
> The "id" column was used (AFAIK) to reduce the storage size. Most of the
> data tables have less than 100k records, so the overhead wouldn't be too
> dramatic, but a few tables (~10) have more; one of them has 1.2m
> records. These tables can also refer to the old catalog_entries table
> from more than one column. Changing all these references from INT to
> VARCHAR would increase the DB size, and probably make scans less
> performant. I'm not sure how indexes on these columns would be
> affected.
>
> To summarize, the second question is whether we should ditch the
> artificial numeric IDs and just use the "code" column as primary key in
> the new tiny tables.

If they aren't hurting you, keep them.
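
A minimal sketch of what keeping both columns can look like in application code: the integer id stays as the foreign key, and the "look up by code" step is wrapped once in a helper that does the join. It again uses Python's sqlite3 as a stand-in for PostgreSQL; the helper name invoice_ids_with_status and the table layout are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: numeric id as key, text code for lookups.
    CREATE TABLE invoice_status (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE
    );
    CREATE TABLE invoices (
        id        INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES invoice_status(id)
    );
    INSERT INTO invoice_status (id, code) VALUES (1, 'open'), (2, 'paid');
    INSERT INTO invoices (id, status_id) VALUES (10, 1), (11, 2), (12, 1);
""")


def invoice_ids_with_status(conn, code):
    """Return ids of invoices whose status has the given code, in one query."""
    rows = conn.execute(
        """
        SELECT i.id
        FROM invoices AS i
        JOIN invoice_status AS s ON s.id = i.status_id
        WHERE s.code = ?
        ORDER BY i.id
        """,
        (code,),
    )
    return [row[0] for row in rows]


open_ids = invoice_ids_with_status(conn, "open")
```

The join is written once rather than hundreds of times, and the large tables keep their narrow integer columns.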

> Thanks in advance for your advice.

If you're worried about clutter, it may make sense to put all the small tables
in a separate schema.
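
To show the idea of schema-qualified lookup tables in runnable form, the sketch below uses Python's sqlite3, where an attached database plays roughly the role that CREATE SCHEMA plays in PostgreSQL. The schema name "lookup" and the table names are assumptions for the example, and this is only an analogy: SQLite's attached databases are not the same feature as PostgreSQL schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In PostgreSQL this step would be: CREATE SCHEMA lookup;
# SQLite has no CREATE SCHEMA, so an attached database stands in for it.
conn.execute("ATTACH DATABASE ':memory:' AS lookup")

conn.executescript("""
    -- Small lookup table lives under the hypothetical "lookup" name.
    CREATE TABLE lookup.invoice_status (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE
    );
    -- Data table stays in the default namespace.
    CREATE TABLE invoices (
        id        INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL
    );
    INSERT INTO lookup.invoice_status (id, code) VALUES (1, 'open');
""")

# Listing tables per namespace shows the small table kept out of the way.
main_tables = [r[0] for r in conn.execute(
    "SELECT name FROM main.sqlite_master WHERE type = 'table'")]
lookup_tables = [r[0] for r in conn.execute(
    "SELECT name FROM lookup.sqlite_master WHERE type = 'table'")]
```

Queries then refer to the small tables by qualified name (lookup.invoice_status), which keeps the default table listing limited to the main data tables.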

-- 
⚂⚃ 100% natural
