Re: DB design advice: lots of small tables?

lender <crlender@xxxxxxxxx> · Fri, 15 Mar 2013 17:09:59 +0100

Thanks everybody for your comments.

On 2013-03-15 16:03, François Beausoleil wrote:
> Keeping a meaningless ID is not a problem in and of itself. It makes
> it easier to edit records from the UI, since you can reference the ID in
> the UPDATE and DELETE statements, without fear of colliding with
> anything else. It's not so much a problem on small lookup tables, but on
> larger entities (people, companies, etc), referencing through the ID is
> much, much easier.

I'm not so much concerned with giving records in a large data set
surrogate IDs. There is often no good candidate for a natural key, or
the candidates aren't static enough.

The small tables I mentioned earlier all have clear and very static keys
(the "code" column). I'm only concerned about the practical consequences
of introducing strings where there used to be integers.

To give a practical example, this is how it looks with artificial IDs:

Table documents:
  id    | name        | type_id
 -------+-------------+----------
  62307 | Example.odt | 413

Table document_types:
  id  | code                             | text_short
 -----+----------------------------------+---------------------------
  413 | information_disclosure_statement | Information Disclosure [...]

Using the natural key, it would look like this:

Table documents:
  id | name         | type
 ----+--------------+----------------------------------
  23 | Example.odt  | information_disclosure_statement

Table document_types:
  code                             | text_short
 ----------------------------------+---------------------------
  information_disclosure_statement | Information Disclosure [...]

(admittedly, "information_disclosure_statement" is one of the longer
codes we use. The average is about 14 characters, the longest is 38
characters)

Now, what if we have hundreds of thousands of records in the "documents"
table? Apart from the increased storage requirements, will scanning the
table take noticably longer? Will the indexes suffer? Will creating,
updating, importing, dumping, restoring etc take (much) longer?
Comparing two integers is computationally less expensive than comparing
two variable-length strings, of course, but I have no empirical notion
of how much of a performance hit to expect.

I know that these questions cannot be answered with any accuracy without
knowing all the details, but that's the type of thing that has me a
little worried right now. I'm fine with getting a little less
performance; that should be mostly offset by the other changes and
improvements we're making. I just don't want to introduce a fundamental
mistake at this stage.

Thanks again,
crl

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general