Thanks everybody for your comments. On 2013-03-15 16:03, François Beausoleil wrote: > Keeping a meaningless ID is not a problem in and of itself. It makes > it easier to edit records from the UI, since you can reference the ID in > the UPDATE and DELETE statements, without fear of colliding with > anything else. It's not so much a problem on small lookup tables, but on > larger entities (people, companies, etc), referencing through the ID is > much, much easier. I'm not so much concerned with giving records in a large data set surrogate IDs. There is often no good candidate for a natural key, or the candidates aren't static enough. The small tables I mentioned earlier all have clear and very static keys (the "code" column). I'm only concerned about the practical consequences of introducing strings where there used to be integers. To give a practical example, this is how it looks with artificial IDs: Table documents: id | name | type_id -------+-------------+---------- 62307 | Example.odt | 413 Table document_types: id | code | text_short -----+----------------------------------+--------------------------- 413 | information_disclosure_statement | Information Disclosure [...] Using the natural key, it would look like this: Table documents: id | name | type ----+--------------+---------------------------------- 23 | Example.odt | information_disclosure_statement Table document_types: code | text_short ----------------------------------+--------------------------- information_disclosure_statement | Information Disclosure [...] (admittedly, "information_disclosure_statement" is one of the longer codes we use. The average is about 14 characters, the longest is 38 characters) Now, what if we have hundreds of thousands of records in the "documents" table? Apart from the increased storage requirements, will scanning the table take noticably longer? Will the indexes suffer? Will creating, updating, importing, dumping, restoring etc take (much) longer? Comparing two integers is computationally less expensive than comparing two variable-length strings, of course, but I have no empirical notion of how much of a performance hit to expect. I know that these questions cannot be answered with any accuracy without knowing all the details, but that's the type of thing that has me a little worried right now. I'm fine with getting a little less performance; that should be mostly offset by the other changes and improvements we're making. I just don't want to introduce a fundamental mistake at this stage. Thanks again, crl -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general