On Mon, May 2, 2011 at 11:53 PM, Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On 03/05/11 11:07, Greg Smith wrote:
>
>> That doesn't mean you can't use
>> them as a sort of foreign key indexing the data; it just means you can't
>> make them the sole unique identifier for a particular entity, where that
>> entity is a person, company, or part.
>
> Classic case: a database here has several tables indexed by MAC address.
> It's used for asset reporting and software inventory.
>
> Problem: VMs generate random MAC addresses by default. They're not
> guaranteed to be globally unique. Collisions have happened and will
> probably happen again. In this case, it wasn't a big deal, but it just
> goes to show that even the "obviously" globally unique isn't necessarily so.

It's precisely pathological cases like this where uniqueness constraints are important and should be used. By the way, we aren't debating the use of natural keys here, but whether to define uniqueness constraints. My main gripe with surrogates is that their use often leads directly to lazy schema design where uniqueness constraints are not defined, which leads to data problems exactly like the case you described above.

In a purely surrogate table with no unique constraint on the MAC, suppose you have two records with the same value for the address, and there are no other interesting fields on the table or elsewhere in the database:

*) who/what made the decision to place a second record on the table?
*) is that decision verifiable?
*) is that decision repeatable?
*) are there other routes of data entry into the database that bypass that decision? will there ever be?
*) what happens when the code that represents that decision has or develops a bug?
*) why would you not want information supporting that decision in the database?
*) how do you know the tables keying to your mac table are pointing to the correct record?
*) what are the consequences for ad hoc queries that join directly against the mac? DISTINCT?
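To make this concrete, here's a minimal sketch; the table and column names are invented for illustration:

```sql
-- Hypothetical surrogate-only design: nothing prevents duplicate MACs.
CREATE TABLE machine (
    machine_id serial PRIMARY KEY,
    mac        macaddr NOT NULL
);

INSERT INTO machine (mac) VALUES ('08:00:2b:01:02:03');
INSERT INTO machine (mac) VALUES ('08:00:2b:01:02:03');  -- silently accepted

-- Any ad hoc join against mac now multiplies rows:
--   SELECT ... FROM some_audit_table a JOIN machine m ON a.mac = m.mac;

-- With a uniqueness constraint, the problem surfaces immediately instead:
CREATE TABLE machine2 (
    machine_id serial PRIMARY KEY,
    mac        macaddr NOT NULL UNIQUE
);
INSERT INTO machine2 (mac) VALUES ('08:00:2b:01:02:03');
INSERT INTO machine2 (mac) VALUES ('08:00:2b:01:02:03');
-- ERROR:  duplicate key value violates unique constraint "machine2_mac_key"
```

In the first table, every one of the questions above is unanswerable from the data alone; in the second, the duplicate is rejected at the point of entry, where the cause can still be investigated.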
If your data modeler made the assumption that a MAC is unique (a mistake, obviously), at least the other tables are protected from violations of that assumption, because the database would reject them with an error, which is a Good Thing. Without a uniqueness constraint you now have ambiguous data, which is a Very Bad Thing. Without proper unique constraints, a generated key is effectively saying "well, I can't figure this out right now...I'll deal with it later". That unmanaged complexity is now in the application and all the queries that touch the database...you've created your own bug factory. With a uniqueness constraint, you have a rigorous definition of what your record represents, and other entities in the database can now rely on that definition.

Natural key designs are good for a lot of reasons, but #1 on the list is that they force you to deal with problems in your data model up front, because they force you to define uniqueness. If the MAC turns out not to be unique, and the problem is not in fact coming from the input data or the application, yes, you do have to correct the model, but at least the data inside the database is clean and can be unambiguously mapped to the new model. I'll take schema changes over bad data. Correcting the model means figuring out what information is used to distinguish identical MACs #1 and #2 and making sure it is stored in the database, because now your data and the corresponding decisions are verifiable, repeatable, unambiguous, etc. What extra field you have to add to your 'mac' table to make it unique would depend on certain things, but it's certainly a solvable problem, and when solved it would give you a more robust database.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general