Peter Devoy <peter@xxxxxxxxx> writes: > Hi list > > I need to store addresses for properties (as in real estate) so in my > naivety I created a unique constraint like this: > > ALTER TABLE properties > ADD CONSTRAINT is_unique_address > UNIQUE ( > description, --e.g. Land north of Foo Cottage > address_identifier_general, > street, > postcode > ); > > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. > > Please can anyone recommend a way of approaching this? Perhaps empty strings > are pragmatic in this situation? > > Kind regards > > Hi Peter, Personally, I don't like the idea of using empty strings just to avoid having nulls. This is probably a personal preference, but for me null and '' are quite different. A null indicates an unknown - we don't know what the value is. An empty string i.e. '' means there is no value (i.e. we know it has no value). The difference is quite subtle and may not seem relevant. It may not be or it may be or it may become relevant in the future. General rule of thumb for me is that my model should reflect the known information and should always avoid any data transformation or mapping which reduces the known information. I would step back a bit and think about why/what constraint you really need and what needs to be unique. The first field which jumps out for me is description. Is this really a unique value? Would it be possible to have two properties with the same description? Does it matter if two properties have the same description? Does the description really affect property uniqueness. If two records have the same street, postcode and general_property_identifier, but different descriptions, are they really two different records? Will description change over time? As description is a fairly subjective value, I would be tempted to not include it in your unique constraint at all. In fact, I would probably keep description in a separate table as it may be reasonable to have multiple descriptions for a property. If you want just a single description, then you can leave it in this table. I would not put a unique or not null constraint on it. This would leave you with address_identifier_general, street and postcode. None of those will be unique by themselves. You will only get uniqueness when you combine all 3. Can any of them be null? I would suspect not, so I would define them with not null constraints. I would then probably add a composite unique index using all 3 values to enforce uniqueness. Depending on your application needs, I would probably add a unique property_id field to the table as well (which would be the value I would used to link records in other tables, such as a property_description table). Of course, there are other constraints you could consider - post code probably has a set format which you might want to enforce or perhaps you can access a complete listing of valid postcodes and import that into your system as a postcode table. In that case, your postcode field might be better defined as a foreign key constraint into the postcode table. When defining your constraints, it is important to consider what information is known at the point of initial data entry. Is there a need to enter partial data (for example, you might know the street and postcode, but not the general_property_identifier. Is it expected or reasonable to allow entry of this sort of partial data? If so, how will that work with your uniqueness constraints? (it may be quite reasonable to require all 3 fields be known). The point is, you need to know how the system will be used and what the expectations of the users are. Maybe there is a legitimate business case to allow partial data entry, in which case, you may need a different approach or a way to identify partial/incomplete records etc. -- Tim Cross