Re: Enforcing uniqueness on [real estate/postal] addresses

Tim Cross <theophilusx@xxxxxxxxx> · Tue, 12 May 2020 10:49:22 +1000

Peter Devoy <peter@xxxxxxxxx> writes:

> Hi list
>
> I need to store addresses for properties (as in real estate) so in my
> naivety I created a unique constraint like this:
>
> ALTER TABLE properties
>     ADD CONSTRAINT is_unique_address
>     UNIQUE (
>         description, --e.g. Land north of Foo Cottage
>         address_identifier_general,
>         street,
>         postcode
>     );
>
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
>
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.
>
> Please can anyone recommend a way of approaching this? Perhaps empty strings
> are pragmatic in this situation?
>
> Kind regards
>
>

Hi Peter,

Personally, I don't like the idea of using empty strings just to avoid
having nulls. This is probably a personal preference, but for me null
and '' are quite different. A null indicates an unknown - we don't know
what the value is. An empty string i.e. '' means there is no value (i.e.
we know it has no value). The difference is quite subtle and may not
seem relevant. It may not be or it may be or it may become relevant in
the future. General rule of thumb for me is that my model should reflect
the known information and should always avoid any data transformation or
mapping which reduces the known information.

I would step back a bit and think about why/what constraint you really
need and what needs to be unique. The first field which jumps out for me
is description. Is this really a unique value? Would it be possible to
have two properties with the same description? Does it matter if two
properties have the same description? Does the description really affect
property uniqueness. If two records have the same street, postcode and
general_property_identifier, but different descriptions, are they really
two different records? Will description change over time? As description
is a fairly subjective value, I would be tempted to not include it in
your unique constraint at all. In fact, I would probably keep
description in a separate table as it may be reasonable to have multiple
descriptions for a property. If you want just a single description, then
you can leave it in this table. I would not put a unique or not null
constraint on it.

This would leave you with address_identifier_general, street and
postcode. None of those will be unique by themselves. You will only get
uniqueness when you combine all 3. Can any of them be null? I would
suspect not, so I would define them with not null constraints. I would
then probably add a composite unique index using all 3 values to enforce
uniqueness. Depending on your application needs, I would probably add a
unique property_id field to the table as well (which would be the value
I would used to link records in other tables, such as a
property_description table).

Of course, there are other constraints you could consider - post code
probably has a set format which you might want to enforce or perhaps you
can access a complete listing of valid postcodes and import that into
your system as a postcode table. In that case, your postcode field might
be better defined as a foreign key constraint into the postcode table. 

When defining your constraints, it is important to consider what
information is known at the point of initial data entry. Is there a need
to enter partial data (for example, you might know the street and
postcode, but not the general_property_identifier. Is it expected or
reasonable to allow entry of this sort of partial data? If so, how will
that work with your uniqueness constraints? (it may be quite reasonable
to require all 3 fields be known). The point is, you need to know how
the system will be used and what the expectations of the users are.
Maybe there is a legitimate business case to allow partial data entry,
in which case, you may need a different approach or a way to identify
partial/incomplete records etc. 

-- 
Tim Cross