Re: Database design - best practice


On Wed, Nov 28, 2012 at 4:41 AM, Niels Kristian Schjødt
<nielskristian@xxxxxxxxxxxxx> wrote:
>
> So my main concern is actually about the cars table, since this one currently has a lot of columns (151 - I expect that's quite a lot?), and a lot of data (4 mil. rows, and growing). Now you might start by thinking this could sound like a regular need for some normalization, but wait a second and let me explain :-)

If you have 151 single-valued pieces of information, then that is what
you have.  You can't tell whether something is normalized or not by
counting the columns.

> The columns in this table are for the most part very short strings, integers, decimals or booleans. So take for example has_automatic_transmission (boolean) - I can't see why it would make sense to put that into a separate table and join in the values.

I can't see why that would make sense, either.  Nor do I think that
doing so would increase the level of normalization.  What rule of
normalization would be served by creating gratuitous joins?

> Or the mileage or the price as another example. The cars table used for search is indexed quite a lot.

How useful are the indices?
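
One way to answer that empirically (a sketch; I'm assuming the table is
literally named "cars") is to look at the cumulative statistics views,
which show how often each index has actually been scanned:

```sql
-- How often has each index on cars actually been used?
-- idx_scan counts index scans since the statistics were last reset;
-- an index with idx_scan near zero is pure write overhead.
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'cars'
ORDER BY idx_scan;
```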

> The questions:
> Having the above setup in mind, what impact on performance, in terms of read performance and write performance, does it have, whether I do the following:
>         1) In general would the read and/or the write on the database be faster, if I serialized some of the not-searched columns in the table into a single text column instead of let's say 20 booleans?

Probably not, and it could make things much worse, depending on how you
serialize them.  For example, if you use hstore or json, the "column
names" for each of the 20 booleans are now repeated in every row, rather
than being metadata stored only once.  But try it and see.
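
To make the repetition concrete, here is a sketch (table and column
names are hypothetical) contrasting the two layouts:

```sql
-- Plain columns: the names exist once, in the system catalogs.
-- Each boolean costs one byte per row (plus null-bitmap handling).
CREATE TABLE cars_flat (
    id                         bigserial PRIMARY KEY,
    has_automatic_transmission boolean,
    has_sunroof                boolean
    -- ... and 18 more booleans
);

-- Serialized: every single row stores the key names again as text.
-- (hstore is a contrib extension: CREATE EXTENSION hstore;)
CREATE TABLE cars_serialized (
    id       bigserial PRIMARY KEY,
    features hstore
    -- e.g. 'has_automatic_transmission=>true, has_sunroof=>false'
);
```

The flat layout also lets the planner use ordinary column statistics
and btree/partial indexes on each attribute, which the serialized form
gives up.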

>         2) Let's say I'm updating a timestamp in a single one of the 151 columns in the cars table. The update statement is using the id to find the car. Would the write performance of that UPDATE be affected, if the table had fewer columns?

Yes, but probably not by much.  The biggest factor is whether
the timestamp column is indexed.  If it is, then updating it means
that all other indexes on the table will also need to be updated.  If
it is not indexed, then the update can be a HOT update.
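
A sketch of the two cases (column name hypothetical):

```sql
-- If last_seen_at is NOT indexed, this can be a HOT (heap-only tuple)
-- update: the new row version is written on the same page and no
-- index entries are touched.  If last_seen_at IS indexed, every index
-- on the table gets a new entry for the new row version.
UPDATE cars SET last_seen_at = now() WHERE id = 12345;

-- You can check what fraction of updates are HOT:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'cars';
```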


>         3) When adding a new column to the table I know that it becomes slower the more rows are in the table, but what about the "width" of the table - does that affect the performance when adding new columns?

Adding a new column to a table is pretty much instantaneous if the
default value is NULL.
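
For example (on the PostgreSQL releases current here; newer versions
also relaxed the second case for constant defaults):

```sql
-- Catalog-only change: instantaneous regardless of table size,
-- because the implicit default is NULL and no rows are rewritten.
ALTER TABLE cars ADD COLUMN notes text;

-- A non-null default forces every row to be rewritten, so on a
-- 4-million-row table this takes an exclusive lock for a long time.
ALTER TABLE cars ADD COLUMN is_archived boolean DEFAULT false;
```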

>         4) In general what performance downsides do you get when adding a lot of columns to one table instead of having them in separate tables?

This question cannot be answered in general.  If every time you use
the main table you have to join it to the separate table, then
performance will be bad.  If you almost never have to join to the
separate table, then performance will be better.

>         5) Is it significantly faster to select * from a table with 20 columns than to select the same 20 columns from a table with 150 columns?

If the extra 130 columns are mostly null, the difference will be very
small.  Or, if the where clause is such that you only do a single-row
lookup on a primary key column, for example, the difference will also
be small.
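
Rather than guessing, you can measure it directly on your own data
(column names hypothetical):

```sql
-- Compare the timing and buffer hits of the narrow selection on the
-- wide table against the same query on a 20-column copy of it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT make, model, price
FROM cars
WHERE id = 12345;
```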

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


