Re: left join with smaller table or index on (XXX is not null) to avoid upsert


On Sun, Jan 18, 2009 at 2:12 PM, Ivan Sergio Borgonovo
<mail@xxxxxxxxxxxxxxx> wrote:
> I have to apply discounts to products.
>
> For each promotion I have a query that selects a list of products
> to which a discount should be applied.
>
> Queries may have intersections; in these intersections the highest
> discount should be applied.
>
> Since queries may be slow I decided to proxy the discount this way:
>
> create table Product(
>  ProductID int primary key,
>  ListPrice numeric
> );
>
> create table ProductPrice(
>  ProductID int references Product (ProductID),
>  DiscountedPrice numeric
> );
>
> Case A)
> If I want ProductPrice to contain just the products with a
> discount, I'll have to update, then see whether the update was
> successful, and otherwise insert.
> I expect that the products involved may be around 10% of the overall
> products.

You could use UPDATE ... RETURNING to run the update and get back a
list of all the rows that were updated.  Then build a simple SELECT
with a NOT IN on those rows to get the rest for inserting.
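A rough sketch of that two-step approach against the tables quoted
above; the flat 10% discount and the promo_products table holding the
promotion's product list are just assumptions for illustration, and
the NOT IN here checks ProductPrice itself, which has the same effect
as excluding the IDs the UPDATE returned:

-- step 1: update existing rows, getting back the IDs that were touched
update ProductPrice pp
   set DiscountedPrice = p.ListPrice * 0.90
  from Product p
 where pp.ProductID = p.ProductID
   and pp.ProductID in (select ProductID from promo_products)
returning pp.ProductID;

-- step 2: insert promoted products that had no ProductPrice row yet
insert into ProductPrice (ProductID, DiscountedPrice)
select ProductID, ListPrice * 0.90
  from Product
 where ProductID in (select ProductID from promo_products)
   and ProductID not in (select ProductID from ProductPrice);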



> I'm expecting that:
> - ProductPrice will contain roughly but less than 10% of the
> catalogue.

Then an index will only help when you're selecting on something more
selective than that.  Unless your rows are really skinny, a
sequential scan will usually win over an index scan at that
percentage.
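If in doubt, EXPLAIN on a representative filter will show whether the
planner even wants the index (the column and threshold here are only
illustrative):

explain
select ProductID, DiscountedPrice
  from ProductPrice
 where DiscountedPrice < 10;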

> Since I haven't been able to find a quick way to build up a
> hierarchy of promotions to apply/re-apply discounts when promotions
> are added/deleted, creating/deleting promotions looks critical as
> well.
> The best thing I was able to plan was just to reapply all promotions
> if one is deleted.

Watch out for bloat when doing this.  A simple WHERE-clause change
from
update table set b = 45 ;

to

update table set b = 45 where b <> 45 ;

can save the db a lot of work, and if you can apply the same logic
to your update to avoid creating dead tuples, it's worth looking
into.  Updating whole tables wholesale is definitely not pgsql's
strong suit.
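Applied to the ProductPrice update, that guard might look something
like this (again only a sketch, assuming a flat 10% discount):

update ProductPrice pp
   set DiscountedPrice = p.ListPrice * 0.90
  from Product p
 where pp.ProductID = p.ProductID
   -- only touch rows that aren't already at the target price
   and pp.DiscountedPrice <> p.ListPrice * 0.90;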

> So it looks to me that approach B is going to make updating of
> discounts easier, but I was wondering if it makes retrieval of
> Products and Prices slower.

If you do bulk updates, you'll blow out your tables if you don't
keep them vacuumed.  50% dead space is manageable if your data set
is reasonably small (under a few hundred megs).  Just make sure you
don't run 20 updates on a table in a row without vacuuming, that
kind of thing.
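The standard statistics view pg_stat_user_tables shows how many dead
tuples a table is carrying, and a manual vacuum after a big batch
cleans them up (sketch):

select relname, n_live_tup, n_dead_tup
  from pg_stat_user_tables
 where relname = 'productprice';

vacuum analyze ProductPrice;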

> Having a larger table that is being updated at a rate of 5% to 10% a
> day may make it a bit "fragmented".

Nah, autovacuum should keep it clean and running smoothly.
Fragmentation isn't much of a problem in postgresql.

Tips: look at indexes that match common WHERE clauses.  If you do a
lot of "where a.x = b.y and b.x is not null", then a partial index
on b.y where b.x is not null is the kind of thing to try.
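As a sketch, with the hypothetical tables a and b from the tip above
spelled out so it runs on its own:

create table a (x int);
create table b (x int, y int);

-- partial index: only rows of b where x is not null get indexed
create index b_y_not_null_idx on b (y) where x is not null;

-- a query of this shape can then use the partial index
select a.*
  from a
  join b on a.x = b.y
 where b.x is not null;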
