Re: Partitioning on Table with Foreign Key

Yan Cheng Cheok <yccheok@xxxxxxxxx> · Tue, 26 Jan 2010 19:16:47 -0800 (PST)

Hello Vick,

Can I get some advice from your side?

Currently, I have table :

1 lot is pointing to many units
1 unit is pointing to many measurements

Say, let say, I want "Give me all the measurements result within this lot".

For a single lot, with 100,000 units, with each unit having 48 different measurements (total 4,800,000 measurements). It will takes ~30 seconds to return result.

I use the following stored procedure to help me do so.

======================================================
CREATE OR REPLACE FUNCTION get_measurements(IN bigint)
  RETURNS TABLE(_type text, _value double precision, _unit text) AS
$BODY$DECLARE
    _lotID ALIAS FOR $1;
BEGIN
    RETURN QUERY EXECUTE 'SELECT measurement_type.value, measurement.value, measurement_unit.value
    FROM 
    measurement INNER JOIN unit ON (fk_unit_id = unit_id)
    WHERE fk_lot_id = $1'
    USING _lotID;    
END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION get_measurements(bigint) OWNER TO postgres;
======================================================

However, when time goes on and more lots being added, the performance for the above stored procedure getting worst.

I plan to use table partition, to solve the above problem.

(1) Store unit, according to their unit_id (primary key). For example, 

unit_id from 0 till 999999 will store in table unit_0. 
unit_id from 1000000 till 1999999 will store in table unit_1. 

(2) Store measurement, according to their measurment_id (primary key). For example, 

measurement_id from 0 till 999999 will store in table measurement_0. 
measurement_id from 1000000 till 1999999 will store in table measurement_1.

However, does the above stored procedure, will gain advantage from my (1) and (2) strategy?

As from my point of view, PostgreSQL seems doesn't get enough hint, where he need to go to unit_? and measurement_? table to perform read. In my query, I didn't specific rule like

"WHERE unit_id / 1000000 = 0"

Advice and comment are very much appreciated.

The below are my table partition rule :
========================================
CREATE TABLE measurement_0 (
          CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
          CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
              REFERENCES unit (unit_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE,

	CHECK ( measurement_id / 1000000 = 0 )
) INHERITS (measurement);

CREATE TABLE measurement_1 (
          CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
          CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
              REFERENCES unit (unit_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE,

	CHECK ( measurement_id / 1000000 = 1 )
) INHERITS (measurement);

......

CREATE TABLE unit_0 (
          CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
          CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
              REFERENCES lot (lot_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE,

	CHECK ( unit_id / 1000000 = 0 )
) INHERITS (measurement);

CREATE TABLE unit_1 (
          CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
          CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
              REFERENCES lot (lot_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE,

	CHECK ( unit_id / 1000000 = 1 )
) INHERITS (measurement);

...

Here are my table defination :
==============================
    IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'unit') THEN
        CREATE TABLE unit
        (
          unit_id bigserial NOT NULL,
          fk_lot_id bigint NOT NULL,
          "cycle" bigint NOT NULL,
          CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
          CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
              REFERENCES lot (lot_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE
        );

        CREATE INDEX idx_fk_lot_id
          ON unit
          USING btree
          (fk_lot_id);        
    END IF;

    IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'measurement') THEN
        CREATE TABLE measurement
        (
          measurement_id bigserial NOT NULL,
          fk_unit_id bigint NOT NULL,
          "value" double precision,
          CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
          CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
              REFERENCES unit (unit_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE
        ); 

        CREATE INDEX idx_fk_unit_id
          ON measurement
          USING btree
          (fk_unit_id);

        CREATE INDEX idx_measurement_value
          ON measurement
          USING btree
          (value) WHERE value IS NULL;        
    END IF; 

Thanks and Regards
Yan Cheng CHEOK

--- On Thu, 1/21/10, Yan Cheng Cheok <yccheok@xxxxxxxxx> wrote:

> From: Yan Cheng Cheok <yccheok@xxxxxxxxx>
> Subject: Re:  Partitioning on Table with Foreign Key
> To: "Vick Khera" <vivek@xxxxxxxxx>
> Cc: pgsql-general@xxxxxxxxxxxxxx
> Date: Thursday, January 21, 2010, 11:31 PM
> Make sense to me. Thanks for the
> advice. I will try that out.
> 
> Thanks and Regards
> Yan Cheng CHEOK
> 
> 
> --- On Thu, 1/21/10, Vick Khera <vivek@xxxxxxxxx>
> wrote:
> 
> > From: Vick Khera <vivek@xxxxxxxxx>
> > Subject: Re:  Partitioning on Table with
> Foreign Key
> > To: "Yan Cheng Cheok" <yccheok@xxxxxxxxx>
> > Cc: pgsql-general@xxxxxxxxxxxxxx
> > Date: Thursday, January 21, 2010, 11:30 PM
> > On Thu, Jan 21, 2010 at 3:46 AM, Yan
> > Cheng Cheok <yccheok@xxxxxxxxx>
> > wrote:
> > > table measurement will have a *lot* of row
> (millions).
> > I want to speed up write and read access. Hence, I
> use
> > partition technique.
> > >
> > > CREATE TABLE measurement_y2006m02 (
> > >    CHECK ( date >= DATE '2006-02-01' AND
> date
> > < DATE '2006-03-01' )
> > > ) INHERITS (measurement);
> > >
> > > Opps! But measurement do not have date column.
> How I
> > can refer to measurement's lot's date?
> > >
> > 
> > Split on your lot ID number instead of date, using a
> modulo
> > operation.
> >  I've done this by splitting a table 100 ways
> using the
> > constraing (id
> > % 100 == N) for N = 0 .. 99.
> > 
> > We don't know what your data retention or
> distribution
> > needs are, so
> > it is hard to offer solid advice.
> > 
> 
> 
> 
> 

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general