Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views

Benjamin Arai <me@xxxxxxxxxxxxxxxx> · Sat, 25 Aug 2007 11:12:54 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As stated in the previous email if I use partitioning then queries  
will be executed sequentially - i.e., instead of log(n) it would be  
(# partitions) * log(n).  Right?

Benjamin

On Aug 25, 2007, at 9:18 AM, Joshua D. Drake wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Benjamin Arai wrote:
This kind of disappointing, I was hoping there was more that could  
be done.

There has to be another way to do incremental indexing without  
loosing
that much performance.

What makes you think you are loosing performance by using  
partitioning?

Joshua D. Drake

Benjamin

On Aug 24, 2007, at 6:29 PM, Joshua D. Drake wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brandon Shalton wrote:
Benjamin,

In order to avoid the re-indexing I was thinking of instead  
creating
a new
table each month (building its indexes and etc) and accessing  
them all
through a view. This way I only have to index the new data each  
month.

Take a look at bizgres.org (based on postgres).

They have a parent-child structure.

The way i use it, is I have about 30M records a day that are  
inserted
into the database.

Each day is a "child" table to the "parent".

so example:

the parent table is called  "logfile"

each day, is a child, with the structure like  "logfile_YYMMDD"

the "child" inherits the table structure of the parent, such  
that you
could query the child table name directly, or you run the query  
against
the parent (ie. logfile table) and get all the data.

the indexes are done on a per table basis, so new data that  
comes in, is
a lesser amount, and doesn't require re-indexing.

PostgreSQL can do all of this too.

Sincerely,

Joshua D. Drake

example:

select * from logfile_070825 where datafield = 'foo'

if i knew i wanted to specifically go into that child, or:

select * from logfile where datafield = 'foo'

and all child tables are searched and results merged.  You can  
perform
any kind of sql query and field structures are you normally do.

the downside is that the queries are run sequentially.

so if you had 100 child tables, each table is queried via  
indexes, then
results are merged.

but, this approach does allow me to dump alot of data in,  
without having
the re-indexing issues you are facing.

at some point, you could roll up the days, in to weekly child  
tables,
then monthly tables, etc.

I believe Bizgres has a new version of their system that does  
parallel
queries which would certainly speed things up.

For your documents, you can do it by the day it was checked in,  
or maybe
you have some other way of logically grouping, but the parent/child
table structure really helped to solve my problem of adding in  
millions
of records each day.

The closest thing in mysql is using merge tables, which is not  
really
practical when it comes time to do the joins to the tables.

-brandon

http://www.t3report.com - marketing intelligence for online  
marketing
and affiliate programs

---------------------------(end of  
broadcast)---------------------------
TIP 6: explain analyze is your friend

- --

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/ 
donate
PostgreSQL Replication: http://www.commandprompt.com/products/

- ---------------------------(end of  
broadcast)---------------------------
TIP 6: explain analyze is your friend

- --

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
			UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/ 
donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG0FZKATb/zqfZUUQRAsfRAJ4mjQP+1ltG7pqLFQ+Ru52LA5e7XACcDqKr
PIihth2x3gx3qTEI8WfWNjo=
=AhJx
-----END PGP SIGNATURE-----

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQIVAwUBRtBxK/yqRf6YpodNAQKVkhAAgF4DaXeMxplX1EUXZMuw9aqr+75NxNcp
ZOCJPSFN0jwzY3MlFCRVjL1kzXmRJB4L3fE2xVQX9reY62TPfYC8m/xatey1X6nc
RdfNb9IzL6OyAghcpnUnwYntQtmGRpJtS7LQrx/SiDz8LWIp2S5v3Q9S8alKNTUS
FupCNy1bL3yJf9tySSvol6JSH2edVt8f48J1j03f5B9zh+G/rKrQ+muuKOHyU3mb
cVJ+gbSWCesuo+9rfaJ24m2ODwZm/YA+ENhlc3EOvD8z+cYn2OjuvAqvHABRsEKe
+E9NWBPK/7UT4/T4B/LcBW1B6VISFqyETkwe2fhY5kVZnF+f0KtQIxXh/9qMsnnh
tWthI9YmG4MIBmCsJwdneABHdfMJDp8IlawXqMlX4VkPHUrUtiQV/oDNsHMrU8BM
SZOK5m0ADgXk0rndkEWXhERsyuFaocFj+snvaJEVH9PJSDVgjo7EMW5Qfo6p3NFg
ujBurhLaSuj52vClbdOs3lYp0Drbuf9iQnot3pD4XsCKAOTQm3S7BvgKMd5FUHLX
HBFn4KiSRGx7hwlrss4rjqJ8BoJKbtvGxyNSiwZkrAOke+gqEML6pPdvlAj3Dif8
KrsKcEu/cuR8euqX9IYCZIw4GYLqgs3mewfQIt5bSfw3yHvFyOgolyUeYfnYYlbr
+u145pL2KZc=
=T4dg
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match