Search Postgresql Archives

Advice for using integer arrays?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'm fairly new to Postgres, and have a design issue for which an array of integers might be a good solution. But I'd like to hear from the experts before proceeding down this path.

Essentially, I'm trying to model the relationship between a group of biological samples and their genes. Each sample (there are ~10K of them now, with more coming) will have about 30,000 genes. Conversely, a particular gene may be present in almost all samples.

So I've created the following table to handle the many-to-many relationship:

 create table sample_gene (id serial, sample_id int, gene_id int);

which looks like this when populated:

sample_id    |    gene_id
---------------------------------------
1                 |      1
1                 |      2
...
1                 |    30475
2                 |     1
2                 |     2
...
2                 |    29973
3                 |      1
etc.

The table now contains hundreds of millions of rows (with many, many more to come). Join performance between samples and genes is quite slow, even with indexes on sample_id and gene_id.

So it occurred to me: why not eliminate all the duplicate sample_id values by storing the gene_id's in an array, like so:

create table sample_gene_array (id serial, sample_id int, gene_id int [] );

So now the table data looks like this:

sample_id    |    gene_id []
---------------------------------------
1                 |      [1:30475]
2                 |      [1:29973]
etc.

The new table is significantly smaller, and performance (using ANY[] ) is quite good. Nevertheless, I'm uneasy. I come from a Sybase ASE background, and so have no experience with arrays as datatypes. Is it okay to store 30K+ gene values in an array in the linking table (or maybe even in the sample table itself, thus eliminating the linking table)? Should I unnest the gene_id's first, before using them to join to the gene table?

TIA for any guidance you can provide. Again, I'm a Postgres neophyte - but I'm in awe of the power and flexibility of this database, and wish that I'd started using it sooner.

------
Michael Heaney
JCVI





--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux