Search Postgresql Archives

Re: TEXT column > 1Gb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Apr 12, 2023, at 12:21 PM, Rob Sargent <robjsargent@xxxxxxxxx> wrote:

On 4/12/23 13:02, Ron wrote:
Must the genome all be in one big file, or can you store them one line per table row?

The assumption in the schema I’m using is 1 chromosome per record. Chromosomes are typically strings of continuous sequence (A, C, G, or T) separated by gaps (N) of approximately known, or completely unknown size. In the past this has not been a problem since sequenced chromosomes were maybe 100 megabases. But sequencing is better now with the technology improvements and tackling more complex genomes. So gigabase chromosomes are common. 

A typical use case might be from someone interested in seeing if they can identify the regulatory elements (the on or off switches) of a gene. The protein coding part of a gene can be predicted pretty reliably, but the upstream untranslated region and regulatory elements are tougher. So they might come to our web site and want to extract the 5 kb bit of sequence before the start of the gene and look for some of the common motifs that signify a protein binding site. Being able to quickly pull out a substring of the genome to drive a web app is something we want to do quickly. 

Not sure what OP is doing with plant genomes (other than some genomics) but the tools all use files and pipeline of sub-tools.  In and out of tuples would be expensive.  Very,very little "editing" done in the usual "update table set val where id" sense.

yeah. it’s basically a warehouse. Stick data in, but then make all the connections between the functional elements, their products and the predictions on the products. It’s definitely more than a document store and we require a relational database.

Lines in a vcf file can have thousands of colums fo nasty, cryptic garbage data that only really makes sense to tools, reader.  Highly denormalized of course.  (Btw, I hate sequencing :) )

Imagine a disciplne where some beleaguered grad student has to get something out the door by the end of the term. It gets published and the rest of the community say GREAT! we have a standard! Then the abuse of the standard happens. People who specialize in bioinformatics know just enough computer science, statistics and molecular biology to annoy experts in three different fields.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux