Sean Davis wrote: > This might be a bit off-topic, but I'm curious what folks would do > with this > situation: > > I have about 6 different tab-delimited file types, all of which store > similar information (microarray gene expression). However, the files > come from different manufacturers, so each has slightly different > fields with different meanings. However, there are a few columns > that are shared. I may need to add table formats in the future (as > we get more manufacturers). I can think of at least three ways to go > about storing these data: > > 1) Create a single table that has as many columns as needed for ALL > formats and make manufacturer-specific views, naming columns in the > view as appropriate. Then put rules on the view for inserts, > updates, etc. This is my first choice, I think, but adding a new > manufacturer's format means creating a new view and possibly adding > columns; some columns may NULL for large portions of the table. > > 2) Use postgres inheritance, but even shared columns in our data may > have different names depending on the manufacturer, so there may be > views involved anyway. I'm unclear if this is just a naming issue, or if the fields in the files have different meaning. If it's just a case that supplier A names a field "foo" while supplier B names a field with the same meaning "bar", I would think you'd want to coalesce all these incoming files into a single table containing columns that have meaning to your organization. The effort then just becomes one of mapping incoming fields into the proper columns, but the end result would be something much more consistent and meaningful to your organization. If on the other hand all these incoming fields have different meaning and you need to keep them stored separately, I would look into option (4): just keep a separate table for each supplier, since you said that even shared fields may have different meaning; then use a view over all the tables to answer any queries across suppliers. But I definitely wouldn't let the way your suppliers name their fields in the files they send you drive how you design your database. That's just a data mapping issue which is easily solved during data import. > > 3) Use a fully-normalized strategy that stacks each column into one > very long table--this would be my last choice. > > Thanks for any insight. > > (For replies, please try to reply to me directly as well as the list > as I just get digests right now). > > Thanks, > Sean > > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Guy Rouillier