Re: Need suggestion

Ognjen Blagojevic <ognjen.d.blagojevic@xxxxxxxxx> · Thu, 02 Jun 2011 17:33:31 +0200

Carl,

Please keep in mind I am not a Postgres expert nor consultant, I'm just 
sharing my experience. I would also like to hear the opinion of other 
people who worked on projects with similar database sizes.

I would keep all files in the single table -- most probably they will be 
served to the user by the same code (e.g. file download servlet or 
something similar) so it is good if all relevant info is in one table, 
something like:

file {
 id
 mime_type
 name
 content bytea
 creation_date datetime
 modification_date datetime
 creation_user
 modification_user
}

Since both image and document inherits file, you may choose any of the 
common RDB inheritance modeling strategies (one table per hierarchy, one 
table per class...), but since there is just a few fields, I would put 
everything in the same table.

Consider cardinality between cases and files/users. Can one file be 
related with two cases and so on...

Toast table will be splitted in 1GB pieces.

Create indexes considering ways your users will browse or search data.

Regards,
Ognjen

On 2.6.2011 12:22, Carl von Clausewitz wrote:
Dear Ognjen,

thank you - that was my idea too, but I've never seen such a workload
like this. The docu's (which are not processed by any ocr hopefully) and
the pictures are not indexed off course, just some metadatas, which are
related to the exact docu, or pic For example:

productions_docu1:
-sent date
-recieved date
-type
-owner
-case_id
-etc

image_001:
-picturetaken date
-case_id
-image_type

Just these metadatas need to be searched. My questions about the
structure was like this: do you recommend, to store the images and the
docu's in a same table (CREATE TABLE docu_img_store (id
BIGSERIAL, case_id BIGINT, content_type INTEGER, content bytea), or
store it in two different tables? Is there any special settings while
table creations, that I have to set for optimal work (like index,
storage parameter, toast, etc).

(:-) I know, that this project could be a high value revenue for any DB
consultancy related company, but this is a small country, with small
project fees, and I'm employee, not a contractor at my company :-)

Thanks you in advance,
Regards,
Carl

2011/6/2 Ognjen Blagojevic <ognjen.d.blagojevic@xxxxxxxxx
<mailto:ognjen.d.blagojevic@xxxxxxxxx>>

    Carl,

    I don't have experience with that big databases, but I did both
    solutions, and here are pros of both of them:

    1. Files stored on the filesystem:
    - Small database footprint
    - Faster backup, export and import

    2. Files stored in the database
    - RDBMS takes care of transactions and ref. int.
    - Slower backup, export and import but all done in one step
    - Easier continuous archiving

    I slightly prefer option no. 2, since transaction handling, rollback
    and ref. integrity is not so easy to implement when you have two
    different storage systems (FS and RDB).

    As for indexes and tables it is not clear form your message whether
    you need just a regular search (field LIKE 'something'), full text
    search of metadata, or full text search of scanned documents (in
    case they are OCRed).

    Regards,
    Ognjen

    On 1.6.2011 10:08, Carl von Clausewitz wrote:

        Hello Everyone,

        I got a new project, with 100 user in Europe. In this case, I
        need to
        handle production and sales processes an its documentations in
        PostgreSQL with PHP. The load of the sales process is
        negligible, but
        every user produces 2 transaction in the production process,
        with 10-30
        scanned documents (each are 400kb - 800kb), and 30-50 high
        resolution
        pictures (each are 3-8 MB), and they wanted to upload it to
        'somewhere'.
        'Somewhere' could be the server files system, and a link in
        the PostgreSQL database for the location of the files (with some
        metadata), or it could be the PostgreSQL database.

        My question is that: what is your opinion about to store the scanned
        documentation and the pictures in the database? This is a huge
        amount of
        data (between daily 188MB and 800MB data, average year is about 1 TB
        data), but is must be searchable, and any document must be retrieved
        within 1 hour. Every documentations must be stored for up to 5
        years...
        It means the database could be about 6-7 TB large after 5 years, and
        then we can start to archive documents. Any other data size is
        negligible.

        If you suggest, to store all of the data in PostgreSQL, what is your
        recommendation about table, index structure, clustering, archiving?

        Thank you in advance!
        Regards,
        Carl

    --
    Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx
    <mailto:pgsql-general@xxxxxxxxxxxxxx>)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general