Search Postgresql Archives

Versioned, chunked documents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I have documents which are divided into chunks, so that the (ordered)
concatenation of chunks make the whole document. Each of the chunks may
be edited separately and past versions of the chunks need to be kept.

The structure looks fairly simple:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
...
);

CREATE TABLE documents_chunks (
    id SERIAL PRIMARY KEY,
    ctime TIMESTAMP NOT NULL,
    documents_id INTEGER REFERENCES documents(id),
    seq INTEGER NOT NULL, -- sequence within the document
    content TEXT,
...
);

The first goal is to retrieve the latest version of the whole document,
made from the latest versions of all chunks, but later the goal will
also be to fetch the whole version at some point in time (i.e. with
chunks created before a point in time).

I did the first goal by creating two helper views:

CREATE VIEW documents_chunks_last_version_chunk_ids AS
    SELECT documents_id, max(id), seq FROM documents_chunks GROUP BY
documents_id, seq;

CREATE VIEW documents_chunks_last_version_content AS
    SELECT documents_chunks.documents_id, content
        FROM documents_chunks
        JOIN documents_chunks_last_version_chunk_ids ON
documents_chunks.id=documents_chunks_last_version_chunk_ids.max
        ORDER BY documents_chunks_last_version_chunk_ids.seq;

There are indexes on the document_chunks fields seq and documents_id.

Everything looked fine until I examined the output of EXPLAIN ANALYZE
and saw this:

db=> set enable_seqscan to off;
SET
db=> explain analyze select * from documents_chunks_last_version_content
where documents_id=1;

EXPLAIN output given in: http://explain.depesz.com/s/mpY

The query output seems correct on this test case:
db=> select * from documents_chunks_last_version_content where
documents_id=1;
 documents_id | content
--------------+---------
            1 | C1, v2
            1 | C2, v3
            1 | C3, v1
(3 rows)

This huge cost of 10000000000 which appeared out of nowhere in the
EXPLAIN output and the seq scan worry me - where did that come from?
There are absolutely no unindexed fields in the query, and the result
set of the aggregate ("max") is very small.

Of course, I might be doing the whole structure wrong - any ideas?


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux