Re: tuple data size and compression

On 12/16/2016 12:23 AM, Tom DalPozzo wrote:
        I see. But in my case the rows don't reach that threshold (I
        didn't check whether it is 2K, but I didn't change anything).
        So I'm wondering if there is any other way, besides TOAST, for
        the rows to get compressed or not.


    Are you really sure you want that? For small files the overhead of
    compression tends to outweigh the benefits. A contrived example,
    biased toward making my point:

    aklaver@killi:~> dd if=/dev/urandom of=file.txt bs=10 count=10
    10+0 records in
    10+0 records out
    100 bytes (100 B) copied, 0.253617 s, 0.4 kB/s
    aklaver@killi:~> l -h file.txt
    -rw-r--r-- 1 aklaver users 100 Dec 15 13:07 file.txt
    aklaver@killi:~> gzip file.txt
    aklaver@killi:~> l -h file.txt.gz
    -rw-r--r-- 1 aklaver users 132 Dec 15 13:07 file.txt.gz


        I noticed that, when I use constant data, the total I/O writes
        (per iostat) are more or less half of the total I/O writes when
        using random or other hard-to-compress data.


    Define constant data?

    I thought the data you are inputting is below the compression threshold?

    Is I/O causing a problem? Or, to put it another way, what is the
    problem you are trying to solve?


    --
    Adrian Klaver
    adrian.klaver@xxxxxxxxxxx

Hi Adrian,
I haven't got a problem. I'm just trying to estimate a worst case for
the total I/O writes of my DB in the field, over the years.

Well, the worst case is some program going haywire and saturating your I/O with writes. I would think a more reasonable estimate would be the average write load, plus or minus a standard deviation, though that would entail guessing more about the future than I would be willing to commit to.

If tuples get compressed, then everything depends on how compressible
the data is. So either 1) I understand if/when they get compressed, or
2) I must simulate with different kinds of data (compressible and not).
Furthermore, in general I like, when possible, to be aware of what is
going on behind the scenes.
As for 2), I ran three tests, each inserting 1 million rows with a
100-byte bytea in my 'dati' field. First I tried 100 bytes of constant
data (\x7b7b7b7b...), then 100 random bytes, and finally 100 bytes taken
from random offsets of a compressed file.
It seems that I/O writes strongly depend on the data, even though the
dati field is only 100 bytes long.
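
For reference, the first two of those workloads could be reproduced with
something like the following (a sketch only: the table definition is
assumed here, and gen_random_bytes() comes from the pgcrypto extension):

CREATE TABLE dati_test (id int, dati bytea);  -- hypothetical table

-- 1) Constant, highly compressible payload: 100 bytes of 0x7b per row.
INSERT INTO dati_test
SELECT g, decode(repeat('7b', 100), 'hex')
FROM generate_series(1, 1000000) AS g;

-- 2) Incompressible payload: 100 random bytes per row.
CREATE EXTENSION IF NOT EXISTS pgcrypto;
INSERT INTO dati_test
SELECT g, gen_random_bytes(100)
FROM generate_series(1, 1000000) AS g;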

That the writes depend on the data goes without saying.

Two questions:

1) Do you know what your data will look like in the future?

2) Is a 100-byte bytea a realistic approximation of that data?

wal_compression=off.
Regards
Pupillo
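
As a quick way to see whether a given value was actually stored
compressed, pg_column_size() can be compared with octet_length()
(a sketch; the table name 'dati_test' is assumed, as above):

-- pg_column_size() reports the on-disk datum size, compressed if it
-- was compressed; octet_length() reports the uncompressed length.
-- For 100-byte payloads, stored_bytes should be the payload plus a
-- few bytes of varlena header, i.e. no compression at this size.
SELECT octet_length(dati)   AS uncompressed_bytes,
       pg_column_size(dati) AS stored_bytes
FROM dati_test
LIMIT 5;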

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

