
Re: File Fragmentation


On 2013-03-20, jg <jg@xxxxxxxx> wrote:
> Hi,
>
> I have a PostgreSQL database with 50 tables.
> Every minute, sequentially, a batch loads 10,000 rows of 250 bytes into each table with a COPY.
>
> After a day, I have a database with 50 tables, each holding 1,440 sets of 10,000 rows.
> The tables are cleanly and naturally clustered by the insertion timestamp.
> Each table's data file gains 1,440 fragments each day.
>
> Now, there is a partition rotation script that drops old tables when a size limit is reached.
> Suppose this script runs and drops only one table holding a few days of data, then recreates a new empty one.
>
> The disk free space is now very fragmented: it is the space that was used by the rotated table.
>
> Then COPY inserts more data, the existing tables grow within their files and stay fragmented.
> The new table starts growing from the beginning of the free space and ends up more fragmented than ever.
>
> But within each table the data is still always clustered.
>
> After a few more rotations, all the tables are heavily fragmented, and even though the data is clustered inside each file, the file fragments are spread all over the drive.
>
> After a few days, I see I/O wait grow and grow, even though the size of the database has stabilised thanks to the rotation script.
>
> I suspect the heavily fragmented files are the cause of the growing I/O wait (PostgreSQL on Windows).
> How can I cope with that?
> It seems I cannot pre-grow a file in PostgreSQL.

> I found a trick: if I create an empty table, insert dummy data, then
> insert the good data, then delete the dummy data, and then vacuum the
> table (but not a full vacuum), I get a large file with free space at
> the beginning of the file. If all the files are created with that
> trick and are larger than the maximum amount of data COPYed, I will
> have no fragmented files.

I assume you're using generate_series to generate the dummy data in a
single insert.
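
As a rough sketch of that pre-allocation trick (the table definition,
row count, and file path below are illustrative assumptions, not taken
from the original post), it could look like:

  -- Hypothetical table; names and sizes are illustrative only.
  CREATE TABLE t (ts timestamptz, payload text);

  -- Pre-allocate: one big insert of dummy rows so the filesystem
  -- hands the table a large, mostly contiguous extent up front.
  INSERT INTO t (ts, payload)
  SELECT NULL, repeat('x', 250)
  FROM generate_series(1, 15000000);

  -- Load the real data; it is appended after the dummy rows.
  COPY t (ts, payload) FROM '/path/to/batch.csv' WITH (FORMAT csv);

  -- Remove the dummy rows. A plain VACUUM (not VACUUM FULL) marks
  -- their space inside the file as reusable without truncating it,
  -- so later COPYs fill the start of the already-allocated file.
  DELETE FROM t WHERE ts IS NULL;
  VACUUM t;

Whether that initial insert is actually laid out contiguously depends
on the filesystem and how fragmented its free space already is, so the
trick only helps while large free extents are still available.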

It might be faster to insert the dummy data through one connection,
insert the first good data through a second connection, then roll back
the dummy-data insert and vacuum the table.
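
A minimal sketch of that variant, reusing the same hypothetical table
t as above; the two sessions are interleaved as the comments indicate:

  -- Session 1: insert the dummy rows but leave the transaction open.
  BEGIN;
  INSERT INTO t (ts, payload)
  SELECT NULL, repeat('x', 250)
  FROM generate_series(1, 15000000);

  -- Session 2: while session 1 is still open, load the first real
  -- batch; its rows land in pages after the uncommitted dummy rows.
  COPY t (ts, payload) FROM '/path/to/batch.csv' WITH (FORMAT csv);

  -- Session 1: discard the dummy rows. The rollback is cheaper than
  -- a DELETE because the rows simply become dead tuples.
  ROLLBACK;

  -- Either session: a plain VACUUM marks the dead space at the start
  -- of the file as free for reuse, without shrinking the file.
  VACUUM t;

The saving over the quoted trick is that the dummy rows never have to
be committed and then deleted row by row.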





--
⚂⚃ 100% natural






