Re: Disk filled-up issue after a lot of inserts and drop schema

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



In Unix/Linux with many of the common file system types, if you delete a file, but a process still has it open, it will continue to "own" the disk space until that process closes the file descriptor or dies.  If you try "ls" or other file system commands, you won't actually see the file there, yet it really is, and it still has exclusive control of a portion of the disk.  The file is "unlinked" but the data blocks for the file are still reserved.

Like 'ls', the 'du' command only looks at files that still exist and adds up the disk space for those files.  It does not know about these files that have been unlinked, but still reserve a large portion of the disk.

I don't know why something still has an open file descriptor on something you believe has been removed, but at least that explains why you are experiencing the discrepancy between "du" and the real available space on the disk.


On Wed, Sep 14, 2016 at 9:53 AM, Pietro Pugni <pietro.pugni@xxxxxxxxx> wrote:
I’m running PostgreSQL 9.5.4 on a virtual machine for production purposes. It runs Ubuntu 16.04.1 LTS 64bit, 32GB RAM, 461GB disk space and 4 x logical CPUs.

Postgres executes the following activities:
- many INSERTS for ETL
- a lot of read and write operations for the main OLTP application

The ETL job is still under development, so I’m launching several sequential “tries” in order to get the whole thing working. The ETL procedure consists of a lot of inserts packed inside transactions. At the moment each transaction consists of 100k inserts, so for a 90mln rows table I get 90mln inserts packed in 900 transactions. I know it’s not the best, but JDBC drivers combined with Pentaho doesn’t seem to pack more inserts into one, so I get a lot of overhead. I can see INSERT, BIND and PARSE called for each insert.. I think it’s Pentaho which embeds the INSERT in a parametric query.. I hate Pentaho.. anyway..

The ETL procedure does the following:
1) DROP SCHEMA IF EXISTS data_schema CASCADE;
2) creates the “data_schema” schema and populates it with tables and rows using INSERTs as described before;
3) if an error occurs, drop the schema

I’m repeating the previous steps many times because of some Pentaho errors which the team is working on in order to get it working. This stresses the WAL because the interruption of the process interrupts the current transaction and is followed by a DROP SCHEMA .. CASCADE.

After few days since we began debugging the ETL elaboration, the disk filled up and the last ETL job was automatically aborted. Note that the DB data directory is located on the same root disk at /var/lib/postgresql/9.5/main

What shocked me was that the data directory of Postgres was just 815MB in size ($ du -h /var/lib/postgresql/9.5/main ) and pg_xlog was 705MB, but the entire disk was full ("df -h" returned a disk usage of 100%).

I looked for any postgres activity and only noticed a checkpoint writer process that was writing at low speeds (IO usage was about 5%).
Also, "SELECT * FROM pg_stat_activity" returned nothing and the most shocking part was that the "du -h /“ command returned 56GB as the total size of files stored on the whole disk!!! The same was for “du -ha /“, which returns the apparent size. 

The total disk size is 461GB, so how is it possible that “df -h” resulted in 461GB occupied while “du -h /“ returned just 56GB?

After executing:
$ service postgresql stop
$ service postgresql start

the disk was freed and “df -h” returned a usage of just 16%!

The other questions are:
- how can I prevent the disk from filling up? I’m using the default configuration for the WAL (1GB max size).
- how can I tune Postgres to speed up the INSERTs?

The actual configuration is the following:
listen_addresses = 'localhost'
max_connections = 32
shared_buffers = 16GB
work_mem = 128MB
maintenance_work_mem = 512MB
effective_io_concurrency = 10
checkpoint_completion_target = 0.9
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 24GB
default_statistics_target = 1000

May be that some of these parameters causes this strange behavior? checkpoint_completion_target?

Thanks to everyone for the support.

Best regards,
 Pietro Pugni





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux