Additional things to consider for decreasing pressure on the cheap drives:
- Another configuration parameter to look into is effective_io_concurrency. For SSD we typically set it to 1 io per channel of controller card not including the RAID parity drives. If you decrease this value PostgreSQL will not try to initiate as many parallel io operations.
http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html
- A strategy Instagram used is to (manually) vertically partition tables that have some, but not all, columns updated frequently. When PostgreSQL updates a data value for a column it writes a new copy of the entire row and marks the original row as garbage. If you have tables with many rows but many of them are updated infrequently (Instagram's example was a table with user information where "last login" was updated very frequently but other information about the user was rarely updated) you could split the frequently updated columns into a separate table to reduce io. However note that PostgreSQL does not have features to support vertical partitioning directly so to employ this technique you would need to manually partition and update your SQL code accordingly.
- Also check into your WAL and logging settings to make sure you aren't writing more to them than you need to.
Will J Dunn
On Thu, Apr 23, 2015 at 10:38 AM, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:
On 4/23/15 8:36 AM, Job wrote:
Hello, thank you first of all for your wonder help!
Tomas, regarding:
There are ways to make the writes less frequent, both at the database
and OS level. We don't know what's your PostgreSQL config, but making
the checkpoints less frequent and tuning the kernel/mount options may
help a lot.
We can raise up checkpoints, at kernel-side which options do you suggest to tune for minimize disk writing?
Certainly disable atime updates if you haven't already.
Having a long checkpoint period is somewhat similar to running in a ram disk and doing periodic backups. BTW, if you want to try using backups, I recommend you setup actual PITR archiving to the SSD. That will write data sequentially and in larger chunks, which should help the SSD better deal with the writes. This will give you more control over how much data you lose during a crash. Though, if all you do is a single large update once a day you're probably better off just taking a backup right after the update. I would also look at the backup size and recovery time of pg_dump vs PITR or a filesystem snapshot; it could be significantly smaller. It might take longer to restore though.
BTW, if you go the ramdisk route you should turn off fsync; there's no point in the extra calls to the kernel. Only do that if the ENTIRE database is in a ramdisk though.
We have a table, about 500Mb, that is updated and written every day.
When machines updates, table is truncated and then re-populated with pg_bulk.
But i think we strongly writes when importing new data tables..
That depends on how much data has actually changed. If most of the data in the table is changed then truncate and load will be your best bet. OTOH if relatively little of the data has changed you'll probably get much better results by loading the data into a loading table and then updating changed data, deleting data that shouldn't be there anymore, and inserting new data.
You definitely want the loading table to not be on SSD, and to be unlogged. That means it needs to go in a tablespace on a ram disk. True temporary tables (CREATE TEMP TABLE) are not logged, but I don't know that that will work well with pg_bulk. You can use a real table with the unlogged option to the same effect (though, I'm not sure if unlogged is available in 8.4).
You also need to consider the indexes. First, make absolutely certain you need all of them. Get rid of every one that isn't required. Second, you need to test the amount of data that's written during an update with the indexes in place *and doing a subsequent VACCUM* compared to dropping all the indexes and re-creating them.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general