On Jan 7, 2008 4:49 PM, Sergei Shelukhin <realgeek@xxxxxxxxx> wrote:
>
> Scott Marlowe wrote:
> > On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek@xxxxxxxxx> wrote:
> >
> >> Hi. Running PostgreSQL 8.2 on Debian.
> >> I've noticed that concurrent inserts (archiving) of large batches of
> >> data into two completely unrelated tables are many times slower than
> >> the same inserts done in sequence.
> >> Is there any way to speed them up, apart from buying faster HDs or
> >> changing the RAID configuration?
> >
> > What method are you using to load these data?  Got a short example
> > that illustrates what you're doing?
>
> The basic structure is as follows: there are several tables with
> transaction data that is kept for one month only.
> The data comes from several sources in different formats and is pushed
> in by a custom script.
> The script takes the source data and puts it into a table it creates
> (the import table) with the same schema as the main table; then it
> deletes the month-old data from the main table.  It also searches for
> duplicates in the main table using some specific criteria and deletes
> them too: to make use of the indexes, a second temp table with a single
> id int column is created and populated with one insert ... select query
> containing the transaction ids of rows duplicated between the main and
> import tables, after which delete from pages where id in (select id
> from 2nd-temp-table) is run.  Finally, the remainder of the import
> table is inserted into the main table.
> There are several data load processes that work the same way against
> different target tables.
> When they run in sequence, they take about 20 minutes on average to
> complete.  If they run in parallel, however, they can take up to 3
> hours...  I was wondering whether this is purely an HD bottleneck,
> given that there is plenty of CPU and RAM available and postgres is
> configured to use it.

Ahh, thanks for the more detailed explanation.  Now I see what you're
facing.

There are a few things you could do that would probably help, and doing
more than one of them should help even more:

1: Buy a decent battery-backed caching RAID controller.  This will
   smooth out writes a lot.  If you can't afford that...
2: Build a nice big RAID-10 array, say 8 to 14 discs.
3: Put pg_xlog on a physically separate drive from the rest of the
   database.
4: Put each table being inserted into on its own physical hard drive.
5: Stop writing to multiple tables at once.
6: (Not recommended) Run with fsync turned off.

Each of these can help on its own.  My personal preference for heavily
written databases is a good RAID controller with battery-backed caching
and a lot of discs in RAID-10 or RAID-6 (depending on read versus write
ratio and the need for storage space).  RAID-10 is normally better for
performance; RAID-6 with large arrays is better for maximizing capacity
while maintaining decent performance and reliability.  RAID-5 is right
out.
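
For illustration, the load/deduplication sequence described above might
look roughly like this in SQL.  Only the main table name ("pages") and
the id-only temp table come from the description; the import table name,
the date column and the duplicate-matching join are invented
placeholders:

    -- staging table with the same schema as the main table
    CREATE TABLE import_pages (LIKE pages);
    -- ... COPY / INSERT the source data into import_pages here ...

    -- drop the month-old rows from the main table
    DELETE FROM pages WHERE tx_date < now() - interval '1 month';

    -- collect ids of main-table rows duplicated by the incoming batch,
    -- so the delete below can use the index on pages.id
    CREATE TEMP TABLE dup_ids (id int);
    INSERT INTO dup_ids (id)
        SELECT p.id
          FROM pages p
          JOIN import_pages i ON p.tx_key = i.tx_key;  -- duplicate criteria

    DELETE FROM pages WHERE id IN (SELECT id FROM dup_ids);

    -- finally append the remainder of the import table
    INSERT INTO pages SELECT * FROM import_pages;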
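
If avoiding the concurrency altogether (suggestion 5) turns out to be
the cheapest fix, one low-tech option, assuming the load scripts can be
modified, is to have each of them take the same advisory lock (available
as of 8.2) so the loads queue up instead of running in parallel; the
lock key 42 below is arbitrary:

    SELECT pg_advisory_lock(42);    -- blocks until any other loader finishes
    -- ... run the import / delete / insert steps ...
    SELECT pg_advisory_unlock(42);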