Re: Configuration Advice

Would it be possible to just update the summary table, instead of recreating it from scratch every night?

Hrm, I believe it's probably less work for the machine if it's rebuilt. Any number of rows may change during an update, not counting additions, so I'd have to pull out what changed and sync it with what's already in the summary table. That means a lot more selects and program-side computation to save the big copy; it might work out, but I'd call it my last-ditch option. :)
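
For what it's worth, here's a rough sketch of what the incremental route might look like if we ever get pushed to it. Everything here is hypothetical (the table and column names, and the assumption that the loader talks to the DB from Python through psycopg2); the point is just that you end up tracking changed keys and doing a delete-plus-reinsert per batch instead of one big copy.

    import psycopg2

    # Hypothetical sketch: refresh only the summary rows whose source
    # rows changed in tonight's load, instead of rebuilding the table.
    # 'changed_ids' would have to be collected by the loader as it
    # applies updates -- that's the extra bookkeeping mentioned above.
    def refresh_changed(conn, changed_ids, batch=10000):
        cur = conn.cursor()
        for i in range(0, len(changed_ids), batch):
            chunk = tuple(changed_ids[i:i + batch])
            # Throw away the stale summary rows for these keys...
            cur.execute("DELETE FROM summary_big WHERE source_id IN %s", (chunk,))
            # ...and recompute them from the detail table.
            cur.execute("""
                INSERT INTO summary_big (source_id, total, last_seen)
                SELECT source_id, sum(amount), max(seen_at)
                  FROM detail
                 WHERE source_id IN %s
                 GROUP BY source_id
            """, (chunk,))
        conn.commit()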

Building these indexes takes forever! It's a long grind through the inserts, and then building the indexes takes a hefty amount of time on top of that (about 9 hours). Now, the application is likely partly at fault, and we're working to make it more efficient, but that has nothing to do with the index build time. I'm wondering what, if anything, we can do to make this better. Would it be better to leave the indexes in place during the load? It doesn't seem to be. Would it be better to use INSERTs instead of COPY? That doesn't seem to be either.
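
To spell out the cycle I'm describing there, it's the usual drop-everything, bulk-load with COPY, rebuild-the-indexes-afterwards pattern. A minimal sketch, assuming a Python loader on psycopg2 and made-up table and index names (the real loader isn't this simple):

    import psycopg2

    # Hypothetical sketch of the nightly cycle: start from an empty
    # table, bulk-load with COPY, and only build the indexes at the end
    # so the load itself isn't paying index-maintenance costs per row.
    def rebuild_summary(conn, block_files):
        cur = conn.cursor()
        cur.execute("TRUNCATE summary_big")          # table name made up
        for path in block_files:
            with open(path) as f:
                # COPY is much cheaper per row than individual INSERTs
                cur.copy_expert("COPY summary_big FROM STDIN", f)
        # The expensive part: each CREATE INDEX has to sort the table.
        cur.execute("CREATE INDEX summary_big_source_idx ON summary_big (source_id)")
        cur.execute("CREATE INDEX summary_big_date_idx ON summary_big (period_start)")
        conn.commit()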

Would it help if you created multiple indexes simultaneously? You have enough CPU to do it. Is the index creation CPU or I/O bound? 9 million rows should fit in 16 GB of memory, right?

This is a very, very interesting idea. It looks like we're probably not fully utilizing the machine during the index build, and this could be the ticket for us. I'm going to set up a test for this and we'll see how it goes.
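
Here's roughly what I'm planning to test, in case anyone spots a problem with it: one connection per index, each issuing its own CREATE INDEX, so the builds run in separate backends in parallel. The connection string, table, and index names are made up, and the threading wrapper is just a sketch.

    import threading
    import psycopg2

    DSN = "dbname=warehouse"        # hypothetical connection string

    INDEX_STATEMENTS = [            # made-up index definitions
        "CREATE INDEX summary_big_source_idx ON summary_big (source_id)",
        "CREATE INDEX summary_big_date_idx ON summary_big (period_start)",
        "CREATE INDEX summary_big_acct_idx ON summary_big (account_id)",
    ]

    def build_index(sql):
        # One connection per index, so each CREATE INDEX runs in its own
        # backend and the builds proceed in parallel.
        conn = psycopg2.connect(DSN)
        conn.autocommit = True
        cur = conn.cursor()
        # Each backend can use up to maintenance_work_mem for its sort,
        # so the per-session value matters when running several at once.
        cur.execute(sql)
        conn.close()

    threads = [threading.Thread(target=build_index, args=(sql,))
               for sql in INDEX_STATEMENTS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()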

Can you describe the load process in more detail? What's it doing with the 6 gigs?

There are two halves to the load process: the loader and the summarizer. The loader is the part that takes 6 gigs; the summarizer only takes a few hundred MB.

Basically we have these COBOL files that vary in size but are usually in the hundreds-of-megabytes range. The files contain new data or updates to existing data. We load the data from the COBOL files in chunks, so that's not a place where we're burning a lot of memory.
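
The chunking itself is nothing fancy; conceptually it's just reading a bounded number of records at a time so memory stays flat no matter how big the file is. A hypothetical sketch (the real reader has to unpack the COBOL record layouts, and the handler name is invented):

    from itertools import islice

    # Hypothetical sketch of the chunked read.
    def read_in_chunks(path, chunk_size=10000):
        with open(path) as f:
            while True:
                chunk = list(islice(f, chunk_size))
                if not chunk:
                    break
                yield chunk

    # for records in read_in_chunks("extract.dat"):
    #     apply_new_or_updated(records)    # hypothetical handler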

The first thing we do is cache the list of COBOL ID codes that are already in the DB; the COBOL ID codes are really long numeric strings, so we use a sequence-generated integer primary key instead. The cache translates COBOL IDs to primary keys, and that's where most of our memory goes these days. The cache is fast, but it's kind of a memory hog. We're working on trimming it down, but it's definitely faster than issuing a query for each COBOL ID.
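
In sketch form, the cache is just a big in-memory map from the COBOL ID string to our surrogate key; the memory cost is mostly all those long key strings. Table and column names here are hypothetical, and again this assumes Python/psycopg2 rather than whatever the real loader does.

    # Hypothetical sketch of the ID cache: one query up front, then a
    # plain dict lookup per record instead of a SELECT per record.
    def load_id_cache(conn):
        cur = conn.cursor()
        cur.execute("SELECT cobol_id, id FROM master_records")
        # The COBOL IDs are long numeric strings, so this dict is where
        # the memory goes; the integer surrogate keys are cheap.
        return {cobol_id: pk for cobol_id, pk in cur}

    # cache = load_id_cache(conn)
    # pk = cache.get(record_cobol_id)    # None means it's a new record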

The load itself is relatively fast, is considered "acceptable", and has been fairly constant in speed. It's the summarizer that's brutal.

The summarizer produces 3 main summary tables plus a few secondary ones that don't take much time to build. Two of the main ones are smallish and not that big a deal; the last one is the biggie at 9 million rows and growing. To produce the 9-million-row table, we query out the data in groups, do our processing, and write the results to a series of text files in blocks of 10,000 rows, as I recall. We then COPY each file into the DB. (There were some issues with copying in a single 9-million-row file in the past, which is why we don't use just one file; those issues have since been fixed, but we never undid the change.)
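
Stripped way down, the output stage of the summarizer looks something like this. The table name and file location are made up, and the real grouping and processing is obviously a lot more involved; this just shows the buffer-10,000-rows, flush-to-file, COPY-it-in loop.

    import csv
    import psycopg2

    # Hypothetical sketch of the summarizer's output stage.
    def flush_block(conn, rows, block_no):
        path = "/tmp/summary_block_%06d.csv" % block_no   # location made up
        with open(path, "w", newline="") as out:
            csv.writer(out).writerows(rows)
        cur = conn.cursor()
        with open(path) as f:
            # Load the 10,000-row block in one COPY rather than 10,000 INSERTs.
            cur.copy_expert("COPY summary_big FROM STDIN WITH CSV", f)
        conn.commit()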

What's your maintenance_work_mem setting? It can make a big difference in sorting the data for indexes.

	6 gigs currently. :)
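
One thing I'll keep an eye on in the parallel-build test: maintenance_work_mem is a per-operation limit, so each concurrent CREATE INDEX can claim up to that much on its own, and a handful of builds at 6 gigs apiece would blow past the 16 GB box. Something like the following per session (the value here is made up, not a recommendation) keeps that in check:

    # Hypothetical per-session override before a build, so several
    # concurrent CREATE INDEX runs don't each try to claim 6 GB at once.
    cur = conn.cursor()
    cur.execute("SET maintenance_work_mem = '2GB'")   # value made up
    cur.execute("SHOW maintenance_work_mem")
    print(cur.fetchone()[0])    # confirm what this session will actually use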

If you could post the schema including the indexes, people might have more ideas...

	I'll have to ask first, but I'll see if I can :)

Talk to you later, and thanks for the info!


Steve

