Would it be possible to just update the summary table, instead of recreating
it from scratch every night?
Hrm, I believe it's probably less work for the computer if it's rebuilt. Any
number of rows may be changed during an update, not to mention additions, so
I'd have to pull out what's changed and sync it with what's already in the
summary table. It'd be a lot more selects and program-side computation to save
the big copy; it might work out, but I'd say this would be my last-ditch
option. :)
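If we ever did go incremental, I imagine it would boil down to something like
the sketch below: figure out which groups were touched, re-aggregate just
those, and swap the rows into the summary table. All the table and column
names here (detail, summary, summarizer_state, updated_at, etc.) are made up
for illustration, not our real schema, and it assumes change tracking we don't
necessarily have today.

    BEGIN;

    -- Find the groups touched since the last run (assumes some kind of
    -- change tracking, e.g. an updated_at column on the detail data).
    CREATE TEMP TABLE changed_groups AS
        SELECT DISTINCT group_id
          FROM detail
         WHERE updated_at > (SELECT last_run FROM summarizer_state);

    -- Re-aggregate just those groups from the detail data.
    CREATE TEMP TABLE changed_summary AS
        SELECT d.group_id, count(*) AS row_count, sum(d.amount) AS total
          FROM detail d
          JOIN changed_groups c ON c.group_id = d.group_id
         GROUP BY d.group_id;

    -- Swap the recomputed rows into the summary table.
    DELETE FROM summary
     WHERE group_id IN (SELECT group_id FROM changed_groups);

    INSERT INTO summary (group_id, row_count, total)
        SELECT group_id, row_count, total FROM changed_summary;

    UPDATE summarizer_state SET last_run = now();

    COMMIT;

That's a lot of extra selects and bookkeeping just to avoid the one big copy,
which is why I'd keep it as a last resort.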
Building these indexes takes forever! It's a long grind through the inserts,
and then building the indexes takes a hefty amount of time too (about 9
hours). Now, the application is likely partly at fault, and we're working to
make it more efficient, but that has nothing to do with the index build time.
I'm wondering what, if anything, we can do to make this better: would it be
better to leave the indexes on? It doesn't seem to be. Would it be better to
use INSERTs instead of COPYs? Doesn't seem to be.
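For context, the rebuild already follows the usual bulk-load pattern: drop the
indexes, load the data, recreate the indexes at the end, so the builds run
against the full 9 million rows in one shot. Roughly like this, with
placeholder index and table names rather than our real schema:

    -- Drop the indexes so the bulk load itself stays fast.
    DROP INDEX big_summary_col_a_idx;
    DROP INDEX big_summary_col_b_idx;

    -- ... bulk load of the freshly summarized data goes here (COPY, described below) ...

    -- Recreate the indexes afterwards; this is where the hours go.
    CREATE INDEX big_summary_col_a_idx ON big_summary (col_a);
    CREATE INDEX big_summary_col_b_idx ON big_summary (col_b);
    ANALYZE big_summary;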
Would it help if you created multiple indexes simultaneously? You have enough
CPU to do it. Is the index creation CPU or I/O bound? 9 million rows should
fit in 16 GB of memory, right?
This is a very very interesting idea. It looks like we're
probably not fully utilizing the machine for the index build, and this
could be the ticket for us. I'm going to go ahead and set up a test for
this and we'll see how it goes.
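The test will basically be to fire off the CREATE INDEX statements from
separate connections at the same time instead of one after another, something
like this (index and column names are placeholders):

    -- session 1
    CREATE INDEX big_summary_col_a_idx ON big_summary (col_a);

    -- session 2, a separate connection started at the same time
    CREATE INDEX big_summary_col_b_idx ON big_summary (col_b);

    -- session 3, ditto
    CREATE INDEX big_summary_col_c_idx ON big_summary (col_c);

One thing I'll have to watch: each session sorts with its own
maintenance_work_mem, so the per-session setting may need to come down to keep
the combined total inside RAM.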
Can you describe the load process in more detail? What's it doing with the 6
gigs?
There are two halves to the load process: the loader and the
summarizer. The loader is the part that takes 6 gigs; the summarizer only
takes a few hundred megs.
Basically we have these COBOL files that vary in size but
are usually in the hundreds-of-megs range. These files contain new data
OR updates to existing data. We load this data from the COBOL files in
chunks, so that's not a place where we're burning a lot of memory.
The first thing we do is cache the list of COBOL ID codes that are
already in the DB; the COBOL ID codes are really long numeric strings, so
we use a sequenced integer primary key. The cache translates COBOL IDs to
primary keys, and this takes most of our memory nowadays. Our cache is
fast, but it's kind of a memory hog. We're working on trimming that down,
but it's definitely faster than making a query for each COBOL ID.
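To make that concrete, the database side of it looks roughly like this (table
and column names are stand-ins for the real schema). The cache is just an
in-memory copy of this mapping, loaded with one big select at startup instead
of a query per incoming COBOL ID:

    -- Long numeric COBOL IDs mapped to a compact surrogate key; the rest
    -- of the schema references the integer id, not the long string.
    CREATE TABLE cobol_id_map (
        id       serial PRIMARY KEY,            -- sequenced integer key
        cobol_id varchar(64) NOT NULL UNIQUE    -- the long numeric string from the files
    );

    -- What the loader pulls into its in-memory cache at startup:
    SELECT cobol_id, id FROM cobol_id_map;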
The load is relatively fast, is considered "acceptable", and
has stayed fairly constant in speed. It's the summarizer that's brutal.
The summarizer produces 3 main summary tables and a few
secondaries that don't take much time to make. Two of them are smallish
and not that big a deal, and the last one is the biggie that's 9 mil rows
and growing. To produce the 9 mil row table, we query out the data in
groups, do our processing, and save that data to a series of text files
that are in blocks of 10,000 rows as I recall. We then copy each file
into the DB (there were some issues with copying in an entire 9 mil row
file in the past, which is why we don't use just one file -- those issues
have been fixed, but we didn't undo the change).
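So the tail end of the summarizer is just a loop of COPYs, one per chunk file;
the paths and table name below are made up:

    -- Server-side COPY of each 10,000-row chunk file the summarizer wrote
    -- (psql's \copy would be the client-side equivalent).
    COPY big_summary FROM '/data/summarizer/chunk_0001.txt';
    COPY big_summary FROM '/data/summarizer/chunk_0002.txt';
    -- ... and so on for each chunk file ...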
What's your maintenance_work_mem setting? It can make a big difference in
sorting the data for indexes.
6 gigs currently. :)
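For reference, that's just the standard knob; it can live in postgresql.conf
or be bumped per-session right before the big index builds, e.g.:

    SET maintenance_work_mem = '6GB';   -- per-session override
    SHOW maintenance_work_mem;          -- confirm what the session is actually using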
If you could post the schema including the indexes, people might have more
ideas...
I'll have to ask first, but I'll see if I can :)
Talk to you later, and thanks for the info!
Steve