Re: Really really slow select count(*)

felix <crucialfelix@xxxxxxxxx> · Mon, 7 Feb 2011 02:52:01 +0100

On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:

Let's review:
Â

1: No test or staging system used before production

no, I do not have a full ubuntu machine replicating the exact memory and application load of the production server.

this was changing one configuration parameter. something I was advised to do, read about quite a bit, tested on my development server (mac) and then proceeded to do at 6 am on Sunday morning, our slowest time.

2: DB left in an unknown state (trying to shut down, not able)

what ?

I checked the site, everything was normal. ÂI went in via psql and tried some queries for about half an hour and continued to monitor the site. Âthen I went to bed at 7am (EU time).

Why did it shutdown so much later ?

I have never restarted postgres before, so this was all new to me. ÂI apologize that I wasn't born innately with such knowledge.

So is it normal for postgres to report that it failed to shut down, operate for an hour and then go ahead and restart itself ?

3: No monitoring software to tell you when the site is down

of course I have monitoring software. Âboth external and internal. Âbut it doesn't come and kick me out of bed. Âyes, I need an automated cel phone call. Âthat was the first thing I saw to afterwards.

4: I'm gonna just go ahead and guess no backups were taken either, or

are regularly taken.

WTF ? Â of course I have backups. ÂI just went through a very harsh down period event. ÂI fail to see why it is now necessary for you to launch such an attack on me. Â

Perhaps the tone of my post sounded like I was blaming you, or at least you felt that way. ÂWhy do you feel that way ?

Why not respond with: Â"ouch ! Âdid you check this ... that...." Âsay something nice and helpful. Âcorrect my mistakes

Â

This website can't be very important, if that's the way you treat it.

just to let you know, that is straight up offensive

This is high traffic real estate site. ÂDowntime is unacceptable. ÂI had less downtime than this when I migrated to the new platform.

I spent rather a large amount of time reading and questioning here. ÂI asked many questions for clarification and didn't do ANYTHING until I was sure it was the correct solution. ÂI didn't just pull some shit off a blog and start changing settings at random.

I double checked opinions against different people and I searched for more docs on that param. ÂAmazingly none of the ones I found commented on the shared memory issue and I didn't even understand the docs discussing shared memory because it didn't seem to apply to what I was doing. Âthat's my misunderstanding. ÂI come her to share my misunderstanding.

Â

And my phone starts complaining a minute after the site stops

responding if something does go wrong the rest of the time. ÂDo not

lay this at anyone else's feet.

I didn't. ÂThere is not even the slightest hint of that in my post.

I came here and posted the details of where I went wrong and what confused me about the documentation that I followed. ÂThat's so other people can follow it and so somebody here can comment on it.

> Âchanging that default

> is brutally difficult and can only really be done by adjusting something in

> the kernel.

Please, that's a gross exaggeration. ÂThe sum totoal to changing them is:

run sysctl -a|grep shm

copy out proper lines to cahnge

edit sysctl.conf

put new lines in there with changes

sudo sysctl -p Â# applies changes

edit the appropriate postgresql.conf, make changes

sudo /etc/init.d/postgresql-8.3 stop

sudo /etc/init.d/postgresql-8.3 start

Considering how splendidly the experiment with changing fsm_max_pages went, I think you can understand that I have no desire to experiment with kernel settings.

It is easy for you because you ALREADY KNOW everything involved. ÂI am not a sysadmin and we don't have one. ÂMy apologies for that.

so does the above mean that I don't have to restart the entire server, just postgres ? ÂI assumed that changing kernel settings means rebooting the server.

> I have clustered that table, its stillÂunbelievablyÂslow.

Did you actually delete the old entries before clustering it? Âif it's

still got 4G of old sessions or whatever in it, clustering ain't gonna

help.

its a different table. Âthe problem one has only 300k rows

the problem is not the size, the problem is the speed is catastrophic

> I still don't know if this bloat due to the small free space map has

> anything to do with why the table is performing like this.

Since you haven't show us what changes, if any, have happened to the

table, neither do we :)

sorry, it didn't seem to be the most important topic when I got out of bed

Â