Re: a heavy duty operation on an "unused" table kills my server

On 13/01/2010 3:03 PM, Eduardo Piombino wrote:
> One last question: this IO issue I'm facing, do you think it is just a
> matter of RAID configuration speed, or a matter of queue gluttony (and
> not leaving time for other processes to get into the IO queue in a
> reasonable time)?

Hard to say with the data provided. It's not *just* a matter of a slow array, but that might contribute.

Specifically, though, by "slow array" in this case I'm looking at latency rather than throughput, particularly read latency under heavy write load. Simple write throughput isn't really the issue, though bad write throughput can make it fall apart under a lighter load than it would otherwise.

High read latencies aren't necessarily caused by deep queuing, though that's one possible cause. A controller that prioritizes efficiently batching sequential writes over serving random reads would cause them too, though reducing its queue depth, so it can't see as many writes to batch, would help.
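
To make the queue-depth point concrete, here is a deliberately crude toy model. It is an illustrative assumption, not a description of any real controller: one strict FIFO queue, fixed made-up service times, and a read that arrives behind `depth` queued writes. It compares that with a hypothetical scheduler that lets reads jump ahead of queued writes but can't preempt the write already in progress.

```python
# Toy model, purely illustrative: it assumes one strict FIFO queue and
# fixed per-request service times, which no real controller guarantees.


def fifo_read_latency_ms(depth: int, write_ms: float, read_ms: float) -> float:
    """Latency of a read that arrives behind `depth` queued writes and must
    wait for all of them in strict FIFO order."""
    return depth * write_ms + read_ms


def read_first_latency_ms(write_ms: float, read_ms: float) -> float:
    """Worst-case latency if the read jumps ahead of queued writes but cannot
    preempt the one write already being serviced."""
    return write_ms + read_ms


if __name__ == "__main__":
    write_ms, read_ms = 2.0, 8.0  # hypothetical service times, not measurements
    for depth in (1, 4, 32, 128):
        fifo = fifo_read_latency_ms(depth, write_ms, read_ms)
        first = read_first_latency_ms(write_ms, read_ms)
        print(f"depth={depth:4d}  fifo={fifo:7.1f} ms  read-first={first:5.1f} ms")
```

With these invented numbers the FIFO read latency grows from 10 ms at depth 1 to 264 ms at depth 128, while the read-first case stays at 10 ms. Throughput is identical in both cases; only the ordering differs, which is why a shallower queue can help even though it makes the array no faster.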

Let me stress, again, that if you have a decent RAID controller with a battery backed cache unit you can enable write caching and most of these issues just go away. Using a RAID level with better read/write concurrency, like RAID 10, may help as well.

Honestly, though, at this point you need to collect data on what the system is actually doing, what's slowing it down and where. *Then* look into how to address it. I can't advise you much on that as you're using Windows, but there must be lots of info on optimising Windows I/O latencies and throughput on the 'net...
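
One OS-agnostic way to start collecting that data is to time small random reads and compare the tail latencies on an idle system with those seen while the ALTER TABLE runs. The sketch below is a hypothetical helper, not an existing tool. The 8192-byte block size is an assumption (PostgreSQL's default page size), and the OS file cache may satisfy reads without touching the disk, so the probed file needs to be much larger than RAM for the numbers to mean anything.

```python
import math
import os
import random
import sys
import time
from typing import Callable, Dict, List


def sample_latencies(probe: Callable[[], None], samples: int,
                     interval_s: float = 0.0) -> List[float]:
    """Call `probe` `samples` times; return each call's wall-clock latency in ms."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        latencies.append((time.perf_counter() - start) * 1000.0)
        if interval_s > 0:
            time.sleep(interval_s)  # pace the probe so it adds little load itself
    return latencies


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Tail percentiles, which matter more than the mean when the server 'freezes'."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


def make_random_read_probe(path: str, block_size: int = 8192) -> Callable[[], None]:
    """Probe that reads one block from a random offset in `path`.
    Caveat: cached reads never reach the disk, so use a file much larger than RAM."""
    size = os.path.getsize(path)

    def probe() -> None:
        offset = random.randrange(0, max(1, size - block_size + 1))
        with open(path, "rb", buffering=0) as f:
            f.seek(offset)
            f.read(block_size)

    return probe


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python readlat.py <large file on the array>
    # Run once on an idle system and once while the ALTER TABLE is running.
    probe = make_random_read_probe(sys.argv[1])
    print(summarize(sample_latencies(probe, samples=500, interval_s=0.01)))
```

If p99 and max jump by orders of magnitude during the ALTER while p50 barely moves, that would point towards the read-latency-under-write-load explanation rather than plain lack of throughput.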

> Because if it was just a matter of speed, OK: with my current RAID
> configuration, let's say it takes 10 minutes to process the ALTER TABLE
> (leaving no room for other IOs until the ALTER TABLE is done). Let's say
> I then put in the fastest possible RAID setup, or even remove RAID for the
> sake of speed, and it completes in, let's say, 10 seconds (an unrealistic
> assumption). But if my table then grows 60 times, I would be facing the
> very same problem again, even with the best RAID configuration.

Only if the issue is one of pure write throughput. I don't think it is. You don't care how long the ALTER takes, only how much it impacts other users. Reducing the impact on other users so your ALTER can complete in its own time without stamping all over other work is the idea.
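
One generic way to let a heavy job "complete in its own time" is to cap its duty cycle: after each unit of work, sleep long enough that the job is busy for at most a chosen fraction of wall-clock time. This only applies if the work can be split into batches; a single ALTER TABLE statement can't be paced from the client this way. The sketch assumes a hypothetical list of batch callables (for example, each copying the next chunk of rows), and it limits how long the job occupies the array, not how deep the queue gets within one batch.

```python
import time
from typing import Callable, Iterable


def run_throttled(batches: Iterable[Callable[[], None]], duty_cycle: float) -> float:
    """Run each batch in turn, sleeping afterwards so the job is busy for at
    most `duty_cycle` of wall-clock time. Returns total busy seconds."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    busy_total = 0.0
    for batch in batches:
        start = time.perf_counter()
        batch()
        busy = time.perf_counter() - start
        busy_total += busy
        # Pick idle so that busy / (busy + idle) == duty_cycle.
        time.sleep(busy * (1.0 - duty_cycle) / duty_cycle)
    return busy_total


if __name__ == "__main__":
    # Hypothetical stand-in for "copy the next chunk of rows"; here it just sleeps.
    chunks = [lambda: time.sleep(0.01) for _ in range(4)]
    busy = run_throttled(chunks, duty_cycle=0.25)
    print(f"busy {busy:.3f} s")
```

With `duty_cycle=0.25` the job idles three times as long as it works, so it takes roughly four times longer overall but leaves the array free for other users most of the time.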

> The problem would seem to be in the way the OS (or hardware, or someone
> else, or all of them) is/are inserting the IO requests into the queue.

It *might* be. There's just not enough information to tell that yet. You'll need to do quite a bit more monitoring. I don't have the expertise to advise you on what to do and how to do it under Windows.

> What can I do to control the order in which these IO requests are
> finally entered into the queue?

No idea. You probably need to look into I/O priorities on Windows.

Ideally you shouldn't have to, though. If you can keep read latencies at sane levels under high write load on your array, you don't *need* to mess with this.

Note that I'm still guessing about the issue being high read latencies under write load. It fits what you describe, but there isn't enough data to be sure, and I don't know how to collect it on Windows.

> What options do I have to manipulate the order in which the IO requests
> are entered into the "queue"?
> Can I disable this queue?
> Should I turn the disks' IO operation caches off?
> Should I avoid some specific disk/RAID vendor, for instance?

Don't know. Contact your RAID card tech support, Google, search MSDN, etc.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)