Re: Speeding up a query.

Thanks for the replies, everyone. I'll try to answer them all in this one email, and I'll send another immediately after with additional details about the query.

> - Frequently the biggest performance gains can be reached by
>   a (painful) redesign. Can you change the table structure in a way
>   that makes this query less expensive?

I have considered redesigning the algorithm to accommodate this. As I've said, there's one row per five-minute time slot. Instead, I could represent an interval of time with a row: for example, a "start_time" of "08:00" with an "end_time" of "12:00", or perhaps an interval "duration" of "4 hours". The difficulty lies in managing separate time requirements (nurse vs. unit) for each time slot, and in inserting/updating new rows as pieces of those time slots or intervals are used up. Having a row per five-minute interval has avoided those complications so far. Still, I'd start with 32 rows and grow the number, never reaching 3,280. :)
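To make the bookkeeping concrete, here's a rough sketch of what the interval-based representation would have to do (the structure and times are invented for illustration, not my actual schema): booking time out of a free interval either shrinks it or splits it in two, which is exactly the complication the one-row-per-slot design sidesteps.

```python
from datetime import time

def book(intervals, start, end):
    """Remove the booked span [start, end) from a list of free
    (start, end) intervals, shrinking or splitting as needed."""
    result = []
    for s, e in intervals:
        if end <= s or start >= e:   # no overlap: keep the interval as-is
            result.append((s, e))
            continue
        if s < start:                # leading remainder survives
            result.append((s, start))
        if end < e:                  # trailing remainder survives
            result.append((end, e))
    return result

# One free interval, 08:00-12:00. Booking 09:00-10:00 splits it in two,
# so every insert/update has to manage these fragments.
free = [(time(8, 0), time(12, 0))]
free = book(free, time(9, 0), time(10, 0))
```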

> - You have an index on matrix.xxxxx, right?

I have tried indexes on each common set of join criteria. Usually it's "time, unit", "time, nurse", or "time, unit_scheduled", "time, nurse_scheduled" (the latter two being Booleans). In the first two cases the index made a difference of less than a second. In the last two, the time actually increases if I add ANALYZE statements after updates are made.
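One variant that might be worth a try on those Boolean columns is a partial index that covers only the still-free rows, so scheduled rows never bloat the index at all. PostgreSQL supports CREATE INDEX ... WHERE directly; the sketch below demos the same syntax via Python's bundled sqlite3 just so it's runnable, and the column names are my guesses at the matrix layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE matrix (
        id INTEGER PRIMARY KEY,
        time INTEGER,              -- 5-minute slot number within the day
        unit INTEGER,              -- chair
        unit_scheduled INTEGER     -- boolean: chair already taken?
    )
""")
# 12 slots x 3 chairs; the first 6 slots are already scheduled.
conn.executemany(
    "INSERT INTO matrix (time, unit, unit_scheduled) VALUES (?, ?, ?)",
    [(t, u, int(t < 6)) for t in range(12) for u in range(3)],
)
# Partial index: only the free rows are indexed, since the search for
# open slots never needs to touch scheduled ones.
conn.execute(
    "CREATE INDEX matrix_free_idx ON matrix (time, unit) "
    "WHERE unit_scheduled = 0"
)
free = conn.execute(
    "SELECT time, unit FROM matrix WHERE unit_scheduled = 0 "
    "ORDER BY time, unit"
).fetchall()
```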

> - Can you reduce the row count of the two subqueries by adding
>   additional conditions that weed out rows that can be excluded
>   right away?

I use some additional conditions. I'll paste the meat of the query below.

> - Maybe you can gain a little by changing the "select *" to
>   "select id" in both subqueries and adding an additional join
>   with matrix that adds the relevant columns in the end.
>   I don't know the executor, so I don't know if that will help,
>   but it would be a simple thing to test in an experiment.

I wrote "select *" as a simplification; the query really returns the primary key for that row.

> how far in advance do you schedule?  As far as necessary?

It's done on a per-day basis, each day taking 8-12 seconds or so on my workstation. We typically schedule patients as much as three to six months in advance. The query already pulls data into a temporary table to avoid having to manage a massive number of rows.

> How many chairs are there?  How many nurses are there?   This is a
> tricky (read: interesting) problem.

In my current template there are 17 chairs and 7 nurses. Chairs are grouped into pods of 2-4 chairs. Nurses cover one to many pods, allowing for a primary nurse per pod as well as floater nurses that cover multiple pods.
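On the self-join itself: since the matrix is one ordered row per slot, the "12 concurrent rows" check could in principle be done in a single ordered pass over the free slots (or with window functions in newer PostgreSQL) rather than joining the 3,280-row table to itself. A sketch of the linear-pass idea, with made-up slot numbers:

```python
def earliest_run(free_slots, need):
    """Given sorted free slot numbers, return the start of the earliest
    run of `need` consecutive slots, or None if no run is long enough.
    One pass, no self-join."""
    run_start, run_len, prev = None, 0, None
    for slot in free_slots:
        if prev is not None and slot == prev + 1:
            run_len += 1                 # run continues
        else:
            run_start, run_len = slot, 1  # run restarts here
        if run_len == need:
            return run_start
        prev = slot
    return None

# Slots 3-5 are free but too short for a 60-minute regimen;
# slots 8-19 form the first run of 12 five-minute slots.
free = [3, 4, 5] + list(range(8, 20))
start = earliest_run(free, 12)
```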




Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294 
 

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@xxxxxxxxx] 
Sent: Wednesday, June 17, 2009 9:09 AM
To: Hartman, Matthew
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re:  Speeding up a query.

On Tue, Jun 16, 2009 at 2:35 PM, Hartman,
Matthew<Matthew.Hartman@xxxxxxxxxx> wrote:
> Good afternoon.
>
> I have developed an application to efficiently schedule chemotherapy
> patients at our hospital. The application takes into account several
> resource constraints (available chairs, available nurses, nurse coverage
> assignment to chairs) as well as the chair time and nursing time
> required for a regimen.
>
> The algorithm for packing appointments respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
>
> I think what's slowing it down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change).
>
> To determine the available slots, the algorithm finds the earliest slot
> that has an available chair and a count of the required concurrent
> intervals afterwards. So a 60 minute regimen requires 12 concurrent
> rows. This is accomplished by joining the table on itself. A second
> query is run for the same range, but with respect to the nurse time and
> an available nurse. Finally, those two are joined against each other.
> Effectively, it is:
>
> Select *
> From   (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) chair,
>        (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) nurse
> Where chair.id = nurse.id
>
> With matrix having 3,280 rows. Ugh.
>
> I have tried various indexes and clustering approaches with little
> success. Any ideas?

how far in advance do you schedule?  As far as necessary?

How many chairs are there?  How many nurses are there?   This is a
tricky (read: interesting) problem.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

