Re: Optimize date query for large child tables: GiST or GIN?

David Jarvis <thangalin@xxxxxxxxx> · Thu, 20 May 2010 12:45:53 -0700

When using MySQL, the performance was okay (~5 seconds per query) using:

  date( concat_ws( '-', y.year, m.month, d.day ) ) between

    -- Start date.
    date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND

    -- End date. Calculated by checking to see if the end date wraps
    -- into the next year. If it does, then add 1 to the current year.

    --
    date(

      concat_ws( '-',
        y.year + greatest( -1 *

          sign(
            datediff(

              date(
                concat_ws('-', y.year, $P{Month2}, $P{Day2} )

              ),
              date(

                concat_ws('-', y.year, $P{Month1}, $P{Day1} )
              )

            )
          ), 0

        ), $P{Month2}, $P{Day2}
      )

    )

This calculated the correct start days and end days, including leap years.

With MySQL, I "normalized" the date into three different tables: year references, month references, and day references. The days contained only the day (of the month) the measurement was made and the measured value. The month references contained the month number for the measurement. The year references had the years and station. Each table had its own index on the year, month, or day.

When I had proposed that solution to the mailing list, I was introduced to a more PostgreSQL-way, which was to use indexes on the date field.

In PostgreSQL, I have a single "measurement" table for the data (divided into 72 child tables), which includes the date and station. I like this because it feels clean and it is easier to understand. So far, however, it has not been fast.

I was thinking that I could add three more columns to the measurement table:

year_taken, month_taken, day_taken

Then index those. That should allow me to avoid extracting years, months, and days from the m.taken date column.

What do you think?

Thanks again!
Dave