Re: ERROR: functions in index expression must be marked IMMUTABLE

"Sven R. Kunze" <srkunze@xxxxxxx> · Mon, 27 Feb 2017 11:52:17 +0100

    Hi Geoff, Adrian and Tom,

      thanks for your responses so far. Excuse my late response. I
        will respond to Tom's mail as it covers most points:

      On 26.02.2017 17:50, Tom Lane wrote:

      There are multiple reasons why the text-to-datetime conversion functions
are not immutable:

* some of them depend on the current timezone (but I don't believe date_in
does);

* all of them depend on the current datestyle setting, eg to resolve
'02/03/2017';

* all of them accept strings with time-varying values, such as 'now'
or 'today'.

You could get around the second and third points with to_timestamp(),
but since the only variant of that is one that yields timestamptz and
hence is affected by the timezone setting, it's still not immutable.

    I understand that timezone settings can have serious
      consequences when parsing text to datetime.

    My conceptual issue is that wrapping an "unsafe" operation
      up into a function and **marking** it as "safe" is not making
      things safer. Basically by-passing security guards.

    So, what can I do to parse texts to date(times) in a safe
      manner?

    I'd like to do it the right way. I can safely provide the
      timezone for those dates but it won't be in the jsonb
      data.

      I'm not entirely sure why the OP feels he needs an index on this
_expression_.  If he's willing to restrict the column to have the
exact format 'YYYY-MM-DD', then a regular textual index would sort
the same anyway.  Perhaps what's needed is just to add a CHECK
constraint verifying that the column has that format.

    These were my reasons:

    1) sanity checks (already noted)

    2) index date ranges (using gist)

    3) maybe performance (comparing texts vs comparing dates)
      but I couldn't think of ways to test this

    That's the current schema:

                             Table "public.docs"

     Column |  Type   |                    
      Modifiers                     

    --------+---------+---------------------------------------------------

     id     | integer | not null default
      nextval('docs_id_seq'::regclass)

     meta   | jsonb   | 

    Indexes:

        "docs_pkey" PRIMARY KEY, btree (id)

        "docs_address_idx" gin
      (to_tsvector('english'::regconfig, meta ->>
      'address'::text))

        "docs_address_trgm_idx" gin ((meta ->>
      'address'::text) gin_trgm_ops)

        "docs_birthdate_idx" btree ((meta ->>
      'birthdate'::text))

        "docs_meta_idx" gin (meta jsonb_path_ops)

        "docs_name_idx" gin (to_tsvector('english'::regconfig,
      meta ->> 'name'::text))

    Thanks to the ISO date format,
      I got by with a btree index on birthdate as Tom suggested.

    The index supports queries like the following (although
      22secs still is not great on 10M rows)

    explain analyze select meta->>'birthdate'
      from docs where meta->>'birthdate' >
      '2000-01-01' and meta->>'birthdate' <
      '2000-12-31' order by meta->>'birthdate';

      QUERY
      PLAN                                                              

    ----------------------------------------------------------------------------------------------------------------------------------------

     Index Scan using docs_birthdate_idx on
      docs  (cost=0.43..46067.43 rows=50000 width=136) (actual
      time=2.118..22177.710 rows=209955 loops=1)

       Index Cond: (((meta ->> 'birthdate'::text)
      > '2000-01-01'::text) AND ((meta ->> 'birthdate'::text)
      < '2000-12-31'::text))

     Planning time: 0.205 ms

     Execution time: 22229.615 ms

     Regard,

      Sven