Interpolation problem - pg 12.4 - full correct version!

Pól Ua Laoínecháin <linehanp@xxxxxx> · Wed, 19 Aug 2020 19:50:15 +0100

Hi all - here is the entire correct problem - apologies again!

Hi all,

I have an interpolation problem as follows - fiddle available here:

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=428aa76d49b37961088d3dfef190757f

A table:

CREATE TABLE data
(
  s SERIAL PRIMARY KEY,
  t TIMESTAMP,
  lat NUMERIC
);

and data:

INSERT INTO data (t, lat)
VALUES
('2019-01-01 00:00:00', 5.07),
('2019-01-01 01:00:00', 4.60),
('2019-01-01 02:00:00', NULL),
('2019-01-01 03:00:00', NULL),
('2019-01-01 04:00:00', 4.7),
('2019-01-01 05:00:00', 4.20),
('2019-01-01 06:00:00', NULL),
('2019-01-01 07:00:00', 4.98),
('2019-01-01 08:00:00', 4.50),
('2019-01-01 09:00:00', 4.7),
('2019-01-01 10:00:00', NULL),
('2019-01-01 11:00:00', NULL),
('2019-01-01 12:00:00', NULL),
('2019-01-01 13:00:00', 6.45),
('2019-01-01 14:00:00', 3.50);

There are gaps in the data as you can see - I'm trying to fill them
using the algorithm:

 - a sequence of 1 NULL - take the average of the reading above and
the reading below

- a sequence of 2 NULLs - the top assigned value is the average of the
two records above it and the bottom assigned one is the average of the
two records below.

So far, so good - I'm able to do this (but see discussion below)

 - a sequence of 3 NULLs - the middle one is assigned a value equal to
average of the non-NULL record above and the non-null record below,
and then the remaining NULLs above and below the average of the middle
one and the non-NULL ones above and below.

This is where it gets tricky - I'm getting answers, but I don't think
they're correct. The result of the massive  SQL shown below are here
(also on fiddle):

s    lat     final_val
1    5.07         5.07
2    4.60         4.60
3    NULL         4.84
4    NULL         4.45
5     4.7          4.7
6    4.20         4.20
7    NULL         4.59
8    4.98         4.98
9    4.50         4.50
10    4.7          4.7
11   NULL         4.60
12   NULL         5.58
13   NULL         4.98
14   6.45         6.45
15   3.50         3.50

The value for record 12 is correct, ,but not those above and below it.

I think my *MAJOR* problem is that I've developed what is,
essentially, a totally brute force approach - and this simply won't
work at the scenario becomes more complex - take a look at the CASE
statement - it's horrible and would only become exponentially worse as
the number NULLs rises.

So, my question is: Is there a recognised technique (using SQL only,
not PL/pgSQL - soutions based on the latter are easy to find) whereby
I can do a basic Linear Interpolation?

Should you require any further information, please don't hesitate to contact me.

TIA and rgs,

Pól...

=========================================================

My mega SQL:

WITH cte1 AS
(
  SELECT d1.s,
    d1.t AS t1, d1.lat AS l1,
    LAG(d1.lat, 2)  OVER (ORDER BY t ASC) AS lag_t1_2,
    LAG(d1.lat, 1)  OVER (ORDER BY t ASC) AS lag_t1,
    LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1,
    LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2
  FROM data d1
),
cte2 AS
(
  SELECT
    d2.t AS t2, d2.lat AS l2,
    LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2,
    LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2
  FROM data d2
),
cte3 AS
(
  SELECT t1.s,
    t1.t1,  t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2,
    t1.lead_t1, t2.lead_t2, t1.lead_t1_2
  FROM cte1 t1
  JOIN cte2 t2
  ON t1.t1 = t2.t2
),
cte4 AS
(
  SELECT t1.s,
  t1.l1 AS lat,
    CASE

      -- The WHEN for the middle of 3 NULLs has to be at the beginning
      -- of the CASE - if at the end, it remains NULL - why?

      WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL)
        AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL)
      THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2)

      WHEN (t1.l1 IS NOT NULL) THEN t1.l1
      WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL)
        AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2)
      WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL)
        AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2)
      WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL)  AND (t1.lag_t1 IS NULL)
        AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2)
      ELSE 0
    END AS final_val
  FROM cte3 t1
)
SELECT s, lat, final_val FROM cte4;