HI all, Sorry - posted the wrong URL for my problem - Doh... Here is the correct one! https://dbfiddle.uk/?rdbms=postgres_12&fiddle=428aa76d49b37961088d3dfef190757f Again, apologies and rgs, Pól... On Wed, 19 Aug 2020 at 19:16, Pól Ua Laoínecháin <linehanp@xxxxxx> wrote: > > Hi all, > > I have an interpolation problem as follows - fiddle available here: > > https://dbfiddle.uk/?rdbms=postgres_12&fiddle=8d23925146ea11a904c454709b0026fd > > A table: > > CREATE TABLE data > ( > s SERIAL PRIMARY KEY, > t TIMESTAMP, > lat NUMERIC > ); > > and data: > > > INSERT INTO data (t, lat) > VALUES > ('2019-01-01 00:00:00', 5.07), > ('2019-01-01 01:00:00', 4.60), > ('2019-01-01 02:00:00', NULL), > ('2019-01-01 03:00:00', NULL), > ('2019-01-01 04:00:00', 4.7), > ('2019-01-01 05:00:00', 4.20), > ('2019-01-01 06:00:00', NULL), > ('2019-01-01 07:00:00', 4.98), > ('2019-01-01 08:00:00', 4.50), > ('2019-01-01 09:00:00', 4.7), > ('2019-01-01 10:00:00', NULL), > ('2019-01-01 11:00:00', NULL), > ('2019-01-01 12:00:00', NULL), > ('2019-01-01 13:00:00', 6.45), > ('2019-01-01 14:00:00', 3.50); > > > There are gaps in the data as you can see - I'm trying to fill them > using the algorithm: > > - a sequence of 1 NULL - take the average of the reading above and > the reading below > > - a sequence of 2 NULLs - the top assigned value is the average of the > two records above it and the bottom assigned one is the average of the > two records below. > > So far, so good - I'm able to do this (but see discussion below) > > - a sequence of 3 NULLs - the middle one is assigned a value equal to > average of the non-NULL record above and the non-null record below, > and then the remaining NULLs above and below the average of the middle > one and the non-NULL ones above and below. > > This is where it gets tricky - I'm getting answers, but I don't think > they're correct. The result of the massive SQL shown below are here > (also on fiddle): > > s lat final_val > 1 5.07 5.07 > 2 4.60 4.60 > 3 NULL 4.84 > 4 NULL 4.45 > 5 4.7 4.7 > 6 4.20 4.20 > 7 NULL 4.59 > 8 4.98 4.98 > 9 4.50 4.50 > 10 4.7 4.7 > 11 NULL 4.60 > 12 NULL 5.58 > 13 NULL 4.98 > 14 6.45 6.45 > 15 3.50 3.50 > > The value for record 12 is correct, ,but not those above and below it. > > I think my *MAJOR* problem is that I've developed what is, > essentially, a totally brute force approach - and this simply won't > work at the scenario becomes more complex - take a look at the CASE > statement - it's horrible and would only become exponentially worse as > the number NULLs rises. > > So, my question is: Is there a recognised technique (using SQL only, > not PL/pgSQL - soutions based on the latter are easy to find) whereby > I can do a basic Linear Interpolation? > > Should you require any further information, please don't hesitate to contact me. > > TIA and rgs, > > > Pól... > > > ========================================================= > > My mega SQL: > > WITH cte1 AS > ( > SELECT d1.s, > d1.t AS t1, d1.lat AS l1, > LAG(d1.lat, 2) OVER (ORDER BY t ASC) AS lag_t1_2, > LAG(d1.lat, 1) OVER (ORDER BY t ASC) AS lag_t1, > LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1, > LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2 > FROM data d1 > ), > cte2 AS > ( > SELECT > d2.t AS t2, d2.lat AS l2, > LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2, > LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2 > FROM data d2 > ), > cte3 AS > ( > SELECT t1.s, > t1.t1, t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2, > t1.lead_t1, t2.lead_t2, t1.lead_t1_2 > FROM cte1 t1 > JOIN cte2 t2 > ON t1.t1 = t2.t2 > ), > cte4 AS > ( > SELECT t1.s, > t1.l1 AS lat, > CASE > > -- The WHEN for the middle of 3 NULLs has to be at the beginning > -- of the CASE - if at the end, it remains NULL - why? > > WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) > AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL) > THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2) > > WHEN (t1.l1 IS NOT NULL) THEN t1.l1 > WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL) > AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2) > WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL) > AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2) > WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL) AND (t1.lag_t1 IS NULL) > AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2) > ELSE 0 > END AS final_val > FROM cte3 t1 > ) > SELECT s, lat, final_val FROM cte4;