Search Postgresql Archives

Re: jsonb_set() strictness considered harmful to data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 18, 2019 at 09:14:09PM -0500, Ariadne Conill wrote:
Hello,

On Fri, Oct 18, 2019 at 6:52 PM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:

Greetings,

* Ariadne Conill (ariadne@xxxxxxxxxxxxxxxx) wrote:
> On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
> > https://www.postgresql.org/docs/11/functions-json.html
> > " The field/element/path extraction operators return NULL, rather than
> > failing, if the JSON input does not have the right structure to match
> > the request; for example if no such element exists"
>
> It is known that the extraction operators return NULL.  The problem
> here is jsonb_set() returning NULL when it encounters SQL NULL.
>
> > Just trying to figure why one is worse then the other.
>
> Any time a user loses data, it is worse.  The preference for not
> having data loss is why Pleroma uses PostgreSQL as it's database of
> choice, as PostgreSQL has traditionally valued durability.  If we
> should not use PostgreSQL, just say so.

Your contention that the documented, clear, and easily addressed
behavior of a particular strict function equates to "the database system
loses data and isn't durable" is really hurting your arguments here, not
helping it.

The argument about how it's unintuitive and can cause application
developers to misuse the function (which is clearly an application bug,
but perhaps an understandable one if the function interface isn't
intuitive or is confusing) is a reasonable one and might be convincing
enough to result in a change here.

I'd suggest sticking to the latter argument when making this case.

> > > I believe that anything that can be catastrophically broken by users
> > > not following upgrade instructions precisely is a serious problem, and
> > > can lead to serious problems.  I am sure that this is not the only
> > > project using JSONB which have had users destroy their own data in
> > > such a completely preventable fashion.

Let's be clear here that the issue with the upgrade instructions was
that the user didn't follow your *application's* upgrade instructions,
and your later code wasn't written to use the function, as documented,
properly- this isn't a case of PG destroying your data.  It's fine to
contend that the interface sucks and that we should change it, but the
argument that PG is eating data because the application sent a query to
the database telling it, based on our documentation, to eat the data,
isn't appropriate.  Again, let's have a reasonable discussion here about
if it makes sense to make a change here because the interface isn't
intuitive and doesn't match what other systems do (I'm guessing it isn't
in the SQL standard either, so we unfortunately can't look to that for
help; though I'd hardly be surprised if they supported what PG does
today anyway).

Okay, I will admit that saying PG is eating data is perhaps
hyperbolic,

My experience is that using such hyperbole is pretty detrimental, even
when one is trying to make a pretty sensible case. The problem is that
people often respond in a similarly hyperbolic claims, particularly when
you hit a nerve. And that's exactly what happened here, becase we're
*extremely* sensitive about data corruption issues, so when you claim
PostgreSQL is "eating data" people are likely to jump on you, beating
you with the documentation stick. It's unfortunate, but it's also
entirely predictable.

but I will also say that the behaviour of jsonb_set()
under this type of edge case is unintuitive and frequently results in
unintended data loss.  So, while PostgreSQL is not actually eating the
data, it is putting the user in a position where they may suffer data
loss if they are not extremely careful.

Here is how other implementations handle this case:

MySQL/MariaDB:

select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
  {"a":null,"b":2,"c":3}

Microsoft SQL Server:

select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
  {"b":2,"c":3}

Both of these outcomes make sense, given the nature of JSON objects.
I am actually more in favor of what MSSQL does however, I think that
makes the most sense of all.


I do mostly agree with this. The json[b]_set behavior seems rather
surprising, and I think I've seen a couple of cases running into exactly
this issue. I've solved that with a simple CASE, but maybe changing the
behavior would be better. That's unlikely to be back-patchable, though,
so maybe a better option is to create a non-strict wrappers. But that
does not work when the user is unaware of the behavior :-(

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux