Thanks Adrian for all the help. I filed this as bug #15549. I hope this all helps get logical replication into the "Running" stage.
On Wed, Dec 12, 2018 at 5:06 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 12/12/18 3:19 PM, Mike Lissner wrote:
> This sounds *very* plausible. So I think there are a few takeaways:
>
> 1. Should the docs mention that additive changes with NOT NULL
> constraints are bad?
It's not the NOT NULL, it's the lack of a DEFAULT. In general a column
with a NOT NULL and no DEFAULT is going to bite you sooner or later :)
At this point I have gathered enough of those bite marks to just make it
my policy to always provide a DEFAULT for a NOT NULL column.
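
A minimal sketch of that failure mode, using Python's built-in sqlite3 module as a stand-in rather than PostgreSQL logical replication itself. The table and column names are hypothetical. The INSERT supplies only `id`, roughly as a row arriving from a publisher that lacks the new column would; it fails when the column is NOT NULL with no DEFAULT and succeeds once a DEFAULT exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "subscriber" tables that carry an extra column `note`
# which the sending side knows nothing about.
conn.execute(
    "CREATE TABLE no_default (id INTEGER PRIMARY KEY, note TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE with_default "
    "(id INTEGER PRIMARY KEY, note TEXT NOT NULL DEFAULT '')"
)


def insert_without_new_column(table):
    """Insert a row that supplies only `id`; return True on success."""
    try:
        conn.execute(f"INSERT INTO {table} (id) VALUES (1)")
        return True
    except sqlite3.IntegrityError:
        # NOT NULL constraint failed: there is no value to fill `note` with.
        return False


results = {t: insert_without_new_column(t) for t in ("no_default", "with_default")}
print(results)
```

This only illustrates the constraint behavior; how an actual subscriber reports the error is not shown here.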
>
> 2. Is there a way this could work without completely breaking
> replication? For example, should PostgreSQL realize replication can't
> work in this instance and then stop it until schemas are back in sync,
> like it does with other incompatible schema changes? That'd be better
> than failing in this way and is what I'd expect to happen.
Not sure as there is no requirement that a column has a specified
DEFAULT. This is unlike PK and FK constraint violations where the
relationship is spelled out. Trying to parse all the possible ways a
user could get into trouble would require something on the order of an
AI and I don't see that happening anytime soon.
>
> 3. Are there other edge cases like this that aren't well documented that
> we can expect to creep up on us? If so, should we try to spell out
> exactly *which* additive changes *are* OK?
Not that I know of. By their nature edge cases are rare and often are
dealt with in the moment and not pushed out to everybody. The only
solution I know of is pretesting your schema change/replication setup on
a dev installation.
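
One possible shape for such a pretest, sketched with Python's built-in sqlite3 module as a stand-in for a dev database. The helper name `not_null_without_default` and the `docket` table are hypothetical. It lists columns that are NOT NULL and have no DEFAULT, which are the ones worth reviewing before a schema change goes out:

```python
import sqlite3


def not_null_without_default(conn, table):
    """Return names of non-key columns that are NOT NULL and lack a DEFAULT."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        name
        for _cid, name, _type, notnull, dflt_value, pk in rows
        if notnull and dflt_value is None and not pk
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docket ("
    " id INTEGER PRIMARY KEY,"
    " source TEXT NOT NULL,"                     # risky: no DEFAULT
    " view_count INTEGER NOT NULL DEFAULT 0,"    # fine: has a DEFAULT
    " notes TEXT"                                # fine: nullable
    ")"
)

risky = not_null_without_default(conn, "docket")
print(risky)
```

On PostgreSQL the same information should be available from information_schema.columns (the is_nullable and column_default fields); that query is not shown here.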
>
> This feels like a major "gotcha" to me, and I'm trying to avoid those. I
> feel like the docs are pretty lacking here and that others will find
> themselves in similarly bad positions.
Logical replication in core (not the pglogical extension) appeared for
the first time in version 10. On the crawl/walk/run spectrum it is
moving from crawl to walk. The docs will take some time to be more
complete. Just for the record my previous post was sketching out a
possible scenario not an ironclad answer. If you think the answer is
plausible and a 'gotcha' I would file a bug:
https://www.postgresql.org/account/login/?next=/account/submitbug/
>
> Better schema migration docs would surely help, too.
>
> Mike
>
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx