PgOutput Replication Message Format - Differentiate between explicit NULL and Omitted Columns during Insert

Pranav Hegde <pranavh4@xxxxxxxxx> · Wed, 5 Apr 2023 14:39:28 +0530

Currently in the PgOutput messages for Insert statements, there is no way to differentiate if the column was explicitly set to null in the insert statement, or whether that column was omitted in the insert statement and thus set to null (or default value)

For example, consider the below table:
CREATE TABLE test_table (
    id int primary key,
    text_col_1 text default 'default_text_1',
    text_col_2 text default 'default_text_2'
);

Now if we insert a row using:
INSERT INTO test_table (id, text_col_1) VALUES (1, null);

In the above query I am explicitly setting text_col_1 to null, and I am omitting text_col_2 which will be populated with the default value.
The PgOutput replication message classifies both text_col_1 and text_col_2 values as NULL. (Refer Logical Replication Message Format)
Thus we are not able to identify which column was explicitly set to null and which column was set to the default value.
This causes issues when we are parsing the message, especially in libraries such as Debezium, where it returns the default value for the column, even though in the actual SQL query we set that column to be explicitly null. Thus if there was a way to differentiate between the two cases in the replication message itself, then the appropriate action can be taken in downstream pipelines (set to null or set to default value)

Was wondering if there is a way to overcome this issue in the current pgoutput plugin or the code needs to be changed to account for this case. Any help would be appreciated