Re: Truncation of UNLOGGED tables upon restart.

Stephen Frost <sfrost@xxxxxxxxxxx> · Thu, 1 Nov 2018 20:52:23 -0400

Greetings,

* Michael Paquier (michael@xxxxxxxxxxx) wrote:
> On Thu, Nov 01, 2018 at 07:06:32PM -0400, Stephen Frost wrote:
> > No, we don't currently track that information but it's an interesting
> > idea, at least imv.
> 
> What would be the use case for it?  What you are looking for here is
> gathering information about all pages in a relation and just aggregate
> which one has the newest LSN, which you can do at SQL level using
> pageinspect to grab all the page LSNs, then use pg_control_checkpoint()
> with the LSN of the last checkpoint to know if a table has been written
> as such.  I agree that there could be cheaper solutions than that, but
> it is hard to say if the use cases in need of such a thing balance with
> the extra maintenance involved by a new feature when there are already
> tools allowing one to do that.

I'm not clear at all on what you're getting at here.  Yes, we could scan
the file and find the newest LSN and see if it's been changed since the
last checkpoint, but when would we do that..?  My thinking was that we'd
handle this single "is this data any good" bit pretty much identically
to how the visibility map works today- something checks and says "this
data is all good" and then if anyone touches it, the bit gets flipped
back to 'dirty'.

One thing to realize is that we'd need to hold a lock that prevents
changes to the table while we're doing this scan though, so I don't
think this would be included in VACUUM or run by autovacuum; instead
it'd need to be some new user-level command, I think.

Another idea might be to have two bits- one which is "I'm checking this
unlogged table to see if it's been changed" and the other to say "it's
all good", and then the 'check' process can set the first bit, scan
the relation, and if the first bit is still set then it can set the
second bit.  Any update to the relation would clear both bits.
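
To make the two-bit idea a little more concrete, here is a rough sketch
as a toy Python model.  It is not backend code, and the class, method
and function names are made up purely for illustration; the lock only
stands in for whatever would make the bit changes atomic.

```python
import threading


class UnloggedTableFlags:
    """Toy model of the two status bits (hypothetical names throughout)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.checking = False  # first bit: a check is in progress
        self.all_good = False  # second bit: data known to be safely on disk

    def on_update(self):
        # Any change to the relation clears both bits.
        with self._lock:
            self.checking = False
            self.all_good = False

    def begin_check(self):
        # The check process sets the first bit before it starts scanning.
        with self._lock:
            self.checking = True

    def finish_check(self):
        # Set the second bit only if the first bit survived the scan.
        with self._lock:
            if self.checking:
                self.all_good = True
            self.checking = False
            return self.all_good


def run_check(flags, scan):
    """Set the first bit, run the scan callable, then try to set the second."""
    flags.begin_check()
    scan()  # stands in for scanning (and flushing) the relation
    return flags.finish_check()
```

If an update sneaks in while the scan is running, on_update() clears
the first bit and finish_check() leaves the second bit unset, so the
check simply has to be run again later.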

I have to say that I really do think we should probably have some
top-level user command for this though, and as mentioned elsewhere, I
bet users would really like a way to say "don't allow further updates to
this table until I say so", to prevent a change from mistakenly being
made and a subsequent crash causing the data to be lost.  Having
unlogged tables where sometimes, if you're lucky, your data isn't lost,
but, whoops, other times you aren't lucky and it *is* lost, just doesn't
seem very appealing and definitely goes against the POLA.  Having an
explicit command would address that too.

> > Seems like a pretty useful use-case.  I had been thinking for a while,
> > based on a comment made by someone else (Vik Fearing..), that we should
> > have a way to turn an unlogged table into an 'init table' or similar-
> > that is, just copy the data from the main fork into the init fork and
> > then fsync it, then the data is there on restart.
> 
> That's interesting.  We already have all the facilities to be able to
> handle properly init forks, and the code could do better in making sure that
> WAL-logging of init forks happens when it should.
> 
> > Having a way to say 'this data has been fsyncd' is a pretty interesting
> > idea though.  I wonder how hard it'd be to make that work.
> 
> That's however not much different from a "CREATE TABLE save_data AS
> SELECT * FROM unlogged_table" happening at some point in time?

Well, that's equivalent to just making the table 'logged' instead of
'unlogged', but that isn't what I think people are looking for- there's
quite a few cases where it'd be nice to just have a table that's
initialized on database crash/restart, or backup/restore, but otherwise
is unlogged.
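
Purely as an illustration of the "copy the main fork into the init fork
and fsync it" step, here is a minimal sketch in Python that works on
ordinary files.  The function name is hypothetical, it ignores locking,
multi-segment relations and WAL entirely, and it is not something to
point at a live data directory; a real implementation would live in the
backend.

```python
import os
import shutil


def snapshot_to_init_fork(main_path, init_path):
    """Copy main_path to init_path and flush the copy to stable storage."""
    with open(main_path, "rb") as src, open(init_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()             # push Python's buffer down to the OS
        os.fsync(dst.fileno())  # ask the OS to write the file contents out

    # Also fsync the containing directory so the new directory entry is
    # durable (this works on POSIX systems; it is not portable to Windows).
    dir_fd = os.open(os.path.dirname(os.path.abspath(init_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

On disk the init fork of a relation is the file carrying the "_init"
suffix next to the main fork, so a call would look something like
snapshot_to_init_fork("16384", "16384_init"), with the filenode number
here being made up.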

Thanks!

Stephen