Re: Moving existing internal journal log to an external device (success?)

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 21 Aug 2023 08:14:37 +1000

On Sun, Aug 20, 2023 at 03:37:38PM -0400, fk1xdcio@xxxxxxxx wrote:
> Does this look like a sane method for moving an existing internal log to an
> external device?
> 
> 3 drives:
>    /dev/nvme0n1p1  2GB  Journal mirror 0
>    /dev/nvme1n1p1  2GB  Journal mirror 1
>    /dev/sda1       16TB XFS
> 
> # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1
> /dev/nvme1n1p2
> # mkfs.xfs /dev/sda1
> # xfs_logprint -C journal.bin /dev/sda1
> # cat journal.bin > /dev/md0
> # xfs_db -x /dev/sda1
> 
> xfs_db> sb
> xfs_db> write -d logstart 0
> xfs_db> quit
> 
> # mount -o logdev=/dev/md0 /dev/sda1 /mnt

So you are physically moving the contents of the log whilst the
filesystem is unmounted and unchanging.

> -------------------------
> 
> It seems to "work" and I tested with a whole bunch of data.

You'll get ENOSPC earlier than you think, because you just leaked
the old log space (needs to be marked free space). There might be
other issues, but you get to keep all the broken bits to yourself if
you find them.
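
As a rough illustration of how much space that is: the abandoned
internal log still occupies logblocks filesystem blocks, so the leak
is logblocks * blocksize bytes. A minimal Python sketch of that
arithmetic follows. The numbers in it are illustrative assumptions, not
values read from this filesystem; the real ones are the logblocks and
blocksize superblock fields that xfs_db prints.

```python
def leaked_log_bytes(logblocks: int, blocksize: int) -> int:
    """Bytes still accounted as in use by the abandoned internal log."""
    if logblocks < 0 or blocksize <= 0:
        raise ValueError("logblocks must be >= 0 and blocksize must be > 0")
    return logblocks * blocksize


if __name__ == "__main__":
    # Assumed example values: a 521728-block log with 4096-byte blocks.
    leaked = leaked_log_bytes(521728, 4096)
    print(f"{leaked} bytes ({leaked / 2**20:.0f} MiB) leaked")
```

That much space stays unavailable until something marks those blocks
free again.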

You can probably fix that by running xfs_repair, but then....

> I was also able
> to move the log back to internal without issue (set logstart back to what it
> was originally). I don't know enough about how the filesystem layout works
> to know if this will eventually break.

.... this won't work.

i.e. you can move the log back to the original position because you
didn't mark the space the old journal used as free, so the filesystem
still thinks it is in use by something....

> *IF* this works, why can't xfs_growfs do it?

"Doctor, I can perform an amputation with a tourniquet and a chainsaw,
why can't you do that?"

Mostly you are ignoring the fact that growfs is an online operation
- actually moving the log safely and testing it rigorously is a
whole lot harder than changing a few fields with xfs_db....

Let's ignore the simple fact we can't tell the kernel to use a
different block device for the log via growfs right now (i.e. needs
a new ioctl interface) and focus on what is involved in moving the
log whilst the filesystem is mounted and actively in use.

First we need an atomic, crash safe mechanism to swap from one log
to another. We need to do that while the filesystem is running, so
it has to be done within a freeze context. Then we have to run a
transaction that initialises the new log and tells the old log where
the new log is so that if we crash before the superblock is written
log recovery will replay the log switch. Then we do a sync write of
the superblock so that the next mount will see the new log location.
Then, while the filesystem is still frozen, we have to reconfigure
the in memory log structures to use the new log (e.g. open new
buftarg, update mount pointers to the log device, change the log
state to external, reset log sequence numbers, grant heads, etc).

Finally, if we got everything correct, we then need to free the old
journal in a new transaction running in the new log to clean up the
old journal now that it is no longer in use. Then we can unfreeze
the filesystem...
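
The ordering above is what makes the switch crash safe, and a toy
model shows why. The Python sketch below is not kernel code and every
name in it is hypothetical. It only models the on-disk state (the log
location in the superblock, plus what has been written to each log)
and checks what a mount-time recovery would see if we crashed after
any given step.

```python
from dataclasses import dataclass, field

# Steps in the order described above (names are made up for this sketch).
STEPS = [
    "freeze",
    "log_switch_transaction",  # written to the OLD log, points at the new log
    "sync_superblock",         # superblock now records the new log location
    "reconfigure_in_memory",   # buftarg, mount pointers, LSNs, grant heads
    "free_old_journal",        # transaction run in the NEW log
    "unfreeze",
]


@dataclass
class ToyDisk:
    sb_log: str = "internal"  # log location recorded in the superblock
    old_log: list = field(default_factory=list)
    new_log: list = field(default_factory=list)
    old_log_space_free: bool = False


def run_until(disk: ToyDisk, crash_after: int) -> None:
    """Apply the first `crash_after` steps, then 'crash'."""
    for step in STEPS[:crash_after]:
        if step == "log_switch_transaction":
            disk.old_log.append("switch_to_external")
        elif step == "sync_superblock":
            disk.sb_log = "external"
        elif step == "free_old_journal":
            disk.new_log.append("free_old_log_space")
        # freeze, reconfigure_in_memory and unfreeze touch only memory.


def recover(disk: ToyDisk) -> str:
    """Toy mount: start from the superblock, then replay what was logged."""
    active = disk.sb_log
    if active == "internal" and "switch_to_external" in disk.old_log:
        # Crashed before the superblock write: replay the log switch.
        disk.sb_log = active = "external"
    if active == "external" and "free_old_log_space" in disk.new_log:
        disk.old_log_space_free = True
    return active


if __name__ == "__main__":
    for n in range(len(STEPS) + 1):
        disk = ToyDisk()
        run_until(disk, n)
        active = recover(disk)
        print(n, active, "freed" if disk.old_log_space_free else "not freed")
```

In this model a crash at any point leaves recovery with exactly one
log to use, and the old log space is only ever freed once the new log
is the one in use. The real thing has to get the same result with real
I/O, real concurrency and real failure modes.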

Yes, you can do amputations with a chainsaw, but it's a method of
last resort that does not guarantee success and you take
responsibility for the results yourself. Turning this into a
reliable procedure that always works or fails safe for all
conditions (professional engineering!) is a whole lot more
complex...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx