Re: Reiser4 Upstream Git Repositories on GitHub

On 09/29/2016 05:07 PM, Edward Shishkin wrote:
[...]
BTW, your fstrim-scanner is the first candidate for scrubbing ;)
Actually, I am thinking about a common multi-functional scanner with 3
modes:
1) discard only (handle only free blocks);
2) scrub only (handle only busy blocks);
3) combined (scan the whole partition; for free blocks call discard,
   for busy ones call scrub).
Any ideas?

Thanks,
Edward.
PS: We have our own ioctl number: 0xCD, inherited from ReiserFS(v3).
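For concreteness, the user-visible interface of such a scanner might
look roughly like the sketch below. Everything except the 0xCD magic
is made up: the mode names, the struct layout and the command number
42 are placeholders for discussion, not a real reiser4 interface.

#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical scan modes, mirroring the three cases above. */
enum reiser4_scan_mode {
        REISER4_SCAN_DISCARD = 1,       /* handle only free blocks */
        REISER4_SCAN_SCRUB   = 2,       /* handle only busy blocks */
        REISER4_SCAN_BOTH    = 3,       /* discard free, scrub busy */
};

struct reiser4_scan_args {
        uint32_t mode;          /* one of enum reiser4_scan_mode */
        uint32_t flags;         /* reserved, must be 0 */
        uint64_t start;         /* first block to scan */
        uint64_t count;         /* number of blocks, 0 = whole partition */
        uint64_t processed;     /* filled in by the kernel on return */
};

/* 0xCD is the magic inherited from ReiserFS(v3); 42 is arbitrary. */
#define REISER4_IOC_SCAN _IOWR(0xCD, 42, struct reiser4_scan_args)

A user-space tool would then open the mount point and call
ioctl(fd, REISER4_IOC_SCAN, &args), much like fstrim does with FITRIM.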
I still have to finish the erase unit detection (which has completely
stalled) to merge all this work. Moreover:

For fstrim, we have dropped all locking and serialization issues and
declared that fstrim is best-effort: if it misses some blocks due to
concurrent transactions allocating and freeing blocks, it doesn't
matter.
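To spell out what best-effort buys us, here is a toy user-space model
(nothing in it is real reiser4 code; the bitmap, the lock and
discard_block() are invented): the scanner holds the bitmap lock only
for one chunk at a time, so anything that a concurrent transaction
allocates or frees in the meantime may simply be missed, and for
discard that is acceptable.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BITMAP_WORDS 2          /* 128 blocks in this toy */
#define CHUNK_WORDS  1

static uint64_t bitmap[BITMAP_WORDS];   /* bit set = block in use */
static pthread_mutex_t bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for issuing a discard for one free block. */
static void discard_block(uint64_t blk)
{
        printf("discard block %llu\n", (unsigned long long)blk);
}

/*
 * Best-effort pass: snapshot one chunk of the bitmap under the lock
 * (a concurrent allocator would take the same lock), drop the lock,
 * then discard the blocks that looked free in the snapshot.  Blocks
 * changed concurrently in other chunks may be missed - by design.
 */
static void best_effort_discard_pass(void)
{
        uint64_t snap[CHUNK_WORDS];
        unsigned int w, i, b;

        for (w = 0; w < BITMAP_WORDS; w += CHUNK_WORDS) {
                pthread_mutex_lock(&bitmap_lock);
                memcpy(snap, &bitmap[w], sizeof(snap));
                pthread_mutex_unlock(&bitmap_lock);

                for (i = 0; i < CHUNK_WORDS; i++)
                        for (b = 0; b < 64; b++)
                                if (!(snap[i] & (1ULL << b)))
                                        discard_block((w + i) * 64ULL + b);
        }
}

int main(void)
{
        bitmap[0] = ~0ULL;      /* pretend the first 64 blocks are busy */
        best_effort_discard_pass();
        return 0;
}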

For scrub, this won't fly...
Indeed, the requirements for fstrim and scrub are different,
but, as I remember, the last decision was not to miss anything:
http://marc.info/?l=reiserfs-devel&m=141391883022745&w=2
so everything will fly just perfectly.

Edward.
This is a different thing; it's about grabbing space in bigger chunks...
If a concurrent transaction allocates some space and frees some space,
we don't care, because it will then be discarded "online".

But in the case of scrub, how do we protect against the storage tree
changing right beneath us?

Yup, it seems that the idea of a common scanner is dead.
It should be an independent tool. I think we need to simply scan the
storage tree, do whatever is needed for each node, and mark it dirty.
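Something like the toy model below, perhaps. The node layout, the
checksum and the "repair" are all invented stand-ins; the real tool
would walk reiser4's own on-disk storage tree and re-fetch a bad node
from a healthy replica instead of just recomputing the checksum.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy "node": the real thing is a reiser4 formatted node. */
struct node {
        uint64_t blocknr;
        uint32_t payload;
        uint32_t csum;          /* checksum over payload */
        bool     dirty;
};

static uint32_t node_csum(const struct node *n)
{
        return n->payload ^ 0xabcdef01u;        /* trivial stand-in */
}

/*
 * One pass of the independent tool: check every node, repair what is
 * repairable (in reality: re-read it from a healthy replica), and mark
 * the node dirty so it is written out - and re-replicated - again.
 */
static void scrub_pass(struct node *nodes, unsigned int nr)
{
        unsigned int i;

        for (i = 0; i < nr; i++) {
                struct node *n = &nodes[i];

                if (node_csum(n) != n->csum) {
                        fprintf(stderr, "bad csum at block %llu\n",
                                (unsigned long long)n->blocknr);
                        n->csum = node_csum(n); /* "repair" */
                }
                n->dirty = true;
        }
}

int main(void)
{
        struct node nodes[2] = {
                { .blocknr = 100, .payload = 1, .csum = 1 ^ 0xabcdef01u },
                { .blocknr = 101, .payload = 2, .csum = 0 },    /* bad */
        };
        scrub_pass(nodes, 2);
        return 0;
}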

My latest thought is that an online scrub is not needed.

Global synchronization issues cannot happen online. They can happen
only offline (after fsck-ing). Accordingly, I suggest moving the
global synchronization stuff to user space, where it will be extremely
simple (a sort of dd-ing partitions in parallel, plus we'll need a
user-space version of init_volume.c to collect all mirrors properly).
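The dd-ing part really is that simple. Below is a bare-bones sketch
(the mirror discovery that a user-space init_volume.c would provide is
not shown, and error handling is minimal - partial writes, O_DIRECT
and the like are ignored); one such process per original/replica pair
gives the "in parallel" part.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (1 << 20)         /* copy in 1 MiB chunks */

/* Copy one block device onto another, like "dd if=src of=dst bs=1M".
 * Run one instance per mirror pair to synchronize a whole volume. */
int main(int argc, char **argv)
{
        static char buf[CHUNK];
        ssize_t n;
        int src, dst;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <original> <replica>\n", argv[0]);
                return 1;
        }
        src = open(argv[1], O_RDONLY);
        dst = open(argv[2], O_WRONLY);
        if (src < 0 || dst < 0) {
                perror("open");
                return 1;
        }
        while ((n = read(src, buf, sizeof(buf))) > 0) {
                if (write(dst, buf, n) != n) {
                        perror("write");
                        return 1;
                }
        }
        if (n < 0) {
                perror("read");
                return 1;
        }
        close(src);
        close(dst);
        return 0;
}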

The only things that can happen online are(*) locally fixable problems
(when, after IO completion, the page is uptodate but checksum
verification failed). There are 2 approaches:

1) Fix those local problems online: if __jparse() detects a local
   problem, then simply issue a "correction" - a write request for the
   original subvolume - and wait for its completion _before_ marking
   the jnode parsed (to prevent "rollbacks"); see the sketch after
   this list.

2) In the case of a local problem, mark the status block of the volume
   to indicate that global synchronization is required before
   fsck-ing. Then we forget about all local problems in that mount
   session. I didn't calculate the probability of simultaneous
   corruption of original and replica blocks with the same block
   numbers (I don't have any input numbers), but I suspect that it is
   vanishingly small.
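Approach 1) could look roughly like the sketch below. This is only a
user-space model of the intended control flow: the jnode layout, the
checksum, read_from_replica() and write_correction_sync() are
stand-ins for the real reiser4 machinery, not existing functions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for a reiser4 jnode. */
struct jnode {
        uint64_t blocknr;
        uint32_t data;
        uint32_t csum;
        bool     parsed;
};

static bool checksum_ok(const struct jnode *node)
{
        return node->csum == (node->data ^ 0x5eed5eedu);
}

/* In the real code: read the same block from a healthy replica. */
static int read_from_replica(struct jnode *node)
{
        node->data = 42;
        node->csum = node->data ^ 0x5eed5eedu;
        return 0;
}

/* In the real code: a synchronous write to the original subvolume. */
static int write_correction_sync(const struct jnode *node)
{
        printf("correction written for block %llu\n",
               (unsigned long long)node->blocknr);
        return 0;
}

/*
 * Approach 1): fix a locally corrupted block online.  The jnode is
 * marked parsed only after the correction write has completed, so the
 * repair cannot be "rolled back" by a failure in between.
 */
static int jparse_with_selfheal(struct jnode *node)
{
        if (!checksum_ok(node)) {
                int err = read_from_replica(node);

                if (!err)
                        err = write_correction_sync(node);
                if (err)
                        return err;     /* fall back to approach 2) */
        }
        node->parsed = true;
        return 0;
}

int main(void)
{
        struct jnode bad = { .blocknr = 7, .data = 1, .csum = 0 };

        return jparse_with_selfheal(&bad);
}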

So, we need either pre- and post-fsck global offline synchronizations,
or a global post-fsck one plus online local self-healing.

----
(*) I don't consider non-fixable IO errors (including the death of one
or more mirrors), which you can handle online with the block layer's
RAID-1. However, we can also implement this kind of failover in
reiser4. Downgrading arrays is simple to implement; upgrading them
will again require global online synchronization (scrub).

Edward.