Hi Alan,
Commercial tools promise this ability. How do they get the block-to-file
mapping to do the restore? I was looking for a way to do that so I could
do the same using LVM snapshots.
You cannot go block to file. To start with, when restoring, the block may
already have been reused for another file.
I suppose they use something like inotify (or their own virtual file
system driver over a real file system, like NFS or a loop fs) to learn
about changed blocks, and then find which file each block belongs to
and save this info in their backup catalog. If the changed block is
filesystem (or md device, or LVM) metadata, they have to understand this
and either log the change appropriately or ignore it, as it's not file data.
I can imagine something like this working, and even how to program it. And
I'm a little scared of some backup tool monitoring my file
accesses all the time. ;-)
So I won't find anything similar among open source tools, not even a
kernel API to help me if I want to implement it myself?
You can go file to block list, but that's only for some file systems and
not really reliable except for an unmounted snapshot.
As long as the goal is to capture the data, I can't see why it couldn't
be done in a reliable way. I'm not saying it would be trivial. But all
file changes have to go through the kernel, even if they are kept in
memory before going to the disk, so it should be possible for a daemon
to be notified about all changes and get the data. It's just a matter of
having a kernel API. I suppose inotify would be it.
But LVM snapshots are a "whole" disk. If I try to back them up using dd
or rsync, it is the same as a full backup. How can I back up just the
blocks changed in the snapshot and later restore them (of course after
restoring the full volume, or to a mirror)?
What the snapshot gives you is an atomic copy of the file system so you
can do a full file system copy, or backup the snapshot without the stuff
underneath changing. It's basically a way to get an unmounted, out of use
copy cheaply that you can then use for stuff.
No questions about this. I want to go further. Doing a dump or an rsync
from a snapshot of a multi-TB filesystem takes as long as doing it from
the original volume. I want to devise a faster way to do this
without sacrificing reliability.
Correct - the only way to check any copy is valid is by comparing the
original to the copy. That in fact (plus clever magic) is how rsync
works, so in effect the way to check if an rsync copy is valid is to
try and rsync it again. Doing a set of sha or md5sums on the two sides
and comparing the output now and then ought to provide a further check.
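The checksum comparison described above could be sketched roughly as follows. This is only an illustration, not what rsync does internally: the function names (`file_digest`, `tree_digests`, `compare_trees`) are made up for this example, SHA-256 is an arbitrary choice of hash, and it assumes both trees are reachable as local paths.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(original, copy):
    """Return (missing, extra, different) lists of relative paths."""
    a, b = tree_digests(original), tree_digests(copy)
    missing = sorted(set(a) - set(b))    # in original, absent from copy
    extra = sorted(set(b) - set(a))      # in copy, absent from original
    different = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, different
```

Note that this reads every byte on both sides, so it is a check to run now and then, not a way to speed anything up.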
More time spent on what's already too slow. There could be an rsync or
DRBD tool that calculates, stores and sends hashes on the fly, so the
remote copy could be checked by itself.
There has to be a better way to restore a few TB of backup consisting of
lots of small files. :-(
Is the issue backing up or restoring?
The main issue is backing up every day, even many times a day. But for
me there's no value in a speedy backup which I cannot restore reliably,
not just from the computer standpoint. People have to find
which backup sets are needed to do the restore. They need to be able to
check these backup sets before or during the restore.
If it is backing up then it may be
possible to work out which blocks are different between two snapshots and
transfer just those.
How? Can anyone on the list provide hints?
I don't know the innards of the LVM layer well
enough to know if there is a clever way to do that. I'm also not sure it
would help if the blocks are scattered about as it would still be a lot
of seeking.
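As a naive baseline for "work out which blocks are different between two snapshots and transfer just those", here is a sketch that compares two same-sized images block by block and then applies the difference to an older copy. It does not use any LVM internals: it reads both snapshots in full, sequentially, so it only reduces what has to be transferred, not what has to be read. The names `changed_blocks` and `apply_blocks` and the 4096-byte block size are assumptions made for the example.

```python
BLOCK_SIZE = 4096  # assumed block size for the example, not taken from LVM


def changed_blocks(old_path, new_path, block_size=BLOCK_SIZE):
    """Yield (index, data) for each block of new_path that differs from old_path.

    Assumes both images have the same size, as two snapshots of the
    same volume would.
    """
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        index = 0
        while True:
            old_block = old.read(block_size)
            new_block = new.read(block_size)
            if not new_block:
                break
            if old_block != new_block:
                yield index, new_block
            index += 1


def apply_blocks(target_path, blocks, block_size=BLOCK_SIZE):
    """Write each (index, data) pair into target_path at its block offset."""
    with open(target_path, "r+b") as target:
        for index, data in blocks:
            target.seek(index * block_size)
            target.write(data)
```

On the restore side, `apply_blocks` would be run against a copy that already holds the older full image, for example `apply_blocks("restored.img", changed_blocks("monday.img", "tuesday.img"))` with hypothetical file names.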
That "clever way" seems to be what commercial tools promise, but they
don't tell me what they use: which kernel API, their own driver, or if
they work only with this or that network storage... :-( I don't trust
anything if I can't understand how it works. All "magical" solutions I
found previously proved to be no solution at all.
I'm seeing that the file tree walk takes too long, just to find that
most files weren't changed, even when relying on last modification time,
so if I could get a list of blocks to back up it should be faster (fewer
disk seeks).
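For reference, the modification-time walk I mean looks roughly like the sketch below (the name `changed_since` is made up for the example). Even in this minimal form it has to read every directory and stat every file just to skip the unchanged ones, which is where the time goes.

```python
import os


def changed_since(root, since):
    """Yield paths of regular files under root modified after `since`.

    `since` is a timestamp in seconds since the epoch, e.g. the start
    time of the previous backup run.
    """
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # One stat per file, changed or not.
                    if entry.stat(follow_symlinks=False).st_mtime > since:
                        yield entry.path
```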
It shouldn't be too hard to implement a daemon using inotify and some
queueing strategy to deal with changed file blocks, add metadata, then
compress and send elsewhere. On the same machine, if I read the changed
block I should get its correct data, even if it wasn't synced yet.
But I can't find anyone who did this as open source, so maybe there's some
problem I could not see yet. And it would take me too long to implement and
debug it alone. Any developers out there looking for alpha testers
for their new, revolutionary backup tool? ;-)
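The queueing part could be sketched as below, as one possible strategy and nothing more: repeated notifications for the same file are merged, and a file is handed to the backup step only after it has been quiet for a while. The class `ChangeQueue` and its parameters are invented for the example, and the notification source is left out. One caveat I am aware of: inotify reports that a file was modified, not which byte ranges changed, and a watch covers a single directory rather than a whole tree, so this sketch tracks changed files, not changed blocks.

```python
import time
from collections import OrderedDict


class ChangeQueue:
    """Coalesce repeated change notifications for the same path."""

    def __init__(self, quiet_seconds=5.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock
        # path -> time of its latest notification, oldest entry first
        self._pending = OrderedDict()

    def notify(self, path):
        """Record a change; a repeated path moves to the back of the queue."""
        self._pending.pop(path, None)
        self._pending[path] = self.clock()

    def ready(self):
        """Pop and return paths with no notification for quiet_seconds."""
        now = self.clock()
        due = []
        while self._pending:
            path, last = next(iter(self._pending.items()))
            if now - last < self.quiet_seconds:
                break  # everything behind this entry is newer still
            del self._pending[path]
            due.append(path)
        return due
```

A daemon loop would call `notify()` for each event it receives and periodically pass the result of `ready()` to whatever compresses and ships the files.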
[]s, Fernando Lozano