Hi,

I looked at it. Alasdair had some concerns about the interface on the phone call. From my point of view, Fujita's interface is OK (using messages to manipulate the snapshot storage and using targets to access the snapshots). Alasdair, could you please be more specific about your concerns?

What I would propose to change in the upcoming redesign:

- develop it as a separate target, not as a patch against dm-snapshot. The code reuse from dm-snapshot is minimal, and keeping the old code around will likely consume more coding time than the potential code reuse will save.

- drop the limit of at most 64 snapshots. If we are going to redesign it, we should design it without such a limit, so that we don't have to redesign it again. (Why would we need more than 64? For example, to take periodic snapshots every few minutes to record system activity.) The limit on the number of snapshots can be dropped if we index b-tree nodes by a key that contains the chunk number and the range of snapshot numbers to which the entry applies; a rough sketch of such a key is appended at the end of this mail.

- add some cache for metadata; don't read the b-tree from the root node from disk all the time. Ideally the cache should be integrated with the page cache so that its size is tuned automatically (I'm not sure whether that can be coded cleanly, though).

- the b-tree is a good structure. I'd create a log-structured filesystem to hold the b-tree. The advantage is that it will require less synchronization overhead in clustering. A log-structured filesystem also gives you crash recovery with minimal coding overhead, and it has very good write performance.

- deleting a snapshot currently requires walking the whole b-tree, which is slow. Keeping another b-tree of the chunks belonging to a given snapshot would be overkill. I think the best solution is to split the device into large areas and keep a per-snapshot bitmap that says whether the snapshot has any exceptions allocated in the pertaining area (similar to the dirty bitmap of raid1); a sketch of this is also appended at the end of this mail. For short-lived snapshots this saves walking the whole b-tree. For long-lived snapshots there is no way to speed it up... but delete performance is not that critical anyway, because deletion can be done asynchronously without the user waiting for it.

Mikulas

> This is a new implementation of dm-snapshot.
>
> The important design differences from the current dm-snapshot are:
>
> - It uses one exception store per origin device that is shared by all snapshots.
> - It doesn't keep the complete exception tables in memory.
>
> I took the exception store code from Zumastor (http://zumastor.org/).
>
> Zumastor is remote replication software (a local server sends the delta between two snapshots to a remote server, and then the remote server applies the delta in an atomic manner, so the data on the remote server is always consistent).
>
> The Zumastor snapshot fulfills the above two requirements, but it is implemented in user space. The dm kernel module sends the information about a request to user space, and the user-space daemon tells the kernel what to do.
>
> The Zumastor user-space daemon needs to take care of replication, so the user-space approach makes sense there, but I think that a pure user-space approach is overkill just for snapshots. I prefer to implement snapshots in kernel space (as the current dm-snapshot does). I think that we can add features for remote replication software like Zumastor to it, that is, features to provide user space with a delta between two snapshots and to apply the delta in an atomic manner (via ioctl or something else).
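Regarding "via ioctl or something else" above: just to make the idea a bit more concrete, here is one possible shape of such an interface. This is only a sketch; the ioctl type letter and number, the struct and all field names below are invented for this mail and are neither in the posted patch nor in the existing dm ioctl set.

#include <linux/types.h>
#include <linux/ioctl.h>

/*
 * Hypothetical "report the delta between two snapshots" call.  The kernel
 * fills chunks[] with the chunk numbers that differ between the two
 * snapshots; userspace can then read those chunks from the corresponding
 * snapshot target and ship them to the remote side.
 */
struct dm_snap_delta_request {
	__u64	from_snap_id;	/* older snapshot id */
	__u64	to_snap_id;	/* newer snapshot id */
	__u64	start_chunk;	/* resume point for iterative calls */
	__u32	max_entries;	/* capacity of chunks[] provided by userspace */
	__u32	nr_entries;	/* number of entries filled in by the kernel */
	__u64	chunks[];	/* chunk numbers that differ */
};

/* Placeholder type/number; a real patch would have to fit into the
 * existing device-mapper ioctl numbering. */
#define DM_SNAP_GET_DELTA	_IOWR('D', 0, struct dm_snap_delta_request)

The "apply" direction would need a similar call that also carries the contents of the listed chunks and commits them to the target snapshot atomically.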
> Note that the code is still in a very early stage. There are lots of TODO items:
>
> - snapshot deletion support
> - writable snapshot support
> - protection against unexpected events (probably journaling)
> - performance improvements (exception cache handling and format, locking, etc.)
> - better integration with the current snapshot code
> - improvements to error handling
> - cleanups
> - generating a delta between two snapshots
> - applying a delta in an atomic manner
>
> The patch against 2.6.26 is available at:
>
> http://www.kernel.org/pub/linux/kernel/people/tomo/dm-snap/0001-dm-snapshot-dm-snapshot-shared-exception-store.patch
>
> Here's an example (/dev/sdb1 as an origin device and /dev/sdg1 as a cow device):
>
> - creates the pair of an origin and a cow:
>
> flax:~# echo 0 `blockdev --getsize /dev/sdb1` snapshot-origin /dev/sdb1 /dev/sdg1 P2 16 |dmsetup create work
>
> - no snapshot yet:
>
> flax:~# dmsetup status
> work: 0 125017767 snapshot-origin : no snapshot
>
> - creates one snapshot (the id of the snapshot is 0):
>
> flax:~# dmsetup message /dev/mapper/work 0 snapshot create 0
>
> - creates another snapshot (the id of the snapshot is 1):
>
> flax:~# dmsetup message /dev/mapper/work 0 snapshot create 1
>
> - there are now two snapshots (#0 and #1):
>
> flax:~# dmsetup status
> work: 0 125017767 snapshot-origin 0 1
>
> - let's access the snapshots:
>
> flax:~# echo 0 `blockdev --getsize /dev/sdb1` snapshot /dev/sdb1 0|dmsetup create work-snap0
> flax:~# echo 0 `blockdev --getsize /dev/sdb1` snapshot /dev/sdb1 1|dmsetup create work-snap1
>
> flax:~# ls /dev/mapper/
> control work work-snap0 work-snap1
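To make the two proposals above (the b-tree key carrying a snapshot-number range, and the per-area deletion bitmap) a bit more concrete, here is a rough sketch. Everything below is illustrative only: the names, field widths and the area size are made up and none of it comes from the posted patch.

#include <linux/types.h>
#include <linux/bitops.h>

/*
 * A b-tree key that removes any fixed limit on the number of snapshots:
 * one exception entry applies to a chunk of the origin and to a contiguous
 * range of snapshot ids [snap_from, snap_to].  A lookup for (chunk, snap_id)
 * matches any entry whose range covers snap_id, so the on-disk format does
 * not care whether there are 4 snapshots or 4000.
 */
struct shared_exception_key {
	__u64	chunk;		/* chunk number on the origin device */
	__u64	snap_from;	/* first snapshot id the entry applies to */
	__u64	snap_to;	/* last snapshot id the entry applies to */
};

/* Keys are ordered by chunk first, then by the start of the snapshot range. */
static int exception_key_cmp(const struct shared_exception_key *a,
			     const struct shared_exception_key *b)
{
	if (a->chunk != b->chunk)
		return a->chunk < b->chunk ? -1 : 1;
	if (a->snap_from != b->snap_from)
		return a->snap_from < b->snap_from ? -1 : 1;
	return 0;
}

static int exception_covers(const struct shared_exception_key *k,
			    __u64 chunk, __u64 snap_id)
{
	return k->chunk == chunk &&
	       k->snap_from <= snap_id && snap_id <= k->snap_to;
}

/*
 * Per-snapshot area bitmap for faster deletion, analogous to the raid1
 * dirty bitmap: the origin is split into large areas and bit i is set the
 * first time the snapshot gets an exception in area i.  Deleting a
 * short-lived snapshot then only needs to walk the parts of the b-tree
 * whose area bit is set.
 */
#define EXCEPTION_AREA_SHIFT	14	/* illustrative: one area = 2^14 chunks */

struct snapshot_area_bitmap {
	__u64		nr_areas;	/* number of areas on the origin device */
	unsigned long	*bits;		/* one bit per area */
};

/* Areas whose bit is clear can be skipped entirely during deletion. */
static int snapshot_may_own_chunk(const struct snapshot_area_bitmap *bm,
				  __u64 chunk)
{
	__u64 area = chunk >> EXCEPTION_AREA_SHIFT;

	return area < bm->nr_areas && test_bit(area, bm->bits);
}

Long-lived snapshots will tend to have most of their area bits set, so this mainly helps the short-lived case; the asynchronous delete takes care of the rest.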