On Wed, 6 Aug 2008 15:14:50 -0400 (EDT)
Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> Hi
>
> I looked at it.

Thanks! I didn't expect anyone to read the patch. I'll submit patches
in a more proper manner next time.

> Alasdair had some concerns about the interface on the phone call. From my
> point of view, Fujita's interface is OK (using messages to manipulate
> the snapshot storage and using targets to access the snapshots). Alasdair,
> could you please be more specific about it?

Yeah, we can't use dmsetup create/destroy to create/delete snapshots.
We need something different. I have no strong opinion about it;
whatever interface is fine by me as long as it works.

> What I would propose to change in the upcoming redesign:
>
> - develop it as a separate target, not as a patch against dm-snapshot.
> The code reuse from dm-snapshot is minimal, and keeping the old code
> around will likely consume more coding time than the potential code
> reuse will save.

It's fine by me if the maintainer prefers it. Alasdair?

> - drop the limitation of at most 64 snapshots. If we are going to
> redesign it, we should design it without such a limit, so that we
> wouldn't have to redesign it again (why we need more than 64 --- for
> example, to take periodic snapshots every few minutes to record system
> activity). The limit on the number of snapshots can be dropped if we
> index b-tree nodes by a key that contains the chunk number and the
> range of snapshot numbers where it applies.

Unfortunately, that is a limitation of the current b-tree format. As
far as I know, there is no existing code we can reuse that supports an
unlimited number of writable snapshots. (A sketch of the key layout
you describe is appended at the end of this mail.)

> - do some cache for metadata, don't read the b-tree from the root node
> from disk all the time.

The current code already does that.

> Ideally the cache should be integrated with the page cache so that its
> size would tune automatically (I'm not sure if it's possible to
> cleanly code it, though).

Agreed. The current code rolls its own cache. I don't like it, but
there is no other option.

> - the b-tree is a good structure; I'd create a log-structured
> filesystem to hold the b-tree. The advantage is that it will require
> less synchronization overhead in clustering. Also, a log-structured
> filesystem will bring you crash recovery (with minimum coding
> overhead) and it has very good write performance.

A log-structured filesystem is pretty complex. Even though we don't
need a complete log-structured filesystem, it's still too complex,
IMO. A copy-on-write manner of updating the b-tree on disk (as some of
the latest file systems do) is a possible option; I've sketched that
idea below as well. Another option is using journaling, as I wrote
before.

> - deleting the snapshot --- this needs to walk the whole b-tree --- it
> is slow. Keeping another b-tree of chunks belonging to the given
> snapshot would be overkill. I think the best solution would be to
> split the device into large areas and use a per-snapshot bitmap that
> says if the snapshot has some exceptions allocated in the pertaining
> area (similar to the dirty bitmap of raid1). For short-lived snapshots
> this will save walking the b-tree. For long-lived snapshots there is
> no way to speed it up... But delete performance is not that critical
> anyway, because deleting can be done asynchronously without the user
> waiting for it.

Yeah, it would be nice to delete a snapshot really quickly, but it's
not a must. (A rough sketch of the per-area bitmap is below, too.)
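
To make the 64-snapshot point concrete, here is a minimal sketch in C
of the kind of b-tree key Mikulas describes. All names are made up for
illustration; this is not code from any existing patch. The idea is
that an exception entry covers a whole range of snapshot IDs, and
snapshot IDs are plain 64-bit numbers rather than bits in a fixed-width
mask, so the format has no built-in cap:

	#include <stdint.h>

	/*
	 * Hypothetical on-disk b-tree key: an exception (a chunk that
	 * has been copied out) applies to every snapshot whose ID falls
	 * in [snap_from, snap_to].
	 */
	struct exception_key {
		uint64_t chunk;		/* chunk number on the origin */
		uint64_t snap_from;	/* first snapshot ID covered */
		uint64_t snap_to;	/* last snapshot ID covered */
	};

	/*
	 * Keys are ordered by chunk first, then by the start of the
	 * snapshot range, so a lookup for (chunk, snap_id) finds the
	 * covering entry with an ordinary b-tree search.
	 */
	static int exception_key_cmp(const struct exception_key *a,
				     const struct exception_key *b)
	{
		if (a->chunk != b->chunk)
			return a->chunk < b->chunk ? -1 : 1;
		if (a->snap_from != b->snap_from)
			return a->snap_from < b->snap_from ? -1 : 1;
		return 0;
	}

With a key like this, creating snapshot number 65, 1000, or a million
changes nothing in the format; only the range boundaries move.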
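
For the copy-on-write update, here is a toy in-memory illustration of
the path-copying idea (again, invented names; splits, sorted leaf
insertion, and error handling are omitted to keep it short). On disk,
publishing the new root would be a single atomic superblock write,
which is what gives crash recovery:

	#include <stdlib.h>
	#include <string.h>
	#include <stdint.h>

	struct node {
		int nkeys;
		uint64_t keys[16];
		struct node *child[17];	/* all NULL in a leaf */
	};

	/* Never modify a node in place: clone it, change the copy. */
	static struct node *cow_clone(const struct node *n)
	{
		struct node *c = malloc(sizeof(*c));
		memcpy(c, n, sizeof(*c));
		return c;
	}

	/*
	 * Insert by copying every node on the path from root to leaf.
	 * The old tree stays valid until the caller publishes the new
	 * root; on disk that is the atomic superblock update.
	 */
	static struct node *cow_insert(const struct node *root,
				       uint64_t key)
	{
		struct node *copy = cow_clone(root);
		int i;

		if (!copy->child[0]) {		/* leaf */
			copy->keys[copy->nkeys++] = key;
			return copy;
		}
		for (i = 0; i < copy->nkeys && key > copy->keys[i]; i++)
			;
		copy->child[i] = cow_insert(copy->child[i], key);
		return copy;
	}

Usage would be new_root = cow_insert(old_root, key), then publishing
new_root. A crash before the root is published leaves the old tree
fully intact, and the replaced nodes can be reclaimed once nothing
references them anymore.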
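
And for the delete path, a rough sketch of the per-area bitmap; the
area size and count are arbitrary assumptions. One bit per large area
of the origin device is set the first time the snapshot allocates an
exception there, so an asynchronous delete only has to walk the b-tree
key ranges of the marked areas, much like raid1's dirty bitmap limits
resync to dirty regions:

	#include <stdint.h>

	#define AREA_SHIFT	18	/* 2^18 chunks per area (assumed) */
	#define MAX_AREAS	4096	/* whole-device coverage (assumed) */
	#define BITS_PER_WORD	(8 * sizeof(unsigned long))

	struct snap_area_bitmap {
		unsigned long bits[MAX_AREAS / (8 * sizeof(unsigned long))];
	};

	/* Callers must ensure chunk >> AREA_SHIFT < MAX_AREAS. */
	static void area_mark(struct snap_area_bitmap *bm, uint64_t chunk)
	{
		uint64_t area = chunk >> AREA_SHIFT;

		bm->bits[area / BITS_PER_WORD] |=
			1UL << (area % BITS_PER_WORD);
	}

	static int area_is_set(const struct snap_area_bitmap *bm,
			       uint64_t area)
	{
		return (bm->bits[area / BITS_PER_WORD] >>
			(area % BITS_PER_WORD)) & 1;
	}

	/*
	 * Delete would iterate over the marked areas only, removing
	 * this snapshot's entries from the corresponding b-tree key
	 * ranges; unmarked areas are skipped entirely.
	 */

For a snapshot that only lived a few minutes, almost every bit stays
clear, so the delete touches a handful of key ranges instead of
walking the whole tree.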