On Wed, Nov 11 2009 at 3:46pm -0500, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: > > > Hi > > > > > > Please drop snapshot-linking through this "handover_snap" field, it will > > > create real or possible problems with locking or with dangling pointers > > > (i.e. are both properly locked? Which to lock first? If you destroy a > > > snapshot, you must check if something links to it, etc. --- these problems > > > could be solved, but if they can be avoided by dropping linking, it's > > > better to avoid them). > > > > I agree, lock_snapshots_for_handover() is too complicated. But by dropping > > these 'handover_snap' associations we'll lose the ability to have tight > > constraints on what handover will take place on resume of new snapshot > > (ideally a source of handover will only have one destination). > > > > The new snapshot (destination of exception handover) is not registered > > with the origin until after resume's handover_exceptions (via > > register_snapshot). And therefore it can't be found via > > find_snapshot_using_cow(). But that is a good thing as we'll know that > > if find_snapshot_using_cow does return a snapshot then it is the source > > of the handover. > > It is registered in snapshot_ctr --- that's how it was originally written. > So it should be possible to find it. > > You moved the registration away? So move it back. Why? That would be counter-productive. register_snapshot() is skipped if handover will be performed. The reason is register_snapshot() looks at the store's chunk_size. Problem is we don't have the store yet. Your handover relied on rereading the snapshot header. I no longer read_metadata() at all if the store will be handed over. > I'm somehow concerned, how this simple handover code (that I wrote 1.5 > years ago, on a train way from Berlin to Prague) is getting more and more > complicated, just for handling some "can't happen" cases. Believe me, I'd rather not be lingering over the finer points of exception handover; but once I hit the fact that snapshot_ctr() was imposing that the COW device must not be suspended it opened a can or worms (lvchange --refresh exposed this, in general it is fair to assume the cow device isn't suspended in snapshot_ctr :). The handover-based snapshot_ctr() needing to access the COW device begged the followong questions: why are we reading the snapshot header via read_metadata()? Why are we doing any IO to the COW if it will just be handed over? Answer: we don't have a real need, so don't. Not only that, but Alasdair asserted read_metadata() must not be called if another snapshot is already actively using the COW. What if something else is changing it at the same time that the new snapshot_ctr() re-reads it? Anyway, best to avoid all of that (read_metadata and register_snapshot) if we're doing exception handover. > First of all, you don't have to handle user's misbehavior --- if the user > writes dd if=/dev/zero of=/dev/hda, he loses his data --- there is no need > to protect the user from doing it. Similarly, if the user loads bad dm > table, he may lose his data too and there is little that can be done. > > You can add protection from double target load as a bonus --- but do it > only as long as it doesn't complicate the code. If this protection from > double load would complicate the code too much, then don't do it. I just posted the latest version of the handover patch (v7); it no longer has 'handover_snap' pointers. As such the locking is simplified. It also handles protecting against double target loads. > Last night, when talking with Alasdair, we found out that the "active" > variable could be used as a token --- there could be at most one active > snapshot and on handover, you would set "active=0" on the old snapshot and > "active=1" on the new snapshot, always enforcing that there is just one > snapshot active. Right, find_snapshot_using_cow() in v7 of the patch does make use of active to know to overlook a snapshot on the origin's 'snapshots' list that are not active (snapshot may be left on the list after handover, snapshot_dtr() hasn't come through yet, but it is both invalid and inactive). v7 handover_exceptions() sets: snap_src->active = 0; > But generally: > 1. make handover work the simplest way without any protection (like I > wrote it). > 2. add the protection from double load only if it doesn't complicate the > code too much (for example, checking that there is just one "active" > snapshot should be relatively easy). I think v7 is fairly robust; I've tested it pretty extensively both standalone and with the rest of the snapshot-merge patches. As always the full snapshot-merge DM patchset is available here: http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.32/ -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel