Mike,

On Tue, 2011-02-22 at 20:22 -0500, Mike Snitzer wrote:
> I just had a look at the latest content and have some questions (way
> more than I'd imagine you'd like to see.. means I'm clearly missing a
> lot):

Thanks a lot for taking the time to go through this.  I'm updating the
document as I answer your questions.  I'll put the git commit hashes in
square brackets to make it easier for you to pick out the changes for
each question.

> 1) from "Solution" slide:
>    "Space comes from a preallocated 'pool', which is itself just
>     another logical volume, thus can be resized on demand."
>    ...
>    "Separate metadata device simplifies extension, this is hidden by
>     the LVM system so sys admin unlikely to be aware of it."
>    Q: Can you elaborate on the role of the metadata?  It maps between
>       physical "area" (allocated from pool) for all writes to the
>       logical address space?

[0127dd9]

>    Q: can thinp and snapshot metadata coexist in the same pool? -- ask
>       similar question below.

I've added a new introduction section at the start of the document that
tries to explain that the thinp target is just a simple thin
provisioning solution, whereas multisnap will provide both thinp and
snapshots.  [70e448f]

> 2) from "Block size choice" slide:
>    The larger the block size:
>    - the less chance there is of fragmentation (describe this)
>    Q: can you please "describe this"? :)

[a6306c8]

>    - the less frequently we need the expensive mapping operation
>    Q: "expensive" is all relative, seems you contradict the expense of
>       the mapping operation in the "Performance" slide?

[938422d]  You still want to minimise it.  The performance at small
block sizes is better than I expected.

>    - the smaller the metadata tables are, so more of them can be held
>      in core at a time.  Leading to faster access to the provisioned
>      blocks by minimizing reading in mapping information
>    Q: "more of them" -- "them" being metadata tables?  So the take
>       away is more thinp devices available on the same host?

No, fewer reads to load bits of the mapping table that aren't in the
cache.  [9ba3ae3]

> 3) from "Performance" slide:
>    "Expensive operation is mapping in a new 'area'"
>    Q: is area the same as a block in the pool?  Why not call block
>       size: "area size"?  "Block size" is familiar to people?
>       Original snapshot had "chunk size".

I switched from 'chunk' to 'block' because we seem to be the only
people who use the term chunk (my fault), and I was reading lots of
filesystem papers in preparation for this work, where block is more
ubiquitous.  I've changed 'area' and 'region' to block [1c6a5352].  If
you think it's still confusing I'll change everything to 'chunk' (the
LVM2 tools are still going to use --chunksize etc.).

> 4) Q: what did you decide to run with for reads to logical address
>       space that weren't previously mapped?  Just return zeroes like
>       was discussed on lvm-team?

[49c8490]

I've added a 'target parameters' section [8332c43].
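To make that section more concrete, here's roughly what driving the
targets by hand might look like.  This is only a sketch: the target and
message names, device names and sizes are all illustrative, and the
exact syntax may well change.

  # Bind a pool device to the two LVs: 20971520 sectors (10G) of
  # data, 512k blocks (1024 sectors), low water mark of 4 blocks.
  dmsetup create pool --table \
    "0 20971520 thin-pool /dev/vg/pool_metadata /dev/vg/pool_data 1024 4"

  # The 'create virtual device' message; userland chooses the internal
  # device id (0 here) and remembers it.
  dmsetup message /dev/mapper/pool 0 "create_thin 0"

  # Instance the device itself with the thin target:
  #   thin <pool object> <internal device id>
  # (here I'm using the pool's device node as the <pool object>)
  dmsetup create thin0 --table "0 2097152 thin /dev/mapper/pool 0"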
> The "Metadata object" section is where you lose me:

I've added some more background stuff [c8e1685].

> 5) I'm not clear on the notion of "external" vs "internal" snapshots.
>    Q: can you elaborate on their characteristics?

See above commit.

> 6) I'm not clear on how you're going to clone the metadata tree for
>    userspace to walk (for snapshot merge, etc).  Is that "clone"
>    really a snapshot of the metadata device? -- seems unlikely as
>    you'd need a metadata device for your metadata device's snapshots?

No.

>    - you said: "Userland will be given the location of an alternative
>      superblock for the metadata device.  This is the root of a tree
>      of blocks referring to other blocks in a variety of data
>      structures (btrees, space maps etc.).  Blocks will be shared with
>      the 'live' version of the metadata, their reference counts may
>      change as sharing is broken, but we know the blocks will never be
>      updated."
>    - Q: is this describing an "internal snapshot"?

No.  I don't really want to go into how the persistent-data library
works; I should start a separate document for that.  If you think I'm
just confusing people by adding these issues then I can take this
section out?

> 7) from the "thin" target section:
>    "All devices stored within a metadata object are instanced with
>     this target.  Be they fully mapped devices, thin provisioned
>     devices, internal snapshots or external snapshots."
>    Q: what is a fully mapped device?

A thinp that's fully mapped; I'll take it out [831c136].

> 8) "The target line:
>
>        thin <pool object> <internal device id>"
>
>    Q: so by <pool object>, that is the _id_ of a pool object that was
>       returned from the 'create virtual device' message?

Yep, or rather the id that was passed in to that call.  Userland is in
charge of allocating these numbers.

> In general my understanding of all this shared store infrastructure
> is muddled.  I need the audience to take away big concepts not get
> tripped up (or trip me up!) on the minutia.

Agreed, let's try and restrict this document to high level stuff.  I'll
do a separate persistent-data doc with the detail in.

> Subtle inconsistencies and/or opaque explanation aren't helping, e.g.:
> 1) the detail of "Configuration/Use" for thinp volume
>    - "Allocate (empty) logical volume for the thin provisioning pool"
>      Q: how can it be "empty"?  Isn't it the data volume you hand to
>         the pool target?

Changed to 'possibly empty' [3ce2226].  I think this scenario will
occur quite often; for example, a VM hosting service might create a new
VM for a client with a bunch of thinp devices, but not want to commit
any space to the VM until the client actually starts using the devices.

>    - "Allocate small logical volume for the thin provisioning
>       metadata"
>      Q: before in "Solution" slide you said "Separate metadata device
>         simplifies extension", can the metadata volume be extended
>         too?

That's the plan.  A userland library will make the necessary tweaks to
the metadata while the device is suspended.

>    - "Set up thin provisioning mapped device on aforementioned 2 LVs"
>      Q: so there is no distinct step for creating a pool?

For the thinp target, the data device that you pass in to the target is
the 'pool'.  I hope the 'target parameters' section I've added helps
explain this?

>      Q: pool is implicitly created at the time the thinp device is
>         created?  (doubtful but how you enumerated the steps makes it
>         misleading/confusing).

The LVM tools will implicitly create the data/backing device and the
metadata device.  agk is envisioning a command line like:

  lvcreate --target-type=thinp --chunksize=512k --low-water-mark=4 -L10G

>      Q: can snapshot and thinp volumes share the same pool?
>         (if possible I could see it being brittle?)
>         (but expressing such capability will help the audience "get"
>          the fact that the pool is nicely abstracted/sound design,
>          etc).

I'm not sure if you're talking thinp target or multisnap here.  Why
'brittle'?
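Since a couple of your questions touch on resizing, here's a rough
sketch of how I expect extending the pool to look from userland.
Again, this is illustrative rather than final syntax, and the LVM
tools will hide all of it from the sys admin.

  # Grow the LV backing the pool's data device.
  lvextend -L+5G /dev/vg/pool_data

  # Reload the pool's table with the new length (15G = 31457280
  # sectors) and resume.  Extending the metadata volume would follow
  # the same pattern, with the userland library making its tweaks to
  # the metadata while the device is suspended.
  dmsetup suspend pool
  echo "0 31457280 thin-pool /dev/vg/pool_metadata /dev/vg/pool_data 1024 4" \
    | dmsetup load pool
  dmsetup resume pool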
> p.s. I was going to hold off sending this and take another pass of
> your slides but decided your feedback to all my Q:s would likely be
> much more helpful than me trying to parse the slides again.

You definitely did right to send these; it gives me a kick to keep
improving it.  Have a read through it now and see if it's any better.
I'm quite happy to keep revising it for you.

- Joe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel