Re: [Lsf-pc] [Cluster-devel] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

Alex Elsayed <eternaleye@xxxxxxxxx> · Fri, 12 Dec 2014 12:49:50 -0800

Alex Elsayed wrote:

> Jan Kara wrote:
> 
>>   Hi,
>> 
>> On Fri 12-12-14 11:46:34, Steven Whitehouse wrote:
>>> On 11/12/14 00:52, Alasdair G Kergon wrote:
>>> >On Wed, Dec 10, 2014 at 07:46:51PM +0100, Jan Kara wrote:
>>> >>   But still you first need to stop all writes to the filesystem, then
>>> >> do a sync, and then allow writing again - which is exactly what freeze
>>> >> does.
>>> >And with device-mapper, we were asked to support the taking of
>>> >snapshots of multiple volumes simultaneously (e.g. where the
>>> >application data is stored across more than one filesystem). Thin dm
>>> >snapshots can handle this (the original non-thin ones can't).
>>> >
>>> That's good to know, and a useful feature. One of the issues I can
>>> see is that because there are a number of different layers involved
>>> (application/fs/storage) coordination of requirements between those
>>> is not easy. To try to answer Jan's question earlier in the thread,
>>> no I don't know any specific application developers, but I can
>>> certainly help to propose some kind of solution, and then get some
>>> feedback. I think it is probably going to be easier to start with a
>>> specific proposal, albeit tentative, and then ask for feedback than
>>> to just say "how should we do this?" which is a lot more open ended.
>>> 
>>> Going back to the other point above regarding freeze, it is not
>>> necessarily a requirement to stop all writes in order to do a
>>> snapshot. What is needed is in effect a barrier between operations
>>> which should be represented in the snapshot and those which should
>>> not, because they happen "after" the snapshot has been taken. Not
>>> that I'm particularly attached to that proposal as it stands, but I
>>> hope it demonstrates the kind of thing I had in mind for discussion.
>>> I hope also that it will be possible to come up with a better
>>> solution during and/or following the discussion.
>>   I think I understand your idea with a 'barrier'. It's just that I have
>> trouble seeing how it would actually get implemented - how do you make
>> sure that e.g. after writing back the block allocation bitmap and while
>> writing back other metadata, no one can allocate new blocks to file 'foo'
>> and still write back the file's inode before you submit the barrier?
> 
> Actually, I suspect something could be (relatively) trivially implemented
> using a similar strategy to dm-era. Snapshots increment the era; blocks
> from previous eras cannot be overwritten or removed, and the target could
> be mapped to view a past era. With that, you have essentially
> instantaneous snapshots (increment a counter) with only a barrier
> constraint, not freezing.

Thinking on it further, I suspect dm-thinp would also be fine with some
form of barrier rather than full freezing. Generally speaking, if
snapshots are (roughly) instantaneous, then we don't need to care much
about what happens "during" the snapshot: ensuring a consistent state at
one instant and preventing reordering across it should be sufficient.
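
To make the era idea a bit more concrete, here is a toy userspace model in
Python. It is only a sketch of the strategy described above, not a
description of how dm-era or dm-thinp are actually implemented; the class
and method names (EraStore, write, snapshot, read) are made up for
illustration. Each write is tagged with the current era, a snapshot just
increments the counter, and data from earlier eras is never overwritten.

```python
class EraStore:
    """Toy model: writes are tagged with the current era; a snapshot bumps it."""

    def __init__(self):
        self.era = 0
        # block number -> list of (era, data) versions, oldest first
        self.versions = {}

    def write(self, block, data):
        history = self.versions.setdefault(block, [])
        if history and history[-1][0] == self.era:
            # Latest version belongs to the current era: overwrite in place.
            history[-1] = (self.era, data)
        else:
            # Latest version is from an earlier era (or the block is new):
            # keep the old copy untouched and append a new version.
            history.append((self.era, data))

    def snapshot(self):
        """Seal the current era and start a new one; returns the sealed era."""
        sealed = self.era
        self.era += 1
        return sealed

    def read(self, block, era=None):
        """Read a block as of the given era (default: the current era)."""
        if era is None:
            era = self.era
        for written_era, data in reversed(self.versions.get(block, [])):
            if written_era <= era:
                return data
        return None


store = EraStore()
store.write(7, "old")
snap = store.snapshot()      # the only ordering point: a counter increment
store.write(7, "new")
assert store.read(7, snap) == "old"   # view of the past era
assert store.read(7) == "new"         # current view
```

In this model the only thing that has to be ordered is which side of the
counter increment a write lands on, which is the barrier constraint. It
says nothing about how a filesystem would get its own metadata consistent
at that instant, which is the harder part Jan is asking about.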

>>> The goal would really be to figure out which bits we already have,
>>> which bits are missing, where the problems are, what can be done
>>> better, and so forth, while we have at least two of the three layers
>>> represented and in the same room. This is very much something for
>>> the long term rather than a quick discussion followed by a few
>>> patches kind of thing, I think.
>>   Sure, if you have some proposal (not necessarily patches) then it's
>> probably worth talking about.
>> 
>> Honza
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
