Re: Introducing Next3 - built-in snapshots support for Ext3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, May 9, 2010 at 1:56 PM, Amir G. <amir73il@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, May 9, 2010 at 5:25 AM, Theodore Tso  wrote:
>>
>> For example, if you have flags that only are used in-memory, they shouldn't be allocated out of i_flags, but instead using the i_state field in the ext4_inode_info structure, referenced via EXT4_I(inode).i_state.   That's what it is there fore.
>>
>
> I have 4 non-persistent flags. I will move them to i_state.
> I've kept them in i_flags out of laziness, since I use lsattr -X
> to read non-persistent snapshot flags along with persistent snapshot flags.
>
>>
>> For example: i_snapshot_blocks_count.  Is that really necessary?   Why can not be computed by looking at i_size of the snapshot inode?   Or by checking to see if the superblock has be COW'ed?  If it hasn't then the s_blocks_count in the fs superblock must be what would have been in i_snapshot_blocks_count.  If the sb has been COW'ed, s_blocks_count in the COW'ed sb must be that value.  Why allocate and waste a full 32-bit field out of _every_ inode in the file system if it's possible to get that value via other places.
>>
>
> very well, I can read snapshot_blocks_count from COWed superblock (it
> is always COWed on snapshot take) and release i_snapshot_blocks_count.
>
>> I have similar question about bg_cow_bitmap.  Is that really necessary?   The COW bitmap is just a copy of the base file system's block allocation bitmap, right?  (The wiki documentation and the design PDF isn't completely clear on that point, but that seems to be what it is.)   So why do you need to allocate a field out of the bg discriptor field for it?
>>
>> It's not clear why you need an exclude inode, if you are also storing the address of the exclude bitmap blocks in the bg descriptor.  One or the other, but not both...
>>
>
> bg_cow_bitmap/bg_exculde_bitmap are used by Next3 as non-persistent
> cache for the address of a bitmap blocks,
> which can be read from active_snapshot/exclude_inode.
> in other words, instead of allocating per group in-memory structure, I
> used the 2 unused fields in the in-memory group descriptor.
> the only side effect for the ext* on-disk format is that those fields
> are no longer 0 after mounting a volume with Next3.
> is that a problem? can the CSUM feature resolve that problem?
>
> in e2fsprogs, I only reference those fields for debugging purpose
> (dumpe2fs displays them).
> also create_exclude_inode and resize2fs set the bg_exclude_bitmap, but
> they don't have to,
> because Next3 re-reads all exclude_bitmap block addresses from exclude
> inode on mount time.
> so please feel free to reject those field assignments. I can include
> them in a seperate debug only patch.
>
>
>
>> Also, as far as i_next_snapshot is concerned, why not just use d_time for the linked list.  That's what we do with the orphaned inode list, so we have code that maintains a linked list using d_time already.   So that way you don't need _any_ new inode fields.
>>
>
> I don't know if you noticed, but I reused the code of
> add/del_orphan_list() to manipulate the snapshot list...
> And as I said a few lines back, I can revert to using i_dtime instead
> of i_next_snapshot.
>
>> I'm not convinced by all of the fs feature compatibility flags you've defined, either.   In general, if you suspect one part of the file system, you need to check everything.  So do you really need separate ro_compat bits for "fix_snapshot" and "fix_exclude"?
>
> no, not really. these are informational only. I didn't even use
> fix_snapshot yet.
>
>> And why do you need "IS_SNAPSHOT"?
>>
>
> I "fix" the COWed superblock to make it look like ext2 superblock and
> set the is_snapshot feature, so fsck would know it is checking a Next3
> snapshot image (and not report wrong block counts).
>
>> As far as COMPAT_BIG_JOURNAL, we have feature flags in the journal, and that probably is better placed there.
>
> I will look into that.
>
>>And I assume COMPAT_EXCLUDE_INODE is really "COMPAT_HAS_SNAPSHOTS"?
>
> logically, it means that the exclude inode/bitmap is allocated.
> currently, the only feature that uses the exclude bitmap is the
> snapshot feature,
> so I don't mind bundling them together.
>
> but I do recommend to mke2fs -o exclude_inode if you intend to switch
> from Ext3 to Next3 at some point.
> it will guarenty that exclude bitmap blocks are allocated their
> corresponding block bitmap.
>

I have started making some changes to on-disk format based on the
points we seem to agree upon.

I would like to register only 1 ro_compat feature (has_snapshot) and 1
compat feature (exclude_inode).
the rest of the "informational features" I would like to move to
s_flags, including NEXT3_FLAGS_BIG_JOURNAL.

A Next3 big journal can be created with the option -J big or by
mkfs.next3/mkn3fs.

I will evacuate all the fields that Next3 can do without (i.e.,
{s_snapshot_blocks_count,bg_{cow,exclude}_bitmap}).

I will move the non-persistent snapshot flags to i_state.

I would like to stick with i_version list chaining and not revert to
using i_dtime. awaiting further discussion on that topic.

Awaiting permanent assignments for the rest of the fields/flags.

Per your request, I have added the information above to the Next3 wiki.
Please find the TODO items in red and WIP items in green (implemented
and not published):
http://sourceforge.net/apps/mediawiki/next3/index.php?title=Code_documentation#Reserved_fields_and_bits_in_on-disk_structures

Also, if you could please drop a line about your view of how to
progress with Next3
(something in the lines of what you said in the conference call), that
would be nice.
Some people may have gotten the impression that the fork from Ext3 is
a show stopper for you,
see: http://lwn.net/SubscriberLink/387231/1310b1360769c12b/

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux