Re: reiser4: discard implementation, pass 2: allocation issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 06/15/2014 08:07 PM, Ivan Shapovalov wrote:
On Sunday 15 June 2014 at 19:36:05, Edward Shishkin wrote:	
On 06/13/2014 10:28 PM, Ivan Shapovalov wrote:
Here is my "analysis" of what happens in reiser4 during a transaction's
lifetime wrt. block allocation and deallocation.


THE EFFECTS (SEMANTICS) OF RELATED FUNCTIONS
reiser4_alloc_blocks_bitmap(): allocates in WORKING BITMAP
Yes.

reiser4_dealloc_blocks_bitmap(!BA_DEFER): deallocates from WORKING BITMAP
reiser4_dealloc_blocks_bitmap(BA_DEFER): stores to ->delete_set
This is correct for the middle-level allocator (without suffix "bitmap").
The low-level one frees blocks only in WORKING BITMAP.
Yes, this was a wording mistake. I've noticed it shortly after sending...

reiser4_pre_commit_hook_bitmap(): allocates all relocated nodes in COMMIT BITMAP
"Relocated" is bad term here: nodes with new data also get the flag
JNODE_RELOC. So, I would rather say, that it applies freshly allocated
nodes of the atom to COMMIT BITMAP.


                                    deallocates ->delete_set from COMMIT BITMAP
applies the atom's delete_set to COMMIT BITMAP
I've meant exactly this.

reiser4_post_commit_hook(): deallocates ->delete_set using !BA_DEFER
                                                     (i. e. from WORKING BITMAP)
applies the atom's delete_set to WORKING BITMAP.
ditto

I would also mention a function of the middle-level block allocator
reiser4_alloc_blocks(): allocates blocks in WORKING  BITMAP.
Yes, sure.


Note that the middle-level block allocator (block_alloc.c) actually
manipulates
with abstract space maps. Currently in reiser4 they are represented only by
bitmaps (plugin/space/bitmap.c). We can also implement another
representation -
extent tree (like in XFS). I don't see any needs for now, though.
Extent trees seem to be interesting.. They claim that they are more efficient -
is the difference that huge?

TIMELINE OF ALLOCATIONS FOR "USUAL" NODES, AND TIMELINE OF TRANSACTION COMMIT
- nodes are allocated using reiser4_alloc_blocks() and setting JNODE_RELOC,
    so WORKING BITMAP ensures that two nodes cannot get the same block;
- nodes are deallocated using reiser4_dealloc_blocks(BA_DEFER),
    so their deallocation is not immediately reflected in WORKING BITMAP;
(the relocate set is written here)
- reiser4_pre_commit_hook_bitmap() uses 1) JNODE_RELOC flag and 2) ->delete_set
    to convey effective bitmap changes into COMMIT BITMAP;
(the journal and overwrite set are written here)
- reiser4_post_commit_hook() uses ->delete_set to convey deallocations
    from step 2 to WORKING BITMAP.
(the discard happens here)


TIMELINE OF ALLOCATIONS FOR WANDERED JOURNAL BLOCKS
- at commit time, blocks are allocated using reiser4_alloc_blocks(), so they
    are allocated in WORKING BITMAP and do not interfere with any "usual" blocks;
- after writing wandered blocks, they are deallocated using
    reiser4_dealloc_blocks(!BA_DEFER), i. e. from the WORKING BITMAP.

So, the system of working and commit bitmaps plus the delete set seems
to be redundant? I think this is because of performance reasons: block
allocation is critical thing...
Seems like it is -- for performance and simplicity (== robustness).

CONCLUSION
At possible transaction replay time, journal blocks are not allocated in any
of the bitmaps. However, because the journal is read and replayed before a
transaction has a chance to commit, this fact does not matter.
What matters is that wandered journal blocks never hit COMMIT BITMAP.

So, if I've got all this correct (which I highly doubt), the disk space leak
(as you pointed it out) does not exist.
It seems, you are right..

What exists is a rather different problem with my idea of "log every
deallocated block". Current implementation logs every block regardless of
BA_DEFER flag presence or absence, so non-wandered blocks are logged twice.

We could just use ->delete_set, but we would lose wandered blocks then.
Or we could only log !BA_DEFER requests, which would do the right thing
(wandered blocks + deallocations from reiser4_post_commit_hook()), but
the reasoning behind this decision would not be obvious for a casual
code reader.
I think that a good comment will save the situation..

Or we could log only wandered blocks (in addition to ->delete_set)
at discard time, but this is messy and requires us to merge the discard log
with ->delete_set at discard time.
what is the difference with the previous "we could.."?
The previous option is to log all !BA_DEFER requests in addition to
->delete_set, so those two block sets would be partially overlapping.


Hmm.. Are they really overlapped?

reiser4_dealloc_blocks() looks like the following:
{
if (flags & BA_DEFER)
        log in the delete_set;
else
        deallocate in working bitmap
...
}

so (!BA_DEFER) requests and the delete_set look disjoint,
or, I missed something?

And this option is to log only those !BA_DEFER requests which are not in
->delete_set, so no block will be stored twice. Thus we will have to consider
both ->delete_set and this hypothetical ->discard_set at discard time.

Or we could log wandered blocks straight into ->delete_set and do something in
reiser4_post_commit_hook() to separate these entries, but this is super messy.

I'm preferring the second way... Edward, please proof-read all this.
BTW, what about your current implementation? Does it work for you?
Yes, it apparently does (no massive corruptions).

I get batches of "orphan cluster" errors from time to time (from offset 0 till
the supposed end of file),


this is fsck complaints?


  but 1) I do not see which files get corrupted,
2) this could be due to txmod=wa, 3) my machine is pretty crashy now due to
some driver issues, so I don't know what exactly causes this.

Thanks for proof-reading. What do you think about all this?


I think that not bad for the beginning! I have purchased an SSD card,
hope to test the discard stuff in the near future.

Thanks,
Edward.
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux File System Development]     [Linux BTRFS]     [Linux NFS]     [Linux Filesystems]     [Ext4 Filesystem]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Resources]

  Powered by Linux