On 04/30/2014 09:43 AM, Ivan Shapovalov wrote:
Hi Edward,
Hello Ivan.
So I bit the bullet, ditched all tasks for the upcoming holidays and gave
another try to the discard support in reiser4. Since my last status update,
I've lost the local kernel tree due to bad scripting (decided to TRIM free
space between partitions... and TRIM'ed everything that is NOT free space), so
I had to start from scratch.
Once in a while I send key patches to my gmail mailbox configured with
IMAP..
I decided to go with following algorithm:
During the transaction deallocated ranges are logged as-is to a list or array
(let's call it "the delete log").
"Log" is a busy term. It is already used for journal blocks.
Let's use "discard set" instead?
On atom commit we will generate a minimal
superset of the delete log, comprised only of whole erase units ("the discard
set").
So, at commit time the following actions take place:
- elements of the delete log are sorted;
- the delete log is iterated, merging any adjacent ranges;
- for each resulting range in the delete log its start is rounded down to the
closest erase unit boundary, and, starting from this block, extents of erase
unit length are created until whole range is covered;
- for each such new extent it is checked whether does it contain allocated
blocks. If no (i. e. the range is completely deallocated), the range is
added to the discard set.
Complexity of logging is thus O(n), added complexity of atom joining is O(1)
and added complexity of atom commit is O(n*log(n) + n).
The first question is how to store the delete log.
- blocknr_set isn't ordered per se, so adjacent entries can't be easily merged
- simple linked list (extent per node) seems like too big overhead
- kmalloc()'ed arrays can't be joined in O(1) and they are static (imposes a
hard limit on log size)
- kmem_cache... feels like using a sledge-hammer.
Why not to maintain discard set as extents in a special per-atom tree?
When adding an extent, try to merge it with the neighbors (if any), so
that everything is prepared for discard at any moment. Just walk through
the tree from left to right and discard all suitable extents?
There is a ready implementation of rb-trees in the kernel (see
./lib/rbtree.c,
./lib/rbtree_test.c). Could you please take a look at this?
And oh, where in commit_current_atom() should these things be done? I guess
right after the reiser4_write_logs(), under the tmgr mutex,
You are right.
Insert the function issue_discard_requests() right after the
reiser4_invalidate_list(ATOM_OVRWR_LIST(*atom)); and before
mutex_unlock(&sbinfo->tmgr.commit_mutex);
but I can't really
explain why I think so.
reiser4_write_logs() commits the atom's overwrite set, which includes
all system blocks, including bitmap blocks.
If you issue discard requests before reiser4_write_logs(), then it
can happen that the system crashes after issue_discard_requests(),
but before reiser4_write_logs(). So, we'll have that the blocks have
been discarded, while the file system still refers to them (as bitmap
eventually hasn't been updated). Welcome to data corruption..
Thanks,
Edward.
P.S.: You'll need to set up a debugging environment. Take a look at this:
http://bipinkunal.blogspot.cz/2012/05/kgdb-tutorial.html
Do you have a second machine with a serial port?
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html