On Monday 14 July 2014 at 03:56:43, Edward Shishkin wrote: > > On 07/13/2014 09:18 PM, Ivan Shapovalov wrote: > > On Sunday 13 July 2014 at 21:04:11, Edward Shishkin wrote: > >> On 07/13/2014 02:47 PM, Ivan Shapovalov wrote: > >>> On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote: > >>>> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote: > >>>>> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote: > >>>>>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote: > >>>>>> [...] > >>>> [...] > >>>>>>> + * - if a single extent is smaller than the erase unit, then this particular > >>>>>>> + * extent won't be discarded even if it is surrounded by enough free blocks > >>>>>>> + * to constitute a whole erase unit; > >>>>>> Why not to discard the aligned and padded extent, which coincides > >>>>>> with the whole erase unit? > >>>>> With a number of whole erase units. > >>>>> > >>>>>>> + * - we won't be able to merge small adjacent extents forming an extent long > >>>>>>> + * enough to be discarded. > >>>>>> At this point we have already sorted and merged everything. > >>>>>> So may be it makes sense just to check the head and tail of every resulted > >>>>>> extent and discard the aligned and padded one? > >>>>> "Head and tail" is not sufficient. We may check the whole extent with a single > >>>>> bitmap request, but such algorithm will be very inefficient: it will miss many > >>>>> possibilities for discarding. > >>>>> > >>>>> Consider many-block extent, from which one block has been allocated again. > >>>>> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block). > >>>>> > >>>>>> Please, consider such possibility. Iterating over erase units in > >>>>>> discard_extent() > >>>>>> looks suboptimal. > >>>>> Yes, it's costly... but I don't see any other ways to do the task efficiently. > >>>> How about this function? (the attached patch is against v6-series). > >>>> Total number of bitmap checks is in [N+1, 2N], where N is number of > >>>> extents in the list. At the same time we don't leave any non-discarded > >>>> "garbage"... > >>>> > >>>> Edward. > >>>> P.S. I tested it, but not enough. > >>> Hm. I'm probably a dumbass, but... > >>> > >>> I don't see where the [start; start+len) region is checked for being free. > >> > >> check_free_blocks() is called for this purpose. > >> > >> Firstly when checking head padding. Secondly in the gluing loop > >> (to glue 2 extents in particular means to make sure that region > >> between them is free). Finally we check if the tail padding is free. > > There are three calls to check_free_blocks(): > > > > line 197, check_free_blocks(start - headp, headp) > > This checks first extent's head padding. > > > > line 247, check_free_blocks(end, next_start - end) > > This checks blocks between end of first extent and start of second extent > > (including possible tail padding of first extent and possible head padding > > of second extent). > > > > line 284, check_free_blocks(end, tailp) > > This checks first extent's tail padding. > > > > Nothing seems to call at least check_free_blocks(begin, end)... > > > Oh, bad... I thought all this time that extents of the delete sets are > still dirty > in the working bitmap at the moment of discard. > Hmm, I definitely don't want to check the whole extents for discard... > > Why not to delay the actual deallocation (in the working bitmap)? > Anyway, in the situations of disk space pressure (on the first "soft > ENOSPC") > everyone waits for commit-everything completion. Let's think in this > direction... That's a good idea, actually. Let's outline what is needed for this: - move reiser4_post_commit_hook() after the call to discard; This will also move it outside of reiser4_write_logs(), after reiser4_post_write_back_hook() and various immediate deallocations done by the journal code. I suppose this is OK for we're under commit mutex anyway. - defer journal's immediate deallocations until discard. This is more interesting. Inside of the journal code, blocks are deallocated in four places: * dealloc_tx_list() * dealloc_wmap() -> dealloc_wmap_actor() * add_region_to_wmap() /* in error path */ * alloc_tx() /* seems like in error path, len == 0 in case of normal exit */ That is, blocks are deallocated after all meaningful work and so relative order of these deallocations seems to not matter. Given that point 1 is done, i. e. delete_set is applied to WORKING BITMAP after reiser4_write_logs(), we can simply make these deallocations deferred (BA_DEFER flag). This way, we also get rid of aux_delete_set. The only thing to check is deallocation attributes. All journal's deallocations are done with target_stage == BLOCK_NOT_COUNTED. This happily coincides with what's done in apply_dset(), so no problems here. Is this correct? Thanks, -- Ivan Shapovalov / intelfx / > > > > > > BTW, what's the purpose of headp_is_known_dirty? > > To avoid unneeded checks. > > (end + tailp > next_start) means that the head padding of the next extent > includes the region [end, next_start), which was found dirty (when trying > to glue this next extent). > > > > All uses of this variable > > can be compile-time resolved to 0. It is never read after assigned 1. > > > Yup. Its declaration with the first assignment are at wrong place. > It should be above (near the @pos). I definitely needed to work more > on this function. > > > > > >> > >>> Also, btw, do we need to cut the head (lines 155-163 of the patch) if headp > >>> is empty? It seems that it would reduce extent by one whole erase unit > >>> without any justification. > >> > >> Yes, this is definitely a leak of not discarded erase units. > >> Is the attached version better? > > Yes.. and lines 290-295, seems that tail padding handling has the same problem. > > If tailp == 0 (i. e. division remainder is 0 so that end is already aligned), > > OK, will be fixed in the same way. > > Thanks, > Edward.
Attachment:
signature.asc
Description: This is a digitally signed message part.