On 2014-01-28 10:25, Vyacheslav Dubeyko wrote: > Hi Ryusuke, > > This is my improved vision of possible approach to change in-place > update of superblock on COW policy. I suppose that current description > includes all that we discussed previously. And we can continue to deepen > this discussion. > > Approach is based on necessity to have two areas at the begin and > at the end of a NILFS2 volume. Every such area should have capacity > is equal to segment size. The goal of these two areas is to provide > a FTL-friendly way of storing information about latest log and > modified superblock's fields by means of COW (Copy-On-Write) policy. > > At the begin of a NILFS2 volume is located primary superblock area. > Primary superblock area begins from static superblock is created > during NILFS2 volume creation by means of mkfs tool. This superblock > (primary superblock) is located on 1024 bytes from the volume begin > (as it placed currently). The primary superblock leaves untouched > during filling primary superblock area by modified information. > Initial state of superblock can be rewritten at the moment of > beginning next iteration of filling of primary superblock area > (because this area lives likewise circular buffer). > > ------------------------------------------------------------ > | Primary superblock | Modifiable area | > ------------------------------------------------------------ > |<---- 4 KB ------>| > |<-------------------- segment size ---------------------->| > > On the opposite side of the volume (at the volume's end) is located > secondary (or back) superblock area. This area begins from modifiable > area and it ends with secondary superblock (as it is located currently). > Modifiable area of secondary superblock area lives likewise of > modifiable area in first superblock area. > > ------------------------------------------------------------ > | Modifiable area | Secondary superblock | > ------------------------------------------------------------ > |<------ 4 KB ------>| > |<-------------------- segment size ---------------------->| > > Primary and secondary superblock areas have goal to keep copies > of super roots. And, firstly, namely these areas are used for > searching a latest log. These areas should keep as super root as > physical block of this super root's placement. Moreover, primary > and secondary superblock areas have different frequency of updating. > Secondary superblock area is updated during every umount or once > at several hours (if we have significant system uptime). Primary > superblock area is updated more frequently. The frequency of > primary superblock area's update can be based on timeout or count > of constructed segments. But, anyway, it makes sense to take into > account only full segments instead of partial segments. Maybe, it > makes sense to keep more complex combination in modifiable area: > super root + diff to superblock state + physical block of super root's > placement. > > Modifiable area should have special filling policy. This policy > doesn't contradict with COW policy but it implements not in > sequential manner. Namely, modifiable area should be divided on > several groups (the count of groups can be configurable option). > Moreover, primary and secondary superblock areas would have > different values of groups count. Thereby, every group will contain > some blocks count. > > ------------------------------------------------------------- > | Group1 | Group2 | Group3 | **** | GroupN | > ------------------------------------------------------------- > |<-------------------- Modifiable area -------------------->| > > Saving blocks are distributed between groups by means of policy > that every next block should be saved in next group on every > iteration. If all groups in modifiable area have equal count of > saved blocks then it begins the next iteration which starts from > the first group. > > FIRST ITERATION [A phase]: > > (1) first block > ------------------------------------------------------------- > |A1| | | | | | | | | | | | | | | | | | | | > ------------------------------------------------------------- > |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->| > > (2) second block > ------------------------------------------------------------- > |A1| | | | |A2| | | | | | | | | | | | | | | > ------------------------------------------------------------- > |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->| > > (N) Nth block > ------------------------------------------------------------- > |A1| | | | |A2| | | | |A3| | | | |An| | | | | > ------------------------------------------------------------- > |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->| > > SECOND ITERATION [B phase]: > > ------------------------------------------------------------- > |A1|B1| | | |A2|B2| | | |A3|B3| | | |An|Bn| | | | > ------------------------------------------------------------- > |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->| > > Nth ITERATION [E phase]: > > ------------------------------------------------------------- > |A1|B1|C1|D1|E1|A2|B2|C2|D2|E2|A3|B3|C3|D3|E3|An|Bn|Cn|Dn|En| > ------------------------------------------------------------- > |<-- Group1 -->|<-- Group2 -->|<-- ****** -->|<-- GroupN -->| > > Finally, when modifiable area is completely filled then it is > possible to discard area's content and to begin filling iterations > again. We will have two modifiable areas are filling with > different frequencies and some state of replication of > information. Thereby, it provides basis for safe and independent > discarding of modifiable areas. > > The special filling policy has goal to provide a basis for > efficient search. Namely, first group contains blocks differ by > some period from each other. We have such sequence during saving: > [A1,A2,A3,..,An], [B1,B2,B3,..,Bn], ..., [E1,E2,E3,..,En]. But > first group will contain (A1,B1,C1,D1,E1). Thereby, passing > item-by-item through first group means jumping with some period. > Moreover, in the case of some failure it is possible to start the > searching from any group (with decreasing search efficiency). > It needs to take into account magic signature, header checksum and > timestamps during comparison of items in group. It provides opportunity > to distinguish valid blocks from empty and invalid ones and to > distinguish older blocks from latest ones. > > Searching in dedicated area gives opportunity to use read-ahead > technique. Moreover, if group contains many items then it is > possible to increase step between current and next items during > search. For example, it is possible to use such sequence of steps > during searching: 0, 1, 3, 5, 7, and so on. If we have found > latest item in first group, for example, then it is possible > to find a latest item in he whole sequence by means of jumping on > group period (count of blocks in a group). > > Two modifiable areas are filled with different frequencies and > it gives opportunity to use special searching algorithm. Such algorithm > can use, for example, secondary superblock area for rough, > preliminary search (because this modifiable area is changed rarely). > Then, further, algorithm can continue search in first superblock > area (because this modifiable area is changed more frequently). > Moreover, segctor thread has knowledge about all dirty files and it can > predict, theoretically, how many segments will be constructed. > Thereby, it is possible to save in items of modifiable area's groups > such prediction in the form of hint that it can be used during search > for improving search algorithm efficiency. > > With the best regards, > Vyacheslav Dubeyko. I hope I understand your approach correctly. Can it be summarized as follows: Instead of overwriting the super block you want to reserve the first segment to write the super block in a round-robin way into groups. Thereby spreading the writes over a larger area. Then the groups should probably have a typical erase block size like 512k. If that is true, I don't think you need any special algorithm to search the latest super block. You just read in the whole segment at mount time and select the one with the biggest s_last_cno. What about Ryusukes suggestion of never updating the super block and instead using a clever segment allocation scheme that allows a binary search for the latest segment? br, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html