On Fri, Nov 30, 2012 at 3:09 PM, Artem Bityutskiy <dedekind1@xxxxxxxxx> wrote:
> On Sat, 2012-11-17 at 01:04 +0530, srimugunthan dhandapani wrote:
>> Hi all,
>>
>> Due to fundamental limits like size-per-chip and interface speed
>> limits all large capacity Flash are made of multiple chips or banks.
>> The presence of multiple chips readily offers parallel read or write support.
>> Unlike an SSD, for a raw flash card , this parallelism is visible to
>> the software layer and there are many opportunities
>> for exploiting this parallelism.
>>
>> The presented LFTL is meant for flash cards with multiple banks and
>> larger minimum write sizes.
>> LFTL mostly reuses code from mtd_blkdevs.c and mtdblock.c.
>> The LFTL was tested on a 512GB raw flash card which has no firmware
>> for wearlevelling or garbage collection.
>>
>> The following are the important points regarding the LFTL:
>>
>> 1. multiqueued/multithreaded design:(Thanks to Joern engel for a
>> mail-discussion)
>> The mtd_blkdevs.c dequeues block I/O requests from the block layer
>> provided request queue from a single kthread.
>> This design of IO requests dequeued from a single queue by a single
>> thread is a bottleneck for flash cards that supports hundreds of MB/sec.
>> We use a multiqueued and multithreaded design.
>> We bypass the block layer by registering a new make_request and
>> the LFTL maintains several queues of its own and the block IO requests are
>> put in one of these queues. For every queue there is an associated kthread
>> that processes requests from that queue. The number of "FTL IO kthreads"
>> is #defined as 64 currently.
>
> Hmm, should this be done in MTD layer, not hacked in in LFTL, so that
> every MTD user could benefit?
>
> Long time ago Intel guys implemented "striping" in MTD, sent out, but it
> did not make it to upstream. This is probably something your need.
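The multiqueued/multithreaded design in point 1 can be illustrated with a small user-space analogue. This is a minimal sketch, not LFTL code: pthreads stand in for the "FTL IO kthreads", a plain struct stands in for a bio, and ftl_submit() plays the role of the custom make_request hook. The queue-selection policy (starting sector modulo the number of queues) is an assumption, since the description above does not say how a request is assigned to a queue. NQUEUES is 4 instead of 64 to keep it small, and every name here is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NQUEUES 4   /* LFTL #defines 64 FTL IO kthreads; 4 keeps the sketch small */
#define QDEPTH  64  /* hypothetical fixed capacity of each per-queue ring */

/* Hypothetical stand-in for a block I/O request (a bio in the kernel). */
struct io_req {
    unsigned long sector;
};

/* One request queue with its own lock and its own worker thread. */
struct io_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty, nonfull;
    struct io_req   ring[QDEPTH];
    size_t          head, count;
    bool            stopping;
    unsigned long   processed;  /* requests handled by this queue's worker */
    pthread_t       worker;
};

static struct io_queue queues[NQUEUES];

/* Worker body: the analogue of one "FTL IO kthread" draining its own queue. */
static void *queue_worker(void *arg)
{
    struct io_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->stopping)
            pthread_cond_wait(&q->nonempty, &q->lock);
        if (q->count == 0 && q->stopping)
            break;  /* queue drained and shutdown requested */
        struct io_req req = q->ring[q->head];
        q->head = (q->head + 1) % QDEPTH;
        q->count--;
        pthread_cond_signal(&q->nonfull);
        pthread_mutex_unlock(&q->lock);

        (void)req;  /* a real FTL would map req.sector and issue flash I/O here */

        pthread_mutex_lock(&q->lock);
        q->processed++;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

void ftl_queues_start(void)
{
    for (int i = 0; i < NQUEUES; i++) {
        struct io_queue *q = &queues[i];
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->nonempty, NULL);
        pthread_cond_init(&q->nonfull, NULL);
        q->head = q->count = 0;
        q->stopping = false;
        q->processed = 0;
        int rc = pthread_create(&q->worker, NULL, queue_worker, q);
        assert(rc == 0);
        (void)rc;
    }
}

/* Analogue of the custom make_request hook: pick one of several queues
 * instead of funnelling every request through a single shared queue. */
void ftl_submit(struct io_req req)
{
    struct io_queue *q = &queues[req.sector % NQUEUES];  /* assumed policy */

    pthread_mutex_lock(&q->lock);
    while (q->count == QDEPTH)
        pthread_cond_wait(&q->nonfull, &q->lock);
    q->ring[(q->head + q->count) % QDEPTH] = req;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Ask every worker to stop once its queue is empty; return total handled. */
unsigned long ftl_queues_stop(void)
{
    unsigned long total = 0;

    for (int i = 0; i < NQUEUES; i++) {
        struct io_queue *q = &queues[i];
        pthread_mutex_lock(&q->lock);
        q->stopping = true;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
        pthread_join(q->worker, NULL);
        total += q->processed;
    }
    return total;
}
```

The point of the design is that each queue has its own lock and its own consumer, so requests landing on different queues never contend on a single dequeue path the way they do with the one kthread in mtd_blkdevs.c.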
>
> With striping support in MTD, you will end up with a 'virtual' MTD
> device with larger eraseblock and minimum I/O unit. MTD would split all
> the I/O requests and work with all the chips in parallel.
>

Thanks for replying.

Current large-capacity flash devices have several levels of
parallelism: chip-level, channel-level and package-level. See:

1. http://www.cse.ohio-state.edu/~fchen/paper/papers/hpca11.pdf
2. http://research.microsoft.com/pubs/63596/usenix-08-ssd.pdf

Assuming only chip-level parallelism and providing only a striping
feature may not exploit all the capabilities of the flash hardware.

On the card I worked with, the hardware provides a DMA read/write
capability that automatically stripes the data across the chips (hence
the larger writesize of 32K), but it still exposes the other levels of
parallelism.

LFTL does not stripe the data across the parallel I/O units (called
"banks" in the code). Instead, it dynamically selects one of the banks
to write to and one of the banks to garbage collect. With UBI+UBIFS, by
contrast, block allocation is done by UBI and garbage collection by
UBIFS, so it is not possible to dynamically split the regular I/O
reads/writes and the garbage-collection reads/writes across the banks.

Although LFTL assumes only bank-level parallelism and is currently not
aware of the hierarchy of parallel I/O units, I think it is possible to
make LFTL aware of it in the future.

> This would be a big work, but everyone would benefit.
>
> --
> Best Regards,
> Artem Bityutskiy
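The dynamic bank selection described in the reply can be sketched as follows. The message does not say how LFTL actually picks the write bank and the garbage-collection bank, so the policy here is an assumption for illustration only: write to the least-busy bank that still has free blocks, and garbage-collect the bank with the fewest free blocks below a threshold, never the current write bank, so host I/O and GC I/O proceed on different banks. struct bank, its fields, NBANKS and both functions are hypothetical names, not LFTL code.

```c
#include <assert.h>
#include <stddef.h>

#define NBANKS 8  /* hypothetical number of banks on the card */

/* Hypothetical per-bank bookkeeping an FTL might keep. */
struct bank {
    unsigned free_blocks;  /* erased blocks ready for writing */
    unsigned busy;         /* outstanding I/O requests on this bank */
};

/* Pick a bank to write to: the least-busy bank that still has free
 * blocks, breaking ties by more free blocks. `exclude` is a bank to
 * skip (e.g. one currently being garbage collected), or -1.
 * Returns -1 if no bank qualifies. */
int select_write_bank(const struct bank b[], size_t n, int exclude)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if ((int)i == exclude || b[i].free_blocks == 0)
            continue;
        if (best < 0 || b[i].busy < b[best].busy ||
            (b[i].busy == b[best].busy &&
             b[i].free_blocks > b[best].free_blocks))
            best = (int)i;
    }
    return best;
}

/* Pick a bank to garbage collect: the bank with the fewest free blocks
 * among those below gc_threshold, skipping the current write bank so
 * GC traffic and host writes land on different banks.
 * Returns -1 if no bank needs GC. */
int select_gc_bank(const struct bank b[], size_t n, int write_bank,
                   unsigned gc_threshold)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if ((int)i == write_bank || b[i].free_blocks >= gc_threshold)
            continue;
        if (best < 0 || b[i].free_blocks < b[best].free_blocks)
            best = (int)i;
    }
    return best;
}
```

Because both choices are made per decision from live per-bank state rather than fixed by a static striping layout, a single layer that owns both allocation and garbage collection can keep them apart; that is the property the reply contrasts with the UBI/UBIFS split.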