I asked the same question on the linux-btrfs list, and the answer I got
is that the inode itself is also written out to a new place; only the
superblocks are overwritten in their fixed locations. That seems to make
sense. How about NILFS2? I guess it is the same. However, although I
have not done many tests on SSDs, I am wondering whether this becomes a
bottleneck: even though data is written to a new place instead of being
written over the original, the superblocks, which live at fixed
addresses, have to be updated all the time. This could cause much more
frequent overwrites of the areas where the superblocks are stored. Is
the SSD smart enough to handle these overwrites, or can the file system
do something to avoid this situation? Or perhaps it is not even a
problem for the file system, because the superblock writes are cached
and can be delayed.

I appreciate your comments.

Thanks,
Yuehai

On Tue, Sep 28, 2010 at 4:03 PM, Yuehai Xu <yuehaixu@xxxxxxxxx> wrote:
> Hi,
>
> On Sun, Sep 26, 2010 at 5:17 AM, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
>> Hi,
>> On Sat, 25 Sep 2010 18:54:04 -0400, Yuehai Xu wrote:
>>> Hi,
>>>
>>> I have an SSD, an Intel X25-V, which I use to test the performance
>>> of different file systems. The results seem confusing to me. They
>>> come from Postmark, in which I set the number of files to 50,000
>>> with file sizes between 9216 and 15360 bytes:
>>>
>>>               EXT3        NILFS2     BTRFS      EXT4        XFS        REISERFS    EXT2
>>> PostMark(R)   146.67Kb/s  52.21Kb/s  90.59Kb/s  172.12Kb/s  60.39Kb/s  146.67Kb/s  83.25Kb/s
>>> PostMark(W)   28.09Mb/s   10Mb/s     17.35Mb/s  31.04Mb/s   11.56Mb/s  28.09Mb/s   15.94Mb/s
>>>
>>> From these results, the throughput of NILFS2, for both reads and
>>> writes, is much smaller than EXT3's, yet I had read on the web site
>>> that NILFS2 performs much better on SSDs than other file systems.
>>> What is wrong with my test result? My kernel is 2.6.32, and I
>>> formatted NILFS2 the default way; nothing special was done. So, my
>>> questions are:
>>>
>>> 1. Are there any special parameters that I need to configure when I
>>> mount or format my SSD?
>>
>> Nilfs in the latest kernels has a "discard" mount option, but I think
>> it won't change the above result dramatically.
>>
>>> 2. Does the performance differ much between different kinds of SSDs?
>>> 3. As far as I know, NILFS2 is a log-structured file system, which
>>> means it always turns random writes into sequential ones and tries
>>> to avoid the cost of overwrites on an SSD. Does it need to overwrite
>>> its metadata? Since NILFS2 must maintain a mapping table, what
>>> happens when that mapping table needs to be updated?
>>
>> For older SSD drives, that may be true.
>>
>> But recent intelligent SSD drives do LFS-like optimizations
>> internally, so I guess they would show a similar trend.
>
> What do you mean by "similar trend"? Do you mean that the throughput
> should be almost the same no matter which file system is used?
>
>>
>> Unfortunately NILFS2 has a performance issue due to its DAT and
>> garbage collection design, and it seems to be hurting the merit of
>> LFS for such high-throughput devices.
>>
>> DAT is metadata which is used to convert indirect block addresses
>> (called virtual block numbers) to real block addresses. This makes
>> relocation of disk blocks possible, but at the same time it doubles
>> the latency of disk access for blocks that are read for the first
>> time.
>>
>> To mitigate this design drawback, nilfs applies readahead to such
>> metadata, but there is much room for tuning.
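To make the indirection described above concrete, here is a rough
sketch; it is not NILFS2 code, and every name in it (vblocknr_t,
dat_translate(), relocate(), the in-memory array standing in for the
DAT) is invented for illustration. The point is that a first (cold)
read pays for two lookups, the DAT entry and then the data block, while
an "overwrite" goes to a new physical block and only the DAT entry is
updated, with the DAT block itself being written out to the log like
any other block rather than in place.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t vblocknr_t;   /* virtual block number stored in inodes */
typedef uint64_t pblocknr_t;   /* physical (on-disk) block number       */

/* Toy DAT: maps virtual block numbers to current physical locations. */
#define DAT_SIZE 16
static pblocknr_t dat_table[DAT_SIZE];

/* First hop: look up the DAT entry (a metadata block read in a real fs). */
static pblocknr_t dat_translate(vblocknr_t vbn)
{
    return dat_table[vbn % DAT_SIZE];
}

/* Second hop: read the data block at the translated address. */
static void read_block(pblocknr_t pbn)
{
    printf("reading physical block %llu\n", (unsigned long long)pbn);
}

/*
 * A copy-on-write "overwrite": the data goes to a new physical block,
 * and only the DAT entry changes; the inode keeps the same virtual
 * block number.
 */
static void relocate(vblocknr_t vbn, pblocknr_t new_pbn)
{
    dat_table[vbn % DAT_SIZE] = new_pbn;
}

int main(void)
{
    vblocknr_t vbn = 7;

    relocate(vbn, 1000);             /* initial placement                  */
    read_block(dat_translate(vbn));  /* cold read: two lookups             */

    relocate(vbn, 2000);             /* "overwrite" = new block + DAT update */
    read_block(dat_translate(vbn));  /* old block at 1000 is never touched */
    return 0;
}

So a cold read costs two device accesses, which is the doubled latency
mentioned above, while an overwrite never touches the old data block in
place.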
>> For instance, I recently sent some patches to improve read
>> performance on Intel SSDs for 2.6.36.
>>
>> For details on the metadata organization and disk address conversion
>> of NILFS2, please see the slides:
>>
>> http://www.nilfs.org/papers/jls2009-nilfs.pdf
>
> From the slides and Documentation/filesystems/nilfs2.txt, I notice
> that the DAT is stored near the end of every log, which seems to be a
> particular region of the disk. I know that a log-structured file
> system does not overwrite the original files; what confuses me,
> however, is whether the mapping info in the inode has to be
> overwritten. For example, suppose the mapping info (LBA->PBA) of
> file A is stored in inode A. If file A is overwritten, the new data is
> actually written to B, so I think the corresponding mapping info in
> inode A has to change from A to B. Is that operation a kind of
> overwrite? Even though I read the papers from the link you gave me, I
> did not find the exact answer.
>
> I appreciate your answer very much.
>
> As you said, the DAT of NILFS2 is used to convert virtual block
> addresses to real block addresses, that is, from logical block
> addresses (LBA) to physical block addresses (PBA), so I guess this is
> where the mapping info is maintained. However, when this info is
> updated, for example when an overwrite happens and the original
> virtual block address has to map to a new real block address, does an
> overwrite happen inside the DAT? Is this one of the reasons for
> NILFS2's performance on SSDs?
>
> From the papers I have read, the general idea of garbage collection is
> to copy the "live" blocks out of some segments, compact them into new
> segments, write them back, and mark the original segments as free. Do
> you have any ideas for optimizing this GC?
>
> Since I have only read some papers and have not hacked on the source
> code, the questions I ask might be naive. I really appreciate your
> replies.
>
> Thanks,
> Yuehai
>
>>
>> A link page ( http://www.nilfs.org/en/links.html ) has some more
>> related links.
>>
>>> 4. What has been done specially to optimize performance on SSDs? I
>>> guessed that log-structured file systems would perform better than
>>> fast file systems such as EXT2/3, but at least for now the results
>>> show that I was wrong.
>>
>> As I wrote above, recent SSD drives adopt their own LFS-like
>> optimizations, and the performance characteristics are greatly
>> affected by them. It is well explained in the following article:
>>
>> http://lwn.net/Articles/353411/
>>
>>
>> Regards,
>> Ryusuke Konishi
>>
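On the garbage collection question above, here is a toy sketch of the
copy-forward cleaning the question describes; the structure and
function names are made up and not taken from the NILFS2 source. Live
blocks of a victim segment are copied into the segment currently being
written and the victim is then freed; in a real LFS the block-address
metadata (the DAT, in NILFS2's case) would be updated for each moved
block. The classic tuning points are which segment to pick as the
victim (greedy versus cost-benefit policies) and when the cleaner runs.

#include <stdbool.h>
#include <stdio.h>

#define BLOCKS_PER_SEG 8

struct segment {
    bool in_use;
    bool live[BLOCKS_PER_SEG];   /* which slots hold live file data */
    int  data[BLOCKS_PER_SEG];
};

/* Append one block to the segment being written; returns slot or -1. */
static int seg_append(struct segment *head, int value)
{
    for (int i = 0; i < BLOCKS_PER_SEG; i++) {
        if (!head->live[i]) {
            head->live[i] = true;
            head->data[i] = value;
            return i;
        }
    }
    return -1;  /* head full: a real fs would open a new segment */
}

/* Clean a victim: copy its live blocks forward, then free the segment. */
static void clean_segment(struct segment *victim, struct segment *head)
{
    for (int i = 0; i < BLOCKS_PER_SEG; i++) {
        if (victim->live[i])
            seg_append(head, victim->data[i]);  /* DAT updated here in a real fs */
    }
    victim->in_use = false;
    for (int i = 0; i < BLOCKS_PER_SEG; i++)
        victim->live[i] = false;
}

int main(void)
{
    struct segment victim = { .in_use = true,
                              .live = { true, false, true, false },
                              .data = { 11, 0, 33, 0 } };
    struct segment head = { .in_use = true };

    clean_segment(&victim, &head);
    printf("victim free: %s, head slot0=%d slot1=%d\n",
           victim.in_use ? "no" : "yes", head.data[0], head.data[1]);
    return 0;
}

Running the sketch shows the victim freed and its two live blocks
compacted to the front of the log head, which is the write cost (and
wear) that victim-selection policies try to minimize.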