I asked the same question on the linux-btrfs list, and the answer I got
is that the inode itself is also written out to a new place; only the
superblocks are overwritten in their fixed locations. That seems to make
sense. How about NILFS2? I guess it is the same. However, although I
have not done many tests on SSDs, I am wondering whether this becomes a
bottleneck: even though data is written to a new place instead of being
written over the original, the superblocks, which live at fixed
addresses, have to be updated all the time. This could cause much more
frequent overwrites of the areas where the superblocks are stored. Is
the SSD smart enough to handle these overwrites, or can the file system
do something to avoid this situation? Or perhaps it is not even a
problem for the file system, because the superblock writes are cached
and can be delayed.

I appreciate your comments.

Thanks,
Yuehai

On Tue, Sep 28, 2010 at 4:03 PM, Yuehai Xu <yuehaixu@xxxxxxxxx> wrote:
> Hi,
>
> On Sun, Sep 26, 2010 at 5:17 AM, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
>> Hi,
>> On Sat, 25 Sep 2010 18:54:04 -0400, Yuehai Xu wrote:
>>> Hi,
>>>
>>> I have an SSD, an Intel X25-V, which I use to test the performance
>>> of different file systems. The results seem confusing to me. They
>>> come from Postmark, in which I set the number of files to 50,000
>>> with file sizes between 9216 and 15360 bytes:
>>>
>>>               EXT3        NILFS2     BTRFS      EXT4        XFS        REISERFS    EXT2
>>> PostMark(R)   146.67Kb/s  52.21Kb/s  90.59Kb/s  172.12Kb/s  60.39Kb/s  146.67Kb/s  83.25Kb/s
>>> PostMark(W)   28.09Mb/s   10Mb/s     17.35Mb/s  31.04Mb/s   11.56Mb/s  28.09Mb/s   15.94Mb/s
>>>
>>> From these results, the throughput of NILFS2, for both reads and
>>> writes, is much smaller than EXT3's, yet I had read on the web site
>>> that NILFS2 performs much better on SSDs than other file systems.
>>> What is wrong with my test result? My kernel is 2.6.32, and I
>>> formatted NILFS2 the default way; nothing special was done. So, my
>>> questions are:
>>>
>>> 1. Are there any special parameters that I need to configure when I
>>> mount or format my SSD?
>>
>> Nilfs in the latest kernels has a "discard" mount option, but I think
>> it won't change the above result dramatically.
>>
>>> 2. Does the performance differ much between different kinds of SSDs?
>>> 3. As far as I know, NILFS2 is a log-structured file system, which
>>> means it always turns random writes into sequential ones and tries
>>> to avoid the cost of overwrites on an SSD. Does it need to overwrite
>>> its metadata? Since NILFS2 must maintain a mapping table, what
>>> happens when that mapping table needs to be updated?
>>
>> For older SSD drives, that may be true.
>>
>> But recent intelligent SSD drives do LFS-like optimizations
>> internally, so I guess they would show a similar trend.
>
> What do you mean by "similar trend"? Do you mean that the throughput
> should be almost the same no matter which file system is used?
>
>>
>> Unfortunately NILFS2 has a performance issue due to its DAT and
>> garbage collection design, and it seems to be hurting the merit of
>> LFS for such high-throughput devices.
>>
>> DAT is metadata which is used to convert indirect block addresses
>> (called virtual block numbers) to real block addresses. This makes
>> relocation of disk blocks possible, but at the same time it doubles
>> the latency of disk access for blocks that are read for the first
>> time.
>>
>> To mitigate this design drawback, nilfs applies readahead to such
>> metadata, but there is much room for tuning.
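To make the indirection described above concrete, here is a rough
sketch; it is not NILFS2 code, and every name in it (vblocknr_t,
dat_translate(), relocate(), the in-memory array standing in for the
DAT) is invented for illustration. The point is that a first (cold)
read pays for two lookups, the DAT entry and then the data block, while
an "overwrite" goes to a new physical block and only the DAT entry is
updated, with the DAT block itself being written out to the log like
any other block rather than in place.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t vblocknr_t;   /* virtual block number stored in inodes */
typedef uint64_t pblocknr_t;   /* physical (on-disk) block number       */

/* Toy DAT: maps virtual block numbers to current physical locations. */
#define DAT_SIZE 16
static pblocknr_t dat_table[DAT_SIZE];

/* First hop: look up the DAT entry (a metadata block read in a real fs). */
static pblocknr_t dat_translate(vblocknr_t vbn)
{
    return dat_table[vbn % DAT_SIZE];
}

/* Second hop: read the data block at the translated address. */
static void read_block(pblocknr_t pbn)
{
    printf("reading physical block %llu\n", (unsigned long long)pbn);
}

/*
 * A copy-on-write "overwrite": the data goes to a new physical block,
 * and only the DAT entry changes; the inode keeps the same virtual
 * block number.
 */
static void relocate(vblocknr_t vbn, pblocknr_t new_pbn)
{
    dat_table[vbn % DAT_SIZE] = new_pbn;
}

int main(void)
{
    vblocknr_t vbn = 7;

    relocate(vbn, 1000);             /* initial placement                  */
    read_block(dat_translate(vbn));  /* cold read: two lookups             */

    relocate(vbn, 2000);             /* "overwrite" = new block + DAT update */
    read_block(dat_translate(vbn));  /* old block at 1000 is never touched */
    return 0;
}

So a cold read costs two device accesses, which is the doubled latency
mentioned above, while an overwrite never touches the old data block in
place.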
>> For instance, I recently sent some patches to improve read
>> performance on Intel SSDs for 2.6.36.
>>
>> For details on the metadata organization and disk address conversion
>> of NILFS2, please see the slides:
>>
>> http://www.nilfs.org/papers/jls2009-nilfs.pdf
>
> From the slides and Documentation/filesystems/nilfs2.txt, I notice
> that the DAT is stored near the end of every log, which seems to be a
> particular region of the disk. I know that a log-structured file
> system does not overwrite the original files; what confuses me,
> however, is whether the mapping info in the inode has to be
> overwritten. For example, suppose the mapping info (LBA->PBA) of
> file A is stored in inode A. If file A is overwritten, the new data is
> actually written to B, so I think the corresponding mapping info in
> inode A has to change from A to B. Is that operation a kind of
> overwrite? Even though I read the papers from the link you gave me, I
> did not find the exact answer.
>
> I appreciate your answer very much.
>
> As you said, the DAT of NILFS2 is used to convert virtual block
> addresses to real block addresses, that is, from logical block
> addresses (LBA) to physical block addresses (PBA), so I guess this is
> where the mapping info is maintained. However, when this info is
> updated, for example when an overwrite happens and the original
> virtual block address has to map to a new real block address, does an
> overwrite happen inside the DAT? Is this one of the reasons for
> NILFS2's performance on SSDs?
>
> From the papers I have read, the general idea of garbage collection is
> to copy the "live" blocks out of some segments, compact them into new
> segments, write them back, and mark the original segments as free. Do
> you have any ideas for optimizing this GC?
>
> Since I have only read some papers and have not hacked on the source
> code, the questions I ask might be naive. I really appreciate your
> replies.
>
> Thanks,
> Yuehai
>
>>
>> A link page ( http://www.nilfs.org/en/links.html ) has some more
>> related links.
>>
>>> 4. What has been done specially to optimize performance on SSDs? I
>>> guessed that log-structured file systems would perform better than
>>> fast file systems such as EXT2/3, but at least for now the results
>>> show that I was wrong.
>>
>> As I wrote above, recent SSD drives adopt their own LFS-like
>> optimizations, and the performance characteristics are greatly
>> affected by them. It is well explained in the following article:
>>
>> http://lwn.net/Articles/353411/
>>
>>
>> Regards,
>> Ryusuke Konishi
>>
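On the garbage collection question above, here is a toy sketch of the
copy-forward cleaning the question describes; the structure and
function names are made up and not taken from the NILFS2 source. Live
blocks of a victim segment are copied into the segment currently being
written and the victim is then freed; in a real LFS the block-address
metadata (the DAT, in NILFS2's case) would be updated for each moved
block. The classic tuning points are which segment to pick as the
victim (greedy versus cost-benefit policies) and when the cleaner runs.

#include <stdbool.h>
#include <stdio.h>

#define BLOCKS_PER_SEG 8

struct segment {
    bool in_use;
    bool live[BLOCKS_PER_SEG];   /* which slots hold live file data */
    int  data[BLOCKS_PER_SEG];
};

/* Append one block to the segment being written; returns slot or -1. */
static int seg_append(struct segment *head, int value)
{
    for (int i = 0; i < BLOCKS_PER_SEG; i++) {
        if (!head->live[i]) {
            head->live[i] = true;
            head->data[i] = value;
            return i;
        }
    }
    return -1;  /* head full: a real fs would open a new segment */
}

/* Clean a victim: copy its live blocks forward, then free the segment. */
static void clean_segment(struct segment *victim, struct segment *head)
{
    for (int i = 0; i < BLOCKS_PER_SEG; i++) {
        if (victim->live[i])
            seg_append(head, victim->data[i]);  /* DAT updated here in a real fs */
    }
    victim->in_use = false;
    for (int i = 0; i < BLOCKS_PER_SEG; i++)
        victim->live[i] = false;
}

int main(void)
{
    struct segment victim = { .in_use = true,
                              .live = { true, false, true, false },
                              .data = { 11, 0, 33, 0 } };
    struct segment head = { .in_use = true };

    clean_segment(&victim, &head);
    printf("victim free: %s, head slot0=%d slot1=%d\n",
           victim.in_use ? "no" : "yes", head.data[0], head.data[1]);
    return 0;
}

Running the sketch shows the victim freed and its two live blocks
compacted to the front of the log head, which is the write cost (and
wear) that victim-selection policies try to minimize.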