Hi, Ryusuke,

Thank you for the reply. It's O_SYNC.

The dumpseg and lssu tools are useful, and the description of the log
structure in the slides is very clear. Thank you for the hint about
checking the metadata.

Here is the information reported by dumpseg:

# dumpseg 698 | grep ino
  ino = 12, cno = 9, nblocks = 1356, ndatblk = 1343
  ino = 6, cno = 9, nblocks = 1, ndatblk = 1
  ino = 4, cno = 0, nblocks = 1, ndatblk = 1
  ino = 5, cno = 0, nblocks = 2, ndatblk = 2
  ino = 3, cno = 0, nblocks = 681, ndatblk = 681

The file with inode 12 (ino = 12) is the data file. "nblocks" is the
number of occupied blocks, right? In this segment, the data file
occupies 1356 blocks, and the remaining files occupy 685 blocks
(1 + 1 + 2 + 681). So there are 685 "overhead" blocks in this segment?
This may explain the additional writes in our trace.

Best Regards,
Yongkun

-----Original Message-----
From: Ryusuke Konishi [mailto:ryusuke@xxxxxxxx]
Sent: Tuesday, April 20, 2010 8:45 PM
To: yongkun@xxxxxxxxxxxxxxxxxxxxx
Cc: linux-nilfs@xxxxxxxxxxxxxxx
Subject: Re: Writes doubled by NILFS2

Hi,

On Tue, 20 Apr 2010 17:39:13 +0900, "Yongkun Wang" <yongkun@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Hey, guys,
>
> We have a database system whose data is stored on a disk formatted with
> NILFS2 (nilfs-2.0.15, kmod-nilfs-2.0.5-1.2.6.18_92.1.22.el5.x86_64).
>
> I have run traces at the system call level and at the block I/O level,
> that is, tracing the requests before and after they are processed by
> NILFS2.
>
> We use synchronous I/O, so the amount of writes at the two trace points
> should be equal. This is true when we use the EXT2 file system.
>
> However, for NILFS2, we found that the writes are doubled, that is, the
> amount of writes doubles after being processed by NILFS2. The amount of
> writes at the system call level is equal between EXT2 and NILFS2.

Interesting results.

What kind of synchronous write did you use in the measurement?
fsync? Or O_SYNC writes?
> Since all the addresses are log-structured, it is hard to know what
> the additional writes are.
>
> Can you provide some hints on the additional writes? Are they caused
> by some special function such as snapshots?

You can look into the logs with the dumpseg(8) command:

# dumpseg <segment number>

This shows a summary of the blocks written in the specified segment.
The lssu(1) command is helpful for finding a log head.

In the dump output, files with inode numbers 3, 4, 5, and 6 are
metadata. The log format is depicted on page 10 of the following
slides:

http://www.nilfs.org/papers/jls2009-nilfs.pdf

In general, copy-on-write filesystems, including log-structured ones,
are said to incur overhead from metadata writes, especially for
synchronous writes. I guess small fsyncs or O_SYNC writes are causing
the overhead.

Thanks,
Ryusuke