Re: [FEATURE][PATCH 0/2] reiser4: Auto-punching holes on commit

On 07/19/2015 11:42 PM, Edward Shishkin wrote:

                   Auto-punching holes on commit


Storing zeros on disk is a rather stupid business. Indeed, right before
writing data to disk we can convert zeros to holes (these are the
abstract objects described in POSIX) and hence save a lot of disk space.

Compressing zeros before storing them on disk is an even more stupid
business: checking for zeros is a less expensive procedure than a
compression transform, so in addition we can save a lot of CPU
resources.
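
To illustrate why the zero check is cheap, here is a minimal sketch in
userspace C. The helper name chunk_is_zero() is hypothetical and is not
taken from the patch; it only shows that the check is a single linear
scan that can stop at the first non-zero byte.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the first len bytes of buf are all zero.
 * Hypothetical helper for illustration only, not code from the patch.
 */
static bool chunk_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0)
			return false;	/* stop at the first non-zero byte */
	}
	return true;	/* an empty range counts as all zeros */
}
```

A chunk for which such a check succeeds never has to be handed to the
compression transform at all.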

Let me recall how reiser4 implements holes.
The unix file plugin represents them via extent pointers marked in a
special way. The situation with the cryptcompress file plugin is
simpler: it represents holes as literal holes (that is, the absence of
any items with the specific keys). This means that we can simply check
for and remove all items that represent a logical chunk filled with
zeros. This is exactly what we now do at flush time, right before commit.

The best time for such a check is the atom's flush, whose job is to
complete all delayed actions. Specifically, it calls a static machine
->convert_node() for all dirty formatted nodes. This machine scans all
items of a node and calls the ->convert() method of every such item.
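
The scan-and-dispatch pattern described above can be sketched roughly as
follows. This is a simplified userspace model, not the reiser4 code: the
struct layouts, the field names and mark_converted() are all assumptions
made for illustration; only the names convert_node and convert come from
the description above.

```c
#include <stddef.h>

struct item;

/* Hypothetical per-item operations table (layout is an assumption). */
struct item_ops {
	int (*convert)(struct item *it);	/* may be NULL */
};

struct item {
	const struct item_ops *ops;
	int converted;		/* set by the example method below */
};

struct node {
	struct item *items;
	size_t nr_items;
};

/* Example ->convert() method: just records that it was called. */
static int mark_converted(struct item *it)
{
	it->converted = 1;
	return 0;
}

/* Scan all items of a node and call ->convert() where one is defined. */
static int convert_node(struct node *node)
{
	for (size_t i = 0; i < node->nr_items; i++) {
		struct item *it = &node->items[i];

		if (it->ops && it->ops->convert) {
			int ret = it->ops->convert(it);

			if (ret)
				return ret;	/* stop on the first error */
		}
	}
	return 0;
}
```

In this model a ->convert() method is free to replace an item with new
data or to drop it altogether, which is where hole punching fits in.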

We used this framework for transparent compression on commit
(specifically to replace old fragments that compose compressed file's
body with the new ones). Now we use it also to punch holes at logical
chunks filled with zeros. That is, instead of replacing old items, we
just remove them from the tree. Think of hole punching as one more
delayed action.

I have implemented hole punching only for the cryptcompress plugin. It
can also be implemented for the "classic" unix-file plugin, which
doesn't compress data. However, that will be more complicated because of
the more complicated format of holes. In the end, I think that having
this feature for only one file plugin is enough.


                          Solved Problems:


When flushing modified dirty pages, the process should be able to find
in the tree the respective item group to be replaced with new data. So
we should handle possible races in which one process checks/creates the
items while the flushing process deletes those items during the hole
punching procedure. To avoid this situation we maintain a special
"economical" counter of checked-in modifications for every logical
cluster in struct jnode. If the counter is greater than 1, then we
simply don't punch a hole.
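
The decision rule above can be written down as a small predicate. This
is only a sketch: struct cluster_state, its field names and
may_punch_hole() are hypothetical, and the real counter lives in struct
jnode with a layout that is not shown in this message.

```c
#include <stdbool.h>

/* Hypothetical per-cluster state; the field names are assumptions. */
struct cluster_state {
	unsigned int checkin_count;	/* checked-in modifications */
	bool all_zeros;			/* logical chunk holds only zeros */
};

/*
 * Punch a hole only if the chunk is all zeros and the counter is not
 * greater than 1, i.e. no other modification has been checked in.
 */
static bool may_punch_hole(const struct cluster_state *c)
{
	return c->all_zeros && c->checkin_count <= 1;
}
```

When the predicate is false the items are simply left in place, so a
concurrent writer still finds the item group it expects.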


                   Mount option "dont_punch_holes"


Since hole punching is a useful feature for both HDD and SSD, I have
enabled it by default. To turn it off, use the mount option
"dont_punch_holes". The changes are backward and forward compatible, so
no new format is needed.


                     How it looks in practice:


# mkfs.reiser4 -f -y /dev/sdaX
# mount /dev/sdaX /mnt
# dd if=/dev/zero of=/mnt/foo bs=65536 count=1000
# umount /mnt

Now dump the tree:

# debugfs.reiser4 -t /dev/sdaX | less

As we can see (attachment 1), the file foo doesn't have a body, only
stat-data (on-disk inode): we removed its body at flush time, because it
is composed of zeros (see my remark above about holes). Let's now append
non-zero data to our file "foo":

# mount /dev/sdaX /mnt
# echo "This is not zeros" >> /mnt/foo
# umount /mnt
# debugfs.reiser4 -t /dev/sdaX | less

As we can see (attachment 2), the body of the file "foo" now consists of
only one item of length 59, which has offset 0x3e80000 (=65536000). This
is exactly the string "This is not zeros" padded with zeros up to the
page size (4096) and compressed by the LZO1 algorithm.


Sorry, I attached the wrong file as attachment 2. It should be the
following:

#3 CTAIL (ctail40): [2a:4(FB):666f6f00000000:10001:3e80000] OFF=340, LEN=19, flags=0x0 shift=16

That is, the body of file "foo" consists of only one item of length 19
(= the length of the string "This is not zeros", including the trailing
newline written by echo, plus one byte in which the size of the logical
cluster is stored).
Zeros at the end of the in-memory file are not stored on disk!
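
The numbers in the dump can be checked with a little arithmetic. The
sketch below is plain userspace C with hypothetical helper names; it
assumes that the offset is the cluster index shifted by the "shift"
value from the dump (16, i.e. 64K logical clusters), and that the item
holds the appended bytes plus the one extra byte mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of a logical cluster: index * 2^shift. */
static unsigned long cluster_offset(unsigned long index, unsigned int shift)
{
	return index << shift;
}

/*
 * Item length as described above: the stored bytes plus one byte
 * holding the logical cluster size. The helper name is hypothetical.
 */
static size_t ctail_item_len(const char *data)
{
	return strlen(data) + 1;
}
```

With 1000 zero-filled clusters written by dd, cluster_offset(1000, 16)
is 65536000 (0x3e80000), and ctail_item_len("This is not zeros\n") is
18 + 1 = 19, matching OFF-independent fields LEN=19 and the key offset
3e80000 in the dump line.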

Thanks,
Edward.