[PATCH 1/6] incfs: Add first files of incrementalfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Eugene Zemtsov <ezemtsov@xxxxxxxxxx>

- fs/incfs dir
- Kconfig (CONFIG_INCREMENTAL_FS)
- Makefile
- Module and file system initialization and clean up code
- New MAINTAINERS entry
- Add incrementalfs.h UAPI header
- Register ioctl range in ioctl-numbers.txt
- Documentation

Signed-off-by: Eugene Zemtsov <ezemtsov@xxxxxxxxxx>
---
 Documentation/filesystems/incrementalfs.rst | 452 ++++++++++++++++++++
 Documentation/ioctl/ioctl-number.txt        |   1 +
 MAINTAINERS                                 |   7 +
 fs/Kconfig                                  |   1 +
 fs/Makefile                                 |   1 +
 fs/incfs/Kconfig                            |  10 +
 fs/incfs/Makefile                           |   4 +
 fs/incfs/main.c                             |  85 ++++
 fs/incfs/vfs.c                              |  37 ++
 include/uapi/linux/incrementalfs.h          | 189 ++++++++
 10 files changed, 787 insertions(+)
 create mode 100644 Documentation/filesystems/incrementalfs.rst
 create mode 100644 fs/incfs/Kconfig
 create mode 100644 fs/incfs/Makefile
 create mode 100644 fs/incfs/main.c
 create mode 100644 fs/incfs/vfs.c
 create mode 100644 include/uapi/linux/incrementalfs.h

diff --git a/Documentation/filesystems/incrementalfs.rst b/Documentation/filesystems/incrementalfs.rst
new file mode 100644
index 000000000000..682e3dcb6b5a
--- /dev/null
+++ b/Documentation/filesystems/incrementalfs.rst
@@ -0,0 +1,452 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Incremental File System
+=======================
+
+Overview
+========
+Incremental FS is special-purpose Linux virtual file system that allows
+execution of a program while its binary and resource files are still being
+lazily downloaded over the network, USB etc. It is focused on incremental
+delivery for a small number (under 100) of big files (more than 10 megabytes).
+Incremental FS doesn’t allow direct writes into files and, once loaded, file
+content never changes. Incremental FS doesn’t use a block device, instead it
+saves data into a backing file located on a regular file-system.
+
+But why?
+--------
+To allow running **big** Android apps before their binaries and resources are
+fully downloaded to an Android device. If an app reads something not loaded yet,
+it needs to wait for the data block to be fetched, but in most cases hot blocks
+can be loaded in advance.
+
+Workflow
+--------
+A userspace process, called a data loader, mounts an instance of incremental-fs
+giving it a file descriptor on an underlying file system (like ext4 or f2fs).
+Incremental-fs reads content (if any) of this backing file and interprets it as
+a file system image with files, directories and data blocks. At this point
+the data loader can declare new files to be shown by incremental-fs.
+
+A process is started from a binary located on incremental-fs.
+All reads are served directly from the backing file
+without roundtrips into userspace. If the process accesses a data block that was
+not originally present in the backing file, the read operation waits.
+
+Meanwhile the data loader can feed new data blocks to incremental-fs by calling
+write() on a special .cmd pseudo-file. The data loader can request information
+about pending reads by calling poll() and read() on the .cmd pseudo-file.
+This mechanism allows the data loader to serve most urgently needed data first.
+Once a data block is given to incremental-fs, it saves it to the backing file
+and unblocks all the reads waiting for this block.
+
+Eventually all data for all files is uploaded by the data loader, and saved by
+incremental-fs into the backing file. At that moment the data loader is not
+needed any longer. The backing file will play the role of a complete
+filesystem image for all future runs of the program.
+
+Non-goals
+---------
+* Allowing direct writes by the executing processes into files on incremental-fs
+* Allowing the data loader change file size or content after it was loaded.
+* Having more than a couple hundred files and directories.
+
+
+Features
+========
+
+Read-only, but not unchanging
+-----------------------------
+On the surface a mount directory of incremental-fs would look similar to
+a read-only instance of network file system: files and directories can be
+listed and read, but can’t be directly created or modified via creat() or
+write(). At the same time the data loader can make changes to a directory
+structure via external ioctl-s. i.e. link and unlink files and directories
+(if they empty). Data can't be changed this way, once a file block is loaded
+there is no way to change it.
+
+Filesystem image in a backing file
+----------------------------------
+Instead of using a block device, all data and metadata is stored in a
+backing file provided as a mount parameter. The backing file is located on
+an underlying file system (like ext4 or f2fs). Such approach is very similar
+to what might be achieved by using loopback device with a traditional file
+system, but it avoids extra set-up steps and indirections. It also allows
+incremental-fs image to dynamically grow as new files and data come without
+having to do any extra steps for resizing.
+
+If the backing file contains data at the moment when incremental-fs is mounted,
+content of the backing file is being interpreted as filesystem image.
+New files and data can still be added through the external interface,
+and they will be saved to the backing file.
+
+Data compression
+----------------
+Incremental-fs can store compressed data. In this case each 4KB data block is
+compressed separately. Data blocks can be provided to incremental-fs by
+the data loader in a compressed form. Incremental-fs uncompresses blocks
+each time a executing process reads it (modulo page cache). Compression also
+takes care of blocks composed of all zero bytes removing necessity to handle
+this case separately.
+
+Partially present files
+-----------------------
+Data in the files consists of 4KB blocks, each block can be present or absent.
+Unlike in sparse files, reading an absent block doesn’t return all zeros.
+It waits for the data block to be loaded via the ioctl interface
+(respecting a timeout). Once a data block is loaded it never disappears
+and can’t be changed or erased from a file. This ability to frictionlessly
+wait for temporary missing data is the main feature of incremental-fs.
+
+Hard links. Multiple names for the same file
+--------------------------------------------
+Like all traditional UNIX file systems, incremental-fs supports hard links,
+i.e. different file names in different directories can refer to the same file.
+As mentioned above new hard links can be created and removed via
+the ioctl interface, but actual data files are immutable, modulo partial
+data loading. Each directory can only have at most one name referencing it.
+
+Inspection of incremental-fs internal state
+-------------------------------------------
+poll() and read() on the .cmd pseudo-file allow data loaders to get a list of
+read operations stalled due to lack of a data block (pending reads).
+
+
+Application Programming Interface
+=================================
+
+Regular file system interface
+-----------------------------
+Executing process access files and directories via regular Linux file interface:
+open, read, close etc. All the intricacies of data loading a file representation
+are hidden from them.
+
+External .cmd file interface
+----------------------------
+When incremental-fs is mounted, a mount directory contains a pseudo-file
+called '.cmd'. The data loader will open this file and call read(), write(),
+poll() and ioctl() on it inspect and change state of incremental-fs.
+
+poll() and read() are used by the data loader to wait for pending reads to
+appear and obtain an array of ``struct incfs_pending_read_info``.
+
+write() is used by the data loader to feed new data blocks to incremental-fs.
+A data buffer given to write() is interpreted as an array of
+``struct incfs_new_data_block``. Structs in the array describe locations and
+properties of data blocks loaded with this write() call.
+
+``ioctl(INCFS_IOC_PROCESS_INSTRUCTION)`` is used to change structure of
+incremental-fs. It receives an pointer to ``struct incfs_instruction``
+where type field can have be one of the following values.
+
+**INCFS_INSTRUCTION_NEW_FILE**
+Creates an inode (a file or a directory) without a name.
+It assumes ``incfs_new_file_instruction.file`` is populated with details.
+
+**INCFS_INSTRUCTION_ADD_DIR_ENTRY**
+Creates a name (aka hardlink) for an inode in a directory.
+A directory can't have more than one hardlink pointing to it, but files can be
+linked from different directories.
+It assumes ``incfs_new_file_instruction.dir_entry`` is populated with details.
+
+**INCFS_INSTRUCTION_REMOVE_DIR_ENTRY**
+Remove a name (aka hardlink) for a file from a directory.
+Only empty directories can be unlinked.
+It assumes ``incfs_new_file_instruction.dir_entry`` is populated with details.
+
+For more details see in uapi/linux/incrementalfs.h and samples below.
+
+Supported mount options
+-----------------------
+See ``fs/incfs/options.c`` for more details.
+
+    * ``backing_fd=<unsigned int>``
+        Required. A file descriptor of a backing file opened by the process
+        calling mount(2). This descriptor can be closed after mount returns.
+
+    * ``read_timeout_msc=<unsigned int>``
+        Default: 1000. Timeout in milliseconds before a read operation fails
+        if no data found in the backing file or provided by the data loader.
+
+Sysfs files
+-----------
+``/sys/fs/incremental-fs/version`` - a current version of the filesystem.
+One ASCII encoded positive integer number with a new line at the end.
+
+
+Examples
+--------
+See ``sample_data_loader.c`` for a complete implementation of a data loader.
+
+Mount incremental-fs
+~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int mount_fs(char *mount_dir, char *backing_file, int timeout_msc)
+    {
+        static const char fs_name[] = INCFS_NAME;
+        char mount_options[512];
+        int backing_fd;
+        int result;
+
+        backing_fd = open(backing_file, O_RDWR);
+        if (backing_fd == -1) {
+            perror("Error in opening backing file");
+            return 1;
+        }
+
+        snprintf(mount_options, ARRAY_SIZE(mount_options),
+            "backing_fd=%u,read_timeout_msc=%u", backing_fd, timeout_msc);
+
+        result = mount(fs_name, mount_dir, fs_name, 0, mount_options);
+        if (result != 0)
+            perror("Error mounting fs.");
+        return result;
+    }
+
+Open .cmd file
+~~~~~~~~~~~~~~
+
+::
+
+    int open_commands_file(char *mount_dir)
+    {
+        char cmd_file[255];
+        int cmd_fd;
+
+        snprintf(cmd_file, ARRAY_SIZE(cmd_file), "%s/.cmd", mount_dir);
+        cmd_fd = open(cmd_file, O_RDWR);
+        if (cmd_fd < 0)
+            perror("Can't open commands file");
+        return cmd_fd;
+    }
+
+Add a file to the file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int create_file(int cmd_fd, char *filename, int *ino_out, size_t size)
+    {
+        int ret = 0;
+        __u16 ino = 0;
+        struct incfs_instruction inst = {
+                .version = INCFS_HEADER_VER,
+                .type = INCFS_INSTRUCTION_NEW_FILE,
+                .file = {
+                    .size = size,
+                    .mode = S_IFREG | 0555,
+                }
+        };
+
+        ret = ioctl(cmd_fd, INCFS_IOC_PROCESS_INSTRUCTION, &inst);
+        if (ret)
+            return -errno;
+
+        ino = inst.file.ino_out;
+        inst = (struct incfs_instruction){
+                .version = INCFS_HEADER_VER,
+                .type = INCFS_INSTRUCTION_ADD_DIR_ENTRY,
+                .dir_entry = {
+                    .dir_ino = INCFS_ROOT_INODE,
+                    .child_ino = ino,
+                    .name = ptr_to_u64(filename),
+                    .name_len = strlen(filename)
+                }
+            };
+        ret = ioctl(cmd_fd, INCFS_IOC_PROCESS_INSTRUCTION, &inst);
+        if (ret)
+            return -errno;
+        *ino_out = ino;
+        return 0;
+    }
+
+Load data into a file
+~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int cmd_fd = open_commands_file(path_to_mount_dir);
+    char *data = get_some_data();
+    struct incfs_new_data_block block;
+    int err;
+
+    block.file_ino = file_ino;
+    block.block_index = 0;
+    block.compression = COMPRESSION_NONE;
+    block.data = (__u64)data;
+    block.data_len = INCFS_DATA_FILE_BLOCK_SIZE;
+
+    err = write(cmd_fd, &block, sizeof(block));
+
+
+Get an array of pending reads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int poll_res = 0;
+    struct incfs_pending_read_info reads[10];
+    int cmd_fd = open_commands_file(path_to_mount_dir);
+    struct pollfd pollfd = {
+        .fd = cmd_fd,
+        .events = POLLIN
+    };
+
+    poll_res = poll(&pollfd, 1, timeout);
+    if (poll_res > 0 && (pollfd.revents | POLLIN)) {
+        ssize_t read_res = read(cmd_fd, reads, sizeof(reads));
+        if (read_res > 0)
+            printf("Waiting reads %ld\n", read_res / sizeof(reads[0]));
+    }
+
+
+
+Ondisk format
+=============
+
+General principles
+------------------
+* The backbone of the incremental-fs ondisk format is an append only linked
+  list of metadata blocks. Each metadata block contains an offset of the next
+  one. These blocks describe files and directories on the
+  file system. They also represent actions of adding and removing file names
+  (hard links).
+  Every time incremental-fs instance is mounted, it reads through this list
+  to recreate filesystem's state in memory. An offset of the first record in the
+  metadata list is stored in the superblock at the beginning of the backing
+  file.
+
+* Most of the backing file is taken by data areas and blockmaps.
+  Since data blocks can be compressed and have different sizes,
+  single per-file data area can't be pre-allocated. That's why blockmaps are
+  needed in order to find a location and size of each data block in
+  the backing file. Each time a file is created, a corresponding block map is
+  allocated to store future offsets of data blocks.
+
+  Whenever a data block is given by data loader to incremental-fs:
+    - A data area with the given block is appended to the end of
+      the backing file.
+    - A record in the blockmap for the given block index is updated to reflect
+      its location, size, and compression algorithm.
+
+Important format details
+------------------------
+Ondisk structures are defined in the ``format.h`` file. They are all packed
+and use little-endian order.
+A backing file must start with ``incfs_super_block`` with ``s_magic`` field
+equal to 0x5346434e49 "INCFS".
+
+Metadata records:
+
+* ``incfs_inode`` - metadata record to declare a file or a directory.
+                    ``incfs_inode.i_mode`` determents if it is a file
+                    or a directory.
+* ``incfs_blockmap_entry`` - metadata record that specifies size and location
+                            of a blockmap area for a given file. This area
+                            contains an array of ``incfs_blockmap_entry``-s.
+* ``incfs_dir_action`` - metadata record that specifies changes made to a
+                    to a directory structure, e.g. add or remove a hardlink.
+* ``incfs_md_header`` - header of a metadata record. It's always a part
+                    of other structures and served purpose of metadata
+                    bookkeeping.
+
+Other ondisk structures:
+
+* ``incfs_super_block`` - backing file header
+* ``incfs_blockmap_entry`` - a record in a blockmap area that describes size
+                        and location of a data block.
+* Data blocks dont have any particular structure, they are written to the backing
+  file in a raw form as they come from a data loader.
+
+
+Backing file layout
+-------------------
+::
+
+              +-------------------------------------------+
+              |            incfs_super_block              |]---+
+              +-------------------------------------------+    |
+              |                 metadata                  |<---+
+              |                incfs_inode                |]---+
+              +-------------------------------------------+    |
+                        .........................              |
+              +-------------------------------------------+    |   metadata
+     +------->|               blockmap area               |    |  list links
+     |        |          [incfs_blockmap_entry]           |    |
+     |        |          [incfs_blockmap_entry]           |    |
+     |        |          [incfs_blockmap_entry]           |    |
+     |    +--[|          [incfs_blockmap_entry]           |    |
+     |    |   |          [incfs_blockmap_entry]           |    |
+     |    |   |          [incfs_blockmap_entry]           |    |
+     |    |   +-------------------------------------------+    |
+     |    |             .........................              |
+     |    |   +-------------------------------------------+    |
+     |    |   |                 metadata                  |<---+
+     +----|--[|               incfs_blockmap              |]---+
+          |   +-------------------------------------------+    |
+          |             .........................              |
+          |   +-------------------------------------------+    |
+          +-->|                 data block                |    |
+              +-------------------------------------------+    |
+                        .........................              |
+              +-------------------------------------------+    |
+              |                 metadata                  |<---+
+              |             incfs_dir_action              |
+              +-------------------------------------------+
+
+Unreferenced files and absence of garbage collection
+----------------------------------------------------
+Described file format can produce files that don't have any names for them in
+any directories. Incremental-fs takes no steps to prevent such situations or
+reclaim space occupied by such files in the backing file. If garbage collection
+is needed it has to be implemented as a separate userspace tool.
+
+
+Design alternatives
+===================
+
+Why isn't incremental-fs implemented via FUSE?
+----------------------------------------------
+TLDR: FUSE-based filesystems add 20-80% of performance overhead for target
+scenarios, and increase power use on mobile beyond acceptable limit
+for widespread deployment. A custom kernel filesystem is the way to overcome
+these limitations.
+
+From the theoretical side of things, FUSE filesystem adds some overhead to
+each filesystem operation that’s not handled by OS page cache:
+
+    * When an IO request arrives to FUSE driver (D), it puts it into a queue
+      that runs on a separate kernel thread
+    * Then another separate user-mode handler process (H) has to run,
+      potentially after a context switch, to read the request from the queue.
+      Reading the request adds a kernel-user mode transition to the handling.
+    * (H) sends the IO request to kernel to handle it on some underlying storage
+      filesystem. This adds a user-kernel and kernel-user mode transition
+      pair to the handling.
+    * (H) then responds to the FUSE request via a write(2) call.
+      Writing the response is another user-kernel mode transition.
+    * (D) needs to read the response from (H) when its kernel thread runs
+      and forward it to the user
+
+Together, the scenario adds 2 extra user-kernel-user mode transition pairs,
+and potentially has up to 3 additional context switches for the FUSE kernel
+thread and the user-mode handler to start running for each IO request on the
+filesystem.
+This overhead can vary from unnoticeable to unmanageable, depending on the
+target scenario. But it will always burn extra power via CPU staying longer
+in non-idle state, handling context switches and mode transitions.
+One important goal for the new filesystem is to be able to handle each page
+read separately on demand, because we don't want to wait and download more data
+than absolutely necessary. Thus readahead would need to be disabled completely.
+This increases the number of separate IO requests and the FUSE related overhead
+by almost 32x (128KB readahead limit vs 4KB individual block operations)
+
+For more info see a 2017 USENIX research paper:
+To FUSE or Not to FUSE: Performance of User-Space File Systems
+Bharath Kumar Reddy Vangoor, Stony Brook University;
+Vasily Tarasov, IBM Research-Almaden;
+Erez Zadok, Stony Brook University
+https://www.usenix.org/system/files/conference/fast17/fast17-vangoor.pdf
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index c9558146ac58..a5f8e0eaff91 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -227,6 +227,7 @@ Code  Seq#(hex)	Include File		Comments
 'f'	00-0F	fs/ocfs2/ocfs2_fs.h	conflict!
 'g'	00-0F	linux/usb/gadgetfs.h
 'g'	20-2F	linux/usb/g_printer.h
+'g'	30-3F	include/uapi/linux/incrementalfs.h
 'h'	00-7F				conflict! Charon filesystem
 					<mailto:zapman@xxxxxxxxxxxx>
 'h'	00-1F	linux/hpet.h		conflict!
diff --git a/MAINTAINERS b/MAINTAINERS
index 5c38f21aee78..c92ad89ee5e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7630,6 +7630,13 @@ F:	Documentation/hwmon/ina2xx
 F:	drivers/hwmon/ina2xx.c
 F:	include/linux/platform_data/ina2xx.h

+INCREMENTAL FILESYSTEM
+M:	Eugene Zemtsov <ezemtsov@xxxxxxxxxx>
+S:	Supported
+F:	fs/incfs/
+F:	include/uapi/linux/incrementalfs.h
+F:	Documentation/filesystems/incrementalfs.rst
+
 INDUSTRY PACK SUBSYSTEM (IPACK)
 M:	Samuel Iglesias Gonsalvez <siglesias@xxxxxxxxxx>
 M:	Jens Taprogge <jens.taprogge@xxxxxxxxxxxx>
diff --git a/fs/Kconfig b/fs/Kconfig
index 3e6d3101f3ff..19f89c936209 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -119,6 +119,7 @@ source "fs/quota/Kconfig"
 source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
+source "fs/incfs/Kconfig"

 menu "Caches"

diff --git a/fs/Makefile b/fs/Makefile
index 427fec226fae..08c6b827df1a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_AUTOFS_FS)		+= autofs/
 obj-$(CONFIG_ADFS_FS)		+= adfs/
 obj-$(CONFIG_FUSE_FS)		+= fuse/
 obj-$(CONFIG_OVERLAY_FS)	+= overlayfs/
+obj-$(CONFIG_INCREMENTAL_FS)	+= incfs/
 obj-$(CONFIG_ORANGEFS_FS)       += orangefs/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
diff --git a/fs/incfs/Kconfig b/fs/incfs/Kconfig
new file mode 100644
index 000000000000..a810131deed0
--- /dev/null
+++ b/fs/incfs/Kconfig
@@ -0,0 +1,10 @@
+config INCREMENTAL_FS
+	tristate "Incremental file system support"
+	depends on BLOCK && CRC32
+	help
+	  Incremental FS is a read-only virtual file system that facilitates execution
+	  of programs while their binaries are still being lazily downloaded over the
+	  network, USB or pigeon post.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called incrementalfs.
\ No newline at end of file
diff --git a/fs/incfs/Makefile b/fs/incfs/Makefile
new file mode 100644
index 000000000000..7892196c634f
--- /dev/null
+++ b/fs/incfs/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_INCREMENTAL_FS)	+= incrementalfs.o
+
+incrementalfs-y := main.o vfs.o
\ No newline at end of file
diff --git a/fs/incfs/main.c b/fs/incfs/main.c
new file mode 100644
index 000000000000..07e1952ede9e
--- /dev/null
+++ b/fs/incfs/main.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#define INCFS_CORE_VERSION 1
+
+extern struct file_system_type incfs_fs_type;
+
+static struct kobject *sysfs_root;
+
+static ssize_t version_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buff)
+{
+	return snprintf(buff, PAGE_SIZE, "%d\n", INCFS_CORE_VERSION);
+}
+
+static struct kobj_attribute version_attr = __ATTR_RO(version);
+
+static struct attribute *attributes[] = {
+	&version_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group attr_group = {
+	.attrs = attributes,
+};
+
+static int __init init_sysfs(void)
+{
+	int res = 0;
+
+	sysfs_root = kobject_create_and_add(INCFS_NAME, fs_kobj);
+	if (!sysfs_root)
+		return -ENOMEM;
+
+	res = sysfs_create_group(sysfs_root, &attr_group);
+	if (res) {
+		kobject_put(sysfs_root);
+		sysfs_root = NULL;
+	}
+	return res;
+}
+
+static void cleanup_sysfs(void)
+{
+	if (sysfs_root) {
+		sysfs_remove_group(sysfs_root, &attr_group);
+		kobject_put(sysfs_root);
+		sysfs_root = NULL;
+	}
+}
+
+static int __init init_incfs_module(void)
+{
+	int err = 0;
+
+	err = init_sysfs();
+	if (err)
+		return err;
+
+	err = register_filesystem(&incfs_fs_type);
+	if (err)
+		cleanup_sysfs();
+
+	return err;
+}
+
+static void __exit cleanup_incfs_module(void)
+{
+	cleanup_sysfs();
+	unregister_filesystem(&incfs_fs_type);
+}
+
+module_init(init_incfs_module);
+module_exit(cleanup_incfs_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Eugene Zemtsov <ezemtsov@xxxxxxxxxx>");
+MODULE_DESCRIPTION("Incremental File System");
diff --git a/fs/incfs/vfs.c b/fs/incfs/vfs.c
new file mode 100644
index 000000000000..2e71f0edf8a1
--- /dev/null
+++ b/fs/incfs/vfs.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/blkdev.h>
+#include <linux/fs.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+static struct dentry *mount_fs(struct file_system_type *type, int flags,
+			       const char *dev_name, void *data);
+static void kill_sb(struct super_block *sb);
+
+struct file_system_type incfs_fs_type = {
+	.owner = THIS_MODULE,
+	.name = INCFS_NAME,
+	.mount = mount_fs,
+	.kill_sb = kill_sb,
+	.fs_flags = 0
+};
+
+static int fill_super_block(struct super_block *sb, void *data, int silent)
+{
+	return 0;
+}
+
+static struct dentry *mount_fs(struct file_system_type *type, int flags,
+			       const char *dev_name, void *data)
+{
+	return mount_nodev(type, flags, data, fill_super_block);
+}
+
+static void kill_sb(struct super_block *sb)
+{
+	generic_shutdown_super(sb);
+}
+
diff --git a/include/uapi/linux/incrementalfs.h b/include/uapi/linux/incrementalfs.h
new file mode 100644
index 000000000000..5bcf66ac852b
--- /dev/null
+++ b/include/uapi/linux/incrementalfs.h
@@ -0,0 +1,189 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Userspace interface for Incremental FS.
+ *
+ * Incremental FS is special-purpose Linux virtual file system that allows
+ * execution of a program while its binary and resource files are still being
+ * lazily downloaded over the network, USB etc.
+ *
+ * Copyright 2019 Google LLC
+ */
+#ifndef _UAPI_LINUX_INCREMENTALFS_H
+#define _UAPI_LINUX_INCREMENTALFS_H
+
+#include <linux/limits.h>
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* ===== constants ===== */
+#define INCFS_NAME "incremental-fs"
+#define INCFS_MAGIC_NUMBER (0x5346434e49ul)
+#define INCFS_DATA_FILE_BLOCK_SIZE 4096
+#define INCFS_HEADER_VER 1
+
+#define INCFS_MAX_FILES 1000
+#define INCFS_COMMAND_INODE 1
+#define INCFS_ROOT_INODE 2
+
+#define INCFS_IOCTL_BASE_CODE 'g'
+
+/* ===== ioctl requests on command file ===== */
+
+/* Make changes to the file system via incfs instructions. */
+#define INCFS_IOC_PROCESS_INSTRUCTION \
+	_IOWR(INCFS_IOCTL_BASE_CODE, 30, struct incfs_instruction)
+
+enum incfs_compression_alg { COMPRESSION_NONE = 0, COMPRESSION_LZ4 = 1 };
+
+/*
+ * Description of a pending read. A pending read - a read call by
+ * a userspace program for which the filesystem currently doesn't have data.
+ *
+ * This structs can be read from .cmd file to obtain a set of reads which
+ * are currently pending.
+ */
+struct incfs_pending_read_info {
+	/* Inode number of a file that is being read from. */
+	__aligned_u64 file_ino;
+
+	/* Index of a file block that is being read. */
+	__u32 block_index;
+
+	/* A serial number of this pending read. */
+	__u32 serial_number;
+};
+
+/*
+ * A struct to be written into a .cmd file to provide a data block for a file.
+ */
+struct incfs_new_data_block {
+	/* Inode number of a file this block belongs to. */
+	__aligned_u64 file_ino;
+
+	/* Index of a data block. */
+	__u32 block_index;
+
+	/* Length of data */
+	__u32 data_len;
+
+	/*
+	 * A pointer ot an actual data for the block.
+	 *
+	 * Equivalent to: __u8 *data;
+	 */
+	__aligned_u64 data;
+
+	/*
+	 * Compression algorithm used to compress the data block.
+	 * Values from enum incfs_compression_alg.
+	 */
+	__u32 compression;
+
+	__u32 reserved1;
+
+	__aligned_u64 reserved2;
+};
+
+enum incfs_instruction_type {
+	INCFS_INSTRUCTION_NOOP = 0,
+	INCFS_INSTRUCTION_NEW_FILE = 1,
+	INCFS_INSTRUCTION_ADD_DIR_ENTRY = 3,
+	INCFS_INSTRUCTION_REMOVE_DIR_ENTRY = 4,
+};
+
+/*
+ * Create a new file or directory.
+ * Corresponds to INCFS_INSTRUCTION_NEW_FILE
+ */
+struct incfs_new_file_instruction {
+	/*
+	 * [Out param. Populated by the kernel after ioctl.]
+	 * Inode number of a newly created file.
+	 */
+	__aligned_u64 ino_out;
+
+	/*
+	 * Total size of the new file. Ignored if S_ISDIR(mode).
+	 */
+	__aligned_u64 size;
+
+	/*
+	 * File mode. Permissions and dir flag.
+	 */
+	__u16 mode;
+
+	__u16 reserved1;
+
+	__u32 reserved2;
+
+	__aligned_u64 reserved3;
+
+	__aligned_u64 reserved4;
+
+	__aligned_u64 reserved5;
+
+	__aligned_u64 reserved6;
+
+	__aligned_u64 reserved7;
+};
+
+/*
+ * Create or remove a name (aka hardlink) for a file in a directory.
+ * Corresponds to
+ * INCFS_INSTRUCTION_ADD_DIR_ENTRY,
+ * INCFS_INSTRUCTION_REMOVE_DIR_ENTRY
+ */
+struct incfs_dir_entry_instruction {
+	/* Inode number of a directory to add/remove a file to/from. */
+	__aligned_u64 dir_ino;
+
+	/* File to add/remove. */
+	__aligned_u64 child_ino;
+
+	/* Length of name field */
+	__u32 name_len;
+
+	__u32 reserved1;
+
+	/*
+	 * A pointer to the name characters of a file to add/remove
+	 *
+	 * Equivalent to: char *name;
+	 */
+	__aligned_u64 name;
+
+	__aligned_u64 reserved2;
+
+	__aligned_u64 reserved3;
+
+	__aligned_u64 reserved4;
+
+	__aligned_u64 reserved5;
+};
+
+/*
+ * An Incremental FS instruction is the way for userspace
+ * to
+ *   - create files and directories
+ *   - show and hide files in the directory structure
+ */
+struct incfs_instruction {
+	/* Populate with INCFS_HEADER_VER */
+	__u32 version;
+
+	/*
+	 * Type - what this instruction actually does.
+	 * Values from enum incfs_instruction_type.
+	 */
+	__u32 type;
+
+	union {
+		struct incfs_new_file_instruction file;
+		struct incfs_dir_entry_instruction dir_entry;
+
+		/* Hard limit on the instruction body size in the future. */
+		__u8 reserved[64];
+	};
+};
+
+#endif /* _UAPI_LINUX_INCREMENTALFS_H */
--
2.21.0.593.g511ec345e18-goog





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux