[GSoC] Implement cache layer on top of userspace NVMe driver

Hello,
I am a first-year Master's student at Nanjing University, China. My
research area is distributed storage, and our team has been working on
a Ceph-based project for half a year. I am very interested in your
idea of implementing a cache layer on top of the userspace NVMe
driver. Here is my understanding of the idea.

Ceph's OSDs used to rely on FileStore, which sits on a local POSIX
file system such as XFS/Btrfs/ext4. Because that worked poorly for
Ceph, BlueStore was designed to replace it. One major difference from
FileStore is that BlueStore uses RocksDB to manage metadata. RocksDB
offers an ideal key/value interface: it supports transactions and
ordered enumeration, and it commits quickly to a log/journal. RocksDB
also sits behind a common interface, so we can always swap in another
key/value DB if we want. In BlueStore, RocksDB stores object metadata,
the write-ahead log, Ceph key/value omap data, and allocator metadata.
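To make this concrete, the properties BlueStore relies on (ordered
enumeration for omap scans, atomic batched commits) can be modeled
with a toy ordered store. This is only an illustration of the
interface shape, not the RocksDB API; the class and method names are
my own:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of the key/value interface BlueStore needs from RocksDB:
// sorted keys, prefix enumeration (omap), and atomic batched writes.
class ToyKV {
public:
    struct Batch { std::vector<std::pair<std::string, std::string>> ops; };

    // Apply all writes in one step, as a stand-in for a transaction commit.
    void commit(const Batch& b) {
        for (const auto& [k, v] : b.ops) data_[k] = v;
    }

    // Ordered enumeration of every key starting with 'prefix'.
    std::vector<std::string> scan(const std::string& prefix) const {
        std::vector<std::string> keys;
        for (auto it = data_.lower_bound(prefix);
             it != data_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it)
            keys.push_back(it->first);
        return keys;
    }

private:
    std::map<std::string, std::string> data_;  // std::map keeps keys sorted
};
```

In the real system the ordered scan is what makes per-object omap
listing cheap, and the batched commit is what lets BlueStore apply an
object write and its metadata update atomically.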

At the bottom of the BlueStore architecture are the disks, which can
be HDDs or SSDs. On top of the disks is the driver layer, which
includes the kernel driver and the userspace driver SPDK. With these
drivers, the host provides the block device layer that BlueStore is
built on.

SPDK is used to unlock the speed of NVMe SSDs. NVMe SSDs are very
fast; their IOPS can reach millions per second. With the kernel
driver, the system must switch between kernel and user mode
frequently, and the CPU cost of interrupts can be very high, so the
kernel driver slows down the disk IOPS. SPDK runs in userspace and
uses polling instead of interrupts.
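The polling model can be sketched with a toy completion ring: the
application repeatedly checks for finished I/Os from userspace instead
of sleeping until an interrupt fires. This is a conceptual model only,
not SPDK code; the names are my own assumptions:

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Toy model: a completion ring. A "device" posts completion ids into
// slots; the application thread polls the ring from userspace rather
// than blocking on an interrupt.
struct CompletionRing {
    std::array<std::atomic<int>, 64> slots{};  // 0 = empty, else completion id
    std::atomic<size_t> head{0};               // next slot the device fills
    size_t tail = 0;                           // next slot the app checks
};

// Poll once: reap every completion currently in the ring, return count.
// On an empty ring it returns immediately -- no sleep, no context switch.
int poll_completions(CompletionRing& ring) {
    int reaped = 0;
    while (true) {
        int id = ring.slots[ring.tail % ring.slots.size()].exchange(0);
        if (id == 0) break;  // ring empty: hand control back to the caller
        ++ring.tail;
        ++reaped;            // a real driver would run the I/O callback here
    }
    return reaped;
}
```

In SPDK the analogous step is calling the driver's queue-pair
completion-processing function in a loop; the point the sketch makes
is that reaping completions costs only a memory read, not a mode
switch.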

A metadata cache is critical for good file system throughput, because
metadata operations account for more than half of the I/O. Currently
the kernel-driver path in BlueStore uses the page cache to accelerate
metadata access, while the userspace NVMe driver lacks any sort of
memory cache to help performance. So I am applying to this project to
implement the userspace NVMe cache.

The implementation will draw on knowledge of SPDK, RocksDB, and the
page cache. The goal is to cache the metadata stored in RocksDB using
the API provided by SPDK. Because RocksDB is a key/value store much
like a database, we could borrow cache strategies from databases such
as SQLite. The following aspects are the most important to consider:

1) Consistency, which affects the correctness of the data. We could
use a dirty list to mark the pages in the cache that have been
modified, and add write-back operations to maintain consistency.
2) Hit rate, which is critical to efficiency: a high hit rate gives
high efficiency, and the hit rate depends on the page replacement
policy.
3) Valid rate, which concerns memory space usage. It is also somewhat
related to the page replacement policy.
4) What to cache. I think the memtable of RocksDB can already serve as
the cache for write operations, so the cache should focus on metadata
query (read) operations.
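Points 1)-3) could be combined in one structure: an LRU page cache
that tracks dirty pages and flushes them on eviction, and that reports
its hit rate. A minimal sketch under my own assumptions (the class
name, the write-back hook, and the use of plain LRU are illustrative
choices, not existing BlueStore code):

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <vector>

// Sketch of an LRU page cache with dirty-page tracking.
// Page ids stand in for NVMe LBAs; write-back is a caller-supplied hook
// (in the real system, an SPDK write of the page).
class PageCache {
public:
    using Writeback = std::function<void(uint64_t page, const std::vector<char>&)>;

    PageCache(size_t capacity, Writeback wb) : cap_(capacity), wb_(std::move(wb)) {}

    // Look up a page; on a hit, move it to the front of the LRU list.
    bool get(uint64_t page, std::vector<char>& out) {
        auto it = map_.find(page);
        if (it == map_.end()) { ++misses_; return false; }
        ++hits_;
        lru_.splice(lru_.begin(), lru_, it->second);  // mark most recently used
        out = it->second->data;
        return true;
    }

    // Insert or update a page; 'dirty' marks data not yet flushed to disk.
    void put(uint64_t page, std::vector<char> data, bool dirty) {
        auto it = map_.find(page);
        if (it != map_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);
            it->second->data = std::move(data);
            it->second->dirty |= dirty;
            return;
        }
        if (lru_.size() == cap_) evict_one();
        lru_.push_front(Entry{page, std::move(data), dirty});
        map_[page] = lru_.begin();
    }

    double hit_rate() const {
        uint64_t total = hits_ + misses_;
        return total ? double(hits_) / total : 0.0;
    }

private:
    struct Entry { uint64_t page; std::vector<char> data; bool dirty; };

    void evict_one() {
        Entry& victim = lru_.back();
        if (victim.dirty) wb_(victim.page, victim.data);  // flush before dropping
        map_.erase(victim.page);
        lru_.pop_back();
    }

    size_t cap_;
    Writeback wb_;
    std::list<Entry> lru_;
    std::unordered_map<uint64_t, std::list<Entry>::iterator> map_;
    uint64_t hits_ = 0, misses_ = 0;
};
```

Point 4) then maps onto how the cache is used: reads from RocksDB's
SST files go through get()/put(), while writes already buffered in the
memtable need not be cached again.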
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


