Google Summer of Code 2017 project proposal: implementation of RBD diff checksums using a rolling checksum algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

My name is Radoslav Georgiev and I am currently studying Computer Science at Sofia University. As a passionate software developer with a special appreciation of free and open source software, I am interested in taking part in the Google Summer of Code 2017 program. My main motivation to choose Ceph as my prospective mentoring organization is because I am interested in distributed and scalable storage in general, and the way Ceph does it in particular (especially its architecture: the RADOS concept and library, the RADOSGW, the Ceph block device, CephFS, and the various assorted APIs). In addition to that, I used to work as a systems programmer for an organization that employed Ceph for its data storage and scalability needs, writing software that interacts with Ceph's service interfaces.

When I went through your organization's list of ideas, I saw that there is a need for verification of diffs of RADOS block device images (i.e. RBD diff checksums) to be implemented. Since (as mentioned in the idea description) calculating the checksum of an entire image would be too slow and inefficient, I came up with the (actually pretty trivial) solution to use a rolling checksum algorithm.

As you might know, rolling hash algorithms save computation time and memory. They split the input into chunks, and calculate the hash of each chunk using a simple (i.e. fast) hash algorithm. What is important to note here is that the hash for the whole string is calculated incrementally, beginning from zero (or some known constant value), and for each chunk added the checksum of the string including that chunk is calculated using only the old checksum (of the string without the chunk) and the difference between the last chunk and the currently added one. (For more information on this topic, see <https://en.wikipedia.org/wiki/Rolling_hash>.)

Some useful rolling hash algorithms that could be implemented are the Fletcher's checksum algorithms (Fletcher-16/32/64 <https://en.wikipedia.org/wiki/Fletcher's_checksum>), or the Adler-32 algorithm (which, albeit slightly modified, is used in rsync). (Of course, these algorithms do not provide particularly strong or cryptographically secure hashing but, as far as I understood, it is not really necessary to provide that for the purpose given.)

I think my idea is worthy of consideration as a solution to the presented problem, and I look forward to its acceptance.

P. S. I sent you a mail because it was mentioned on Ceph's GSoC organization page that students willing to participate should do so to foster better communication with the mentors.

--
Regards,
  Radoslav

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux