GSoC Proposal - 2020
====================

Project Name: Introducing Job control to the storage driver
===========================================================

About Me:
=========

Name: Prathamesh Chavan
University: Indian Institute of Technology Kharagpur
Major: Computer Science and Engineering (Dual Degree)
Email: pc44800@xxxxxxxxx
Blog: pratham-pc.github.io
Contact: +91-993-235-8333
Time Zone: IST (UTC +5:30)

Background:
===========

I am a final-year dual degree (B.Tech & M.Tech) student in the Department of Computer Science and Engineering at IIT Kharagpur. During my first year of college, I was introduced to open source through the Kharagpur Open Source Society, and I later became a member of it. For my master's thesis, I'm working on a tiered file system with Software Wear Management for NVM technologies. I have always wanted to get involved in storage technologies and their development process, and Google Summer of Code is a great way to do that.

I took part in Google Summer of Code in my second year of college under the Git organization. It was my first experience with a large codebase. Information related to it is available in the GSoC 2017 archive[1].

Last summer, I interned at Nutanix. I worked on Logbay, a configuration-based data collection, archiving and streaming tool used by all services on the Nutanix Controller-VM (CVM). I added multi-platform support to Logbay by identifying all of its dependencies on other CVM-based services and introducing an interface between those dependencies and Logbay-Core, so that Logbay can be used on platforms where those services aren’t available. I also implemented the interface on a Dev-VM and added multi-port support, so that multiple instances of Logbay can run on a single Dev-VM to simulate a multi-node cluster. This lets developers test their changes on the Dev-VM itself.
The Project:
============

Introducing job control to the storage driver

Summary: Implement abstract job control and use it to improve the storage driver.
Mentor: Pavel Hrdina

Abstract:
=========

Currently, libvirt supports job cancellation and progress reporting on domains. That is, if there's a long-running job on a domain, e.g. migration, libvirt reports how much data has already been transferred to the destination and how much still needs to be transferred. However, libvirt lacks such reporting in the storage area, which libvirt developers refer to as the storage driver. The aim is to report progress on several storage tasks, such as volume wiping, file allocation and others.

Job Control in Domain:
======================

In src/qemu/qemu_domain.h, we can find the struct qemuDomainJobObj, which is the job object for domains. It is used for coordinating between jobs, identifying which API call owns the job object, and holding the remaining information about the normal job, agent job, and async job.

qemuDomainJobObj is a member of another struct, qemuDomainObjPrivate, which is the main object the driver's APIs interact with when running jobs.

Whenever an API call is made, specific locks are acquired depending on the type of the job, and then the job is carried out. The exact design details of such APIs are documented in `src/qemu/THREADS.txt`.

Job Control in Storage:
=======================

Whenever an API call is made on a storageVol, the member `in_use` of the struct `virStorageVolDef` is used as a reference counter. This allows us to check whether the storage volume is already in use, and therefore whether the current API call can be carried out. Once the API call exits, the reference counter is decremented.
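The reference-counter pattern described above can be illustrated with a small sketch. This is not libvirt code: the `DemoVol` type and the `demoVol*` functions are hypothetical names made up for illustration, and the real driver also takes locks that are omitted here. It only shows how a counter like `in_use` lets one call refuse to run while another call is still working on the volume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a storage volume; only the counter matters. */
typedef struct {
    const char *name;
    int in_use;      /* number of API calls currently working on the volume */
    bool deleted;
} DemoVol;

/* Mark the volume as busy for the duration of an API call. */
static void demoVolBeginUse(DemoVol *vol)
{
    assert(vol != NULL);
    vol->in_use++;
}

/* Drop the reference taken by demoVolBeginUse(). */
static void demoVolEndUse(DemoVol *vol)
{
    assert(vol != NULL && vol->in_use > 0);
    vol->in_use--;
}

/* A destructive call refuses to run while the volume is in use.
 * Returns 0 on success, -1 if the volume is busy. */
static int demoVolDelete(DemoVol *vol)
{
    assert(vol != NULL);
    if (vol->in_use > 0)
        return -1;
    vol->deleted = true;
    return 0;
}
```

Note that the counter only says that something is using the volume. It does not say which operation is running or how far along it is, which is the gap the job notion is meant to fill.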
Reporting of job-progress as done in case of Domains:
=====================================================

Additionally, when an async job is running, the job object also holds a qemuDomainJobInfo that stores the progress data of the async job, and another qemuDomainJobInfo that stores the statistics of a recently completed job.

The functions virDomainGetJobInfo() and virDomainGetJobStats() in libvirt-domain.c help extract information about the progress of a background job on a domain.

Plan to implement something similar in Storage Jobs:
====================================================

First, it's important to bring the notion of jobs to the storage driver. Right now, API calls are executed directly once the required mutex locks are acquired. This gives other API calls little information about what is currently running or which call holds the locks. Furthermore, the domain jobs carry a lot of additional information that could also be useful for storage API calls.

The first step is to identify all the API calls that operate on storage volumes in the storage driver, and to classify them into something similar to normal jobs and async jobs (the long-running ones). Some API calls will not acquire a job at all (those that do not change the reference counter today).

After this, a document similar to src/qemu/THREADS.txt needs to be written for storage job handling. It should describe the new design of the existing storage APIs, including how they acquire jobs and the appropriate locks.

New APIs need to be implemented for the creation and deletion of storage jobs. These would be similar to the domain job APIs in src/qemu/qemu_domain.h, such as qemuDomainObjBeginJob(). This also includes storage equivalents of virDomainGetJobInfo() and virDomainGetJobStats(), which would be used to report the completion progress of long-running storage jobs.
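To make the plan more concrete, here is a rough, single-threaded sketch of what a storage job object with progress reporting might look like. All `Demo*`/`demo*` names are hypothetical and are not existing libvirt APIs. The progress fields are loosely modeled on the dataTotal/dataProcessed/dataRemaining fields of virDomainJobInfo. The locking and waiting that a real implementation would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job classes, mirroring the normal/async split for domains. */
typedef enum {
    DEMO_STORAGE_JOB_NONE = 0,
    DEMO_STORAGE_JOB_MODIFY,   /* short job that modifies the volume */
    DEMO_STORAGE_JOB_ASYNC     /* long-running job, e.g. a volume wipe */
} DemoStorageJobType;

/* Progress data, in bytes. */
typedef struct {
    unsigned long long dataTotal;
    unsigned long long dataProcessed;
    unsigned long long dataRemaining;
} DemoStorageJobInfo;

/* Per-volume job state; would replace a bare reference counter. */
typedef struct {
    DemoStorageJobType active;
    DemoStorageJobInfo current;
} DemoStorageJobObj;

/* Start a job; returns -1 if another job already owns the volume. */
static int demoStorageBeginJob(DemoStorageJobObj *job,
                               DemoStorageJobType type,
                               unsigned long long total)
{
    assert(job != NULL && type != DEMO_STORAGE_JOB_NONE);
    if (job->active != DEMO_STORAGE_JOB_NONE)
        return -1;
    job->active = type;
    job->current.dataTotal = total;
    job->current.dataProcessed = 0;
    job->current.dataRemaining = total;
    return 0;
}

/* Called by the running job as it makes progress. */
static void demoStorageJobUpdate(DemoStorageJobObj *job,
                                 unsigned long long processed)
{
    assert(job != NULL);
    if (processed > job->current.dataTotal)
        processed = job->current.dataTotal;
    job->current.dataProcessed = processed;
    job->current.dataRemaining = job->current.dataTotal - processed;
}

/* Storage counterpart of a "get job info" call; -1 if no job is active. */
static int demoStorageGetJobInfo(const DemoStorageJobObj *job,
                                 DemoStorageJobInfo *info)
{
    assert(job != NULL && info != NULL);
    if (job->active == DEMO_STORAGE_JOB_NONE)
        return -1;
    *info = job->current;
    return 0;
}

/* Finish the job and release the volume for other callers. */
static void demoStorageEndJob(DemoStorageJobObj *job)
{
    assert(job != NULL);
    job->active = DEMO_STORAGE_JOB_NONE;
}
```

In this sketch, a wipe of a 1000-byte volume would begin an async job with a total of 1000, call the update function as it progresses, and a concurrent caller could read the copied progress data without touching the job's own state.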
The existing storage APIs need to be reimplemented on top of this new job notion, and the reference counter member 'in_use' removed.

Other desired changes:
======================

1. Unification of the notion of jobs throughout: one step toward this would be to keep the storage job API implementation as close as possible to the domain job API, so that later unification is easier. Unification itself, IMO, is left to the future scope of the project.

2. Prediction of the time remaining until job completion, as progress for the various jobs is reported.

BiteSizedTask attempted:
========================

Clean up variables in tools/libvirt-guests.sh.in
Mentor: Martin Kletzander

A patch was posted on the mailing list, and its latest version can be found here[2].

Rest of the plans for the summer:
=================================

Due to the ongoing COVID-19 pandemic, my internship was canceled. Hence, I'll be available full-time for the project. In August, I'll be joining Nutanix as a full-time employee.

PS:
===

1. It was already very late by the time I decided to take part in GSoC'20. Hence, I wasn't able to give the required amount of time to preparing this project's proposal. If it is okay, I would still like to keep updating this proposal after the deadline and add a few important things, such as solid project deliverables along with a timeline.

2. As Pavel Hrdina is the mentor of this project, and as this project was suggested by Michal Privoznik, I've cc'd both of them on this email.

3. Since I still haven't spent enough time understanding the details of the existing APIs, I might have gone wrong in a few places, and I would be glad to have them pointed out.

4. One of the requirements mentioned on libvirt's GSoC FAQ page[3] is passing an interview. When does this interview typically take place in the GSoC timeline?

5. A Google Doc of this proposal can be found here[4]. Comments on the doc are welcome.
[1]: https://summerofcode.withgoogle.com/archive/2017/projects/5434523185577984/
[2]: https://www.redhat.com/archives/libvir-list/2020-March/msg01303.html
[3]: https://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ
[4]: https://docs.google.com/document/d/1Js-yi1oNrl9fRhzvMJwBHyygYCAtr_q7Bby8k1jOdig/edit?usp=sharing