RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM


 



> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
> 
> On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
> > Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
> >
> > ...
> > >
> > > Not to mention the whole crazy idea of "let's implement our REST api
> > > that used to go over a network connection over an ioctl instead!"
> > > That's the main problem that you need to push back on here.
> > >
> > > What is forcing you to put all of this into the kernel in the first
> > > place?  What's wrong with the userspace network connection/protocol
> > > that you have today?
> > >
> > > Does this mean that we now have to implement all REST apis that
> > > people dream up as ioctl interfaces over a hyperv transport?  That
> > > would be insane.
> >
> > As far as I understand, the purpose of the driver is to replace a "slow"
> > network connection to API endpoint with a "fast" transport over Vmbus.
> 
> Given that the network connection is already over vmbus, how is this "slow"
> today?  I have yet to see any benchmark numbers anywhere :(

Hi Greg,

The problem statement and benchmark numbers are in this patch. Maybe they're getting lost in the long discussion, so I'm pasting them again here:

Azure Blob storage [1] is Microsoft's object storage solution for the cloud. Users or client applications can access objects in Blob storage via HTTP from anywhere in the world. Objects in Blob storage are accessible via the Azure Storage REST API, Azure PowerShell, Azure CLI, or an Azure Storage client library. The Blob storage interface is not designed to be POSIX compliant.

Problem: When a client accesses Blob storage via HTTP, the request must cross the Azure Blob storage boundary and traverse multiple servers before it reaches the backend storage server. This is also true for an Azure VM.

Solution: For an Azure VM, Blob storage access can be accelerated by having the Azure host execute the Blob storage requests against the backend storage server directly.

This driver implements a VSC (Virtual Service Client) that accelerates Blob storage access for an Azure VM by communicating with a VSP (Virtual Service Provider) on the Azure host. Instead of using HTTP to access Blob storage, an Azure VM passes the Blob storage request to the VSP on the Azure host. The Azure host uses its native network to issue the Blob storage requests directly to the backend server.

This driver doesn't implement Blob storage APIs. It acts as a fast channel to pass user-mode Blob storage requests to the Azure host. The user-mode program using this driver implements the Blob storage APIs and packages each Blob storage request as structured data for the VSC. The request data is modeled as three user-provided buffers (request, response and data buffers) that are patterned on the HTTP model used by existing Azure Blob clients. The VSC passes those buffers to the VSP for Blob storage requests.
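To make the three-buffer model concrete, here is a minimal user-space sketch. The structure layout, ioctl number and /dev/azure_blob device node name are illustrative assumptions, not the actual UAPI defined by this patch series:

/*
 * Hypothetical sketch of the three-buffer request model described above.
 * Names and layout are assumptions for illustration only.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct blob_vsc_request {
	const void *request;    /* HTTP-style request metadata from the client library */
	uint32_t request_len;
	void *response;         /* filled in with the response from the VSP */
	uint32_t response_len;
	void *data;             /* payload buffer (blob data read or written) */
	uint32_t data_len;
};

/* Assumed ioctl number for passing one request to the VSC */
#define BLOB_VSC_IOCTL_REQUEST _IOWR('z', 0x01, struct blob_vsc_request)

int main(void)
{
	char req[] = "GET /mycontainer/myblob";   /* placeholder request metadata */
	char resp[4096];
	char data[64 * 1024];
	struct blob_vsc_request r = {
		.request      = req,
		.request_len  = sizeof(req),
		.response     = resp,
		.response_len = sizeof(resp),
		.data         = data,
		.data_len     = sizeof(data),
	};

	int fd = open("/dev/azure_blob", O_RDWR);  /* device node name assumed */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLOB_VSC_IOCTL_REQUEST, &r) < 0)
		perror("ioctl");
	close(fd);
	return 0;
}

The point of the sketch is only that the user-mode Blob client keeps implementing the REST semantics itself; the driver just forwards the opaque request/response/data buffers to the VSP on the host.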

The driver optimizes Blob storage access for an Azure VM in two ways:

1. The Blob storage requests are performed by the Azure host to the Azure Blob backend storage server directly.

2. It allows the Azure host to use transport technologies (e.g. RDMA) available to the Azure host but not available to the VM, to reach the Azure Blob backend servers.
 
Test results using this driver for an Azure VM:
100 Blob clients running on an Azure VM, each reading 100GB Block Blobs.
(10 TB total read data)
With REST API over HTTP: 94.4 mins
Using this driver: 72.5 mins
Performance (measured in throughput) gain: 30%.
 
[1] https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

> 
> > So what if instead of implementing this new driver we just use Hyper-V
> > Vsock and move API endpoint to the host?
> 
> What is running on the host in the hypervisor that is supposed to be handling
> these requests?  Isn't that really on some other guest?

The requests are handled by Hyper-V via a dedicated Blob service on behalf of the VM. The Blob service runs in the Hyper-V root partition and serves all the VMs on this Hyper-V server. The requests to the "Blob server" are sent by this service over the native TCP or RDMA transports used by the Azure backend.

Thanks,

Long

> 
> confused,
> 
> greg k-h



