page-able RDMA

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey all,

InfiniBand allows remote host to access the memory of a local process, without involvement of the local CPU. This is called "RDMA". Currently, this is implemented by the task registering the address-space region that will be accessible through the network using a special API call ( ibv_reg_mr ). This API pins the address space area into RAM space (using get_user_pages), makes it DMA mappable, and adds a device specific mapping for this region. The memory area is pinned in memory until the user chooses to remove the registration, through another API call (ibv_dereg_mr).

I am working on a prototype enabling page able memory for an InfiniBand driver using mmu_notifier. Such a task requires one to be able to manage a secondary PT for all relevant pages of a certain process,
This can be done using the mmu_notifier invalidation callback mechanism.

The pages will _NOT_ be pinned in RAM space, and all MMU actions will be reflected to the device's secondary PT, on the other hand the device will initiate page-fault events towards the driver when trying to operate on an unmapped page. the driver then will request mapping the relevant pages. Once the pages are in memory, the driver will update the device's secondary PT.

The work on the prototype has raised several fundamental questions:

Since the device needs to stop any ongoing operations regarding that page, one should make sure that the device is sync with the page going to be freed upon return from the invalidation callback, and halted any read/write to the page. this flushing action is somewhat expensive since it is blocked by HW possibly for a long (10s of milliseconds) time. * Are the invalidation callbacks sleep able (invalidate_page specifically)? thus allowing a scheduling HW sync?

Another goal to batch invalidations for performance improvement. Being able to delay a page invalidation can donate a major acceleration to our performance. So, One should be aware of when it is OK to delay invalidations. upon a swap based invalidation - it's probably OK to delay, but for a user unmap action - delaying the invalidation can lead to bad results. * Can one refuse an invalidation initiated on a page? what is the state of such a page? * What is your opinion about providing the notifiers with extra information regarding the invalidation cause (swap, unmap, page-migration etc...)? or splitting the notifier to "invalidation that we can postpone" and "invalidation that must happen now"?

I had some short private email exchange on the matter with Andrea, which now naturally is moved here, so to sync people on that correspondence I added this short intro. The original thread will be followed by this mail.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]