Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD)

On Tue, Oct 08, 2024 at 03:57:13PM -0700, Fan Ni wrote:
> On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote:
> > A git tree of this series can be found here:
> > 
> > 	https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04
> > 
> > Series info
> > ===========
> > 
> 
> Hi Ira,
> 
> Based on the current DC extent release logic, when the extent to release is
> in use (for example, a DAX device has been created on it), no response (4803h)
> will be sent. Should we send a response with an empty extent list instead?
> 
> Fan

Oh, my bad. 4803h does not allow an empty extent list.

Fan

> 
> 
> > This series has 5 parts:
> > 
> > Patch 1-3: Add %pra printk format for struct range
> > Patch 4: Add core range_overlaps() function
> > Patch 5-6: CXL clean up/prelim patches
> > Patch 7-26: Core DCD support
> > Patch 27-28: cxl_test support
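For readers unfamiliar with the helper named in patch 4, here is a minimal userspace sketch of an inclusive-range overlap test. The struct below is a stand-in for the kernel's struct range, and the body is my assumption of what such a check looks like, not a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct range; both ends are inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Assumed semantics: two inclusive ranges overlap when each one
 * starts no later than the other one ends.
 */
static inline bool range_overlaps(const struct range *r1,
				  const struct range *r2)
{
	return r1->start <= r2->end && r1->end >= r2->start;
}
```

With inclusive ends, two ranges that share only a single boundary address count as overlapping, while adjacent ranges do not.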
> > 
> > Background
> > ==========
> > 
> > A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> > device that allows memory capacity within a region to change
> > dynamically without the need for resetting the device, reconfiguring
> > HDM decoders, or reconfiguring software DAX regions.
> > 
> > One of the biggest use cases for Dynamic Capacity is to allow hosts to
> > share memory dynamically within a data center without increasing the
> > per-host attached memory.
> > 
> > The general flow for the addition or removal of memory is to have an
> > orchestrator coordinate the use of the memory.  Generally there are 5
> > actors in such a system: the Orchestrator, the Fabric Manager, the
> > Logical Device, the Host Kernel, and a Host User.
> > 
> > Typical workflows are shown below.
> > 
> > Orchestrator      FM         Device       Host Kernel    Host User
> > 
> >     |             |           |            |              |
> >     |-------------- Create region ----------------------->|
> >     |             |           |            |              |
> >     |             |           |            |<-- Create ---|
> >     |             |           |            |    Region    |
> >     |<------------- Signal done --------------------------|
> >     |             |           |            |              |
> >     |-- Add ----->|-- Add --->|--- Add --->|              |
> >     |  Capacity   |  Extent   |   Extent   |              |
> >     |             |           |            |              |
> >     |             |<- Accept -|<- Accept  -|              |
> >     |             |   Extent  |   Extent   |              |
> >     |             |           |            |<- Create ----|
> >     |             |           |            |   DAX dev    |-- Use memory
> >     |             |           |            |              |   |
> >     |             |           |            |              |   |
> >     |             |           |            |<- Release ---| <-+
> >     |             |           |            |   DAX dev    |
> >     |             |           |            |              |
> >     |<------------- Signal done --------------------------|
> >     |             |           |            |              |
> >     |-- Remove -->|- Release->|- Release ->|              |
> >     |  Capacity   |  Extent   |   Extent   |              |
> >     |             |           |            |              |
> >     |             |<- Release-|<- Release -|              |
> >     |             |   Extent  |   Extent   |              |
> >     |             |           |            |              |
> >     |-- Add ----->|-- Add --->|--- Add --->|              |
> >     |  Capacity   |  Extent   |   Extent   |              |
> >     |             |           |            |              |
> >     |             |<- Accept -|<- Accept  -|              |
> >     |             |   Extent  |   Extent   |              |
> >     |             |           |            |<- Create ----|
> >     |             |           |            |   DAX dev    |-- Use memory
> >     |             |           |            |              |   |
> >     |             |           |            |<- Release ---| <-+
> >     |             |           |            |   DAX dev    |
> >     |<------------- Signal done --------------------------|
> >     |             |           |            |              |
> >     |-- Remove -->|- Release->|- Release ->|              |
> >     |  Capacity   |  Extent   |   Extent   |              |
> >     |             |           |            |              |
> >     |             |<- Release-|<- Release -|              |
> >     |             |   Extent  |   Extent   |              |
> >     |             |           |            |              |
> >     |-- Add ----->|-- Add --->|--- Add --->|              |
> >     |  Capacity   |  Extent   |   Extent   |              |
> >     |             |           |            |<- Create ----|
> >     |             |           |            |   DAX dev    |-- Use memory
> >     |             |           |            |              |   |
> >     |-- Remove -->|- Release->|- Release ->|              |   |
> >     |  Capacity   |  Extent   |   Extent   |              |   |
> >     |             |           |            |              |   |
> >     |             |           |     (Release Ignored)     |   |
> >     |             |           |            |              |   |
> >     |             |           |            |<- Release ---| <-+
> >     |             |           |            |   DAX dev    |
> >     |<------------- Signal done --------------------------|
> >     |             |           |            |              |
> >     |             |- Release->|- Release ->|              |
> >     |             |  Extent   |   Extent   |              |
> >     |             |           |            |              |
> >     |             |<- Release-|<- Release -|              |
> >     |             |   Extent  |   Extent   |              |
> >     |             |           |            |<- Destroy ---|
> >     |             |           |            |   Region     |
> >     |             |           |            |              |
> > 
> > Implementation
> > ==============
> > 
> > The series still requires the creation of regions and DAX devices to be
> > closely synchronized with the Orchestrator and Fabric Manager.  The host
> > kernel will reject extents if a region is not yet created.  It also
> > ignores extent release if memory is in use (DAX device created).  These
> > synchronizations are not anticipated to be an issue with real
> > applications.
> > 
> > In order to allow for capacity to be added and removed, a new concept of
> > a sparse DAX region is introduced.  A sparse DAX region may have 0 or
> > more bytes of available space.  The total space depends on the number
> > and size of the extents which have been added.
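The accounting described in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: struct extent, struct sparse_region, and both helper names are hypothetical and do not come from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one accepted extent, described by its start and length. */
struct extent {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical: a sparse region is the set of extents added so far. */
struct sparse_region {
	const struct extent *extents;
	size_t nr_extents;
	uint64_t allocated;	/* bytes already handed to DAX devices */
};

/* Total space is the sum of the sizes of all added extents. */
static uint64_t sparse_region_total(const struct sparse_region *r)
{
	uint64_t total = 0;

	for (size_t i = 0; i < r->nr_extents; i++)
		total += r->extents[i].len;
	return total;
}

/* Available space may legitimately be 0, e.g. before any extent arrives. */
static uint64_t sparse_region_available(const struct sparse_region *r)
{
	uint64_t total = sparse_region_total(r);

	return total > r->allocated ? total - r->allocated : 0;
}
```

A region with no extents reports 0 bytes in both helpers, which matches the "0 or more bytes" wording in the cover letter.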
> > 
> > Initially it is anticipated that users of the memory will carefully
> > coordinate the surfacing of additional capacity with the creation of DAX
> > devices which use that capacity.  Therefore, the allocation of the
> > memory to DAX devices does not allow for specific associations between
> > DAX device and extent.  This keeps allocations very similar to existing
> > DAX region behavior.
> > 
> > To keep the DAX memory allocation aligned with the existing DAX devices,
> > which do not have tags, extents are not allowed to have tags.  Future
> > support for tags is planned.
> > 
> > Great care was taken to keep the extent tracking simple.  Some xarrays
> > needed to be added but extra software objects were kept to a minimum.
> > 
> > Region extents continue to be tracked as sub-devices of the DAX region.
> > This ensures that region destruction cleans up all extent allocations
> > properly.
> > 
> > Some review tags were kept if a patch did not change.
> > 
> > The major functionality of this series includes:
> > 
> > - Getting the dynamic capacity (DC) configuration information from cxl
> >   devices
> > 
> > - Configuring the DC partitions reported by hardware
> > 
> > - Enhancing the CXL and DAX regions for dynamic capacity support
> > 	a. Maintain a logical separation between hardware extents and
> > 	   software managed region extents.  This provides an
> > 	   abstraction between the layers and should allow for
> > 	   interleaving in the future
> > 
> > - Get hardware extent lists for endpoint decoders upon
> >   region creation.
> > 
> > - Adjust extent/region memory available on the following events.
> > 	a. Add capacity events
> > 	b. Release capacity events
> > 
> > - Host response for add capacity
> > 	a. Do not accept the extent if
> > 		the region does not exist
> > 		or an error occurs realizing the extent
> > 	b. If the region does exist
> > 		realize a DAX region extent with 1:1 mapping (no
> > 		interleave yet)
> > 	c. Support the event more bit by processing a list of extents
> > 	   marked with the more bit together before setting up a
> > 	   response.
> > 
> > - Host response for remove capacity
> > 	a. If no DAX device references the extent, release the extent
> > 	b. If a reference does exist, ignore the request.
> > 	   (Require FM to issue release again.)
> > 
> > - Modify DAX device creation/resize to account for extents within a
> >   sparse DAX region
> > 
> > - Trace Dynamic Capacity events for debugging
> > 
> > - Add cxl-test infrastructure to allow for faster unit testing
> >   (See new ndctl branch for cxl-dcd.sh test[1])
> > 
> > - Only support 0 value extent tags
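The host response rules for add and remove capacity listed above can be condensed into two small decision functions. This is a sketch of the stated policy only; the enum and function names are hypothetical and are not taken from the patches.

```c
#include <stdbool.h>

/* Hypothetical outcomes for an add capacity event. */
enum add_response {
	ADD_REJECT,	/* extent is not accepted */
	ADD_ACCEPT,	/* extent realized as a DAX region extent (1:1 mapping) */
};

/* Hypothetical outcomes for a release capacity event. */
enum release_response {
	RELEASE_IGNORE,		/* no response sent; FM must issue release again */
	RELEASE_RESPOND,	/* extent released and reported back to the device */
};

/* Accept only if the region exists and the extent could be realized. */
static enum add_response handle_add(bool region_exists, bool realize_ok)
{
	if (!region_exists || !realize_ok)
		return ADD_REJECT;
	return ADD_ACCEPT;
}

/* Release only if no DAX device still references the extent. */
static enum release_response handle_release(unsigned int dax_refs)
{
	return dax_refs ? RELEASE_IGNORE : RELEASE_RESPOND;
}
```

In the ignored case nothing is sent at all, rather than a response with an empty list, since 4803h does not allow an empty extent list.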
> > 
> > Fan Ni's upstream QEMU DCD support was used for testing.
> > 
> > Remaining work:
> > 
> > 	1) Allow mapping to specific extents (perhaps based on
> > 	   label/tag)
> > 	   1a) devise region size reporting based on tags
> > 	2) Interleave support
> > 
> > Possible additional work depending on requirements:
> > 
> > 	1) Accept a new extent which extends (but overlaps) one or more
> > 	   existing extents
> > 	2) Release extents when DAX devices are released if a release
> > 	   was previously seen from the device
> > 	3) Rework DAX device interfaces, memfd has been explored a bit
> > 
> > [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-10-01
> > 
> > ---
> > Major changes in v4:
> > - iweiny: rebase to 6.12-rc
> > - iweiny: Add qos data to regions
> > - Jonathan: Fix up shared region detection
> > - Jonathan/jgroves/djbw/iweiny: Ignore 0 value tags
> > - iweiny: Change DCD partition sysfs entries to allow for qos class and
> >   additional parameters per partition
> > - Petr/Andy: s/%par/%pra/
> > - Andy: Share logic between printing struct resource and struct range
> > - Link to v3: https://patch.msgid.link/20240816-dcd-type2-upstream-v3-0-7c9b96cba6d7@xxxxxxxxx
> > 
> > ---
> > Ira Weiny (14):
> >       test printk: Add very basic struct resource tests
> >       printk: Add print format (%pra) for struct range
> >       cxl/cdat: Use %pra for dpa range outputs
> >       range: Add range_overlaps()
> >       dax: Document dax dev range tuple
> >       cxl/pci: Delay event buffer allocation
> >       cxl/cdat: Gather DSMAS data for DCD regions
> >       cxl/region: Refactor common create region code
> >       cxl/events: Split event msgnum configuration from irq setup
> >       cxl/pci: Factor out interrupt policy check
> >       cxl/core: Return endpoint decoder information from region search
> >       dax/bus: Factor out dev dax resize logic
> >       tools/testing/cxl: Make event logs dynamic
> >       tools/testing/cxl: Add DC Regions to mock mem data
> > 
> > Navneet Singh (14):
> >       cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
> >       cxl/mem: Read dynamic capacity configuration from the device
> >       cxl/core: Separate region mode from decoder mode
> >       cxl/region: Add dynamic capacity decoder and region modes
> >       cxl/hdm: Add dynamic capacity size support to endpoint decoders
> >       cxl/mem: Expose DCD partition capabilities in sysfs
> >       cxl/port: Add endpoint decoder DC mode support to sysfs
> >       cxl/region: Add sparse DAX region support
> >       cxl/mem: Configure dynamic capacity interrupts
> >       cxl/extent: Process DCD events and realize region extents
> >       cxl/region/extent: Expose region extent information in sysfs
> >       dax/region: Create resources on sparse DAX regions
> >       cxl/region: Read existing extents on region creation
> >       cxl/mem: Trace Dynamic capacity Event Record
> > 
> >  Documentation/ABI/testing/sysfs-bus-cxl   | 120 +++-
> >  Documentation/core-api/printk-formats.rst |  13 +
> >  drivers/cxl/core/Makefile                 |   2 +-
> >  drivers/cxl/core/cdat.c                   |  52 +-
> >  drivers/cxl/core/core.h                   |  33 +-
> >  drivers/cxl/core/extent.c                 | 486 +++++++++++++++
> >  drivers/cxl/core/hdm.c                    | 213 ++++++-
> >  drivers/cxl/core/mbox.c                   | 605 ++++++++++++++++++-
> >  drivers/cxl/core/memdev.c                 | 130 +++-
> >  drivers/cxl/core/port.c                   |  13 +-
> >  drivers/cxl/core/region.c                 | 170 ++++--
> >  drivers/cxl/core/trace.h                  |  65 ++
> >  drivers/cxl/cxl.h                         | 122 +++-
> >  drivers/cxl/cxlmem.h                      | 131 +++-
> >  drivers/cxl/pci.c                         | 123 +++-
> >  drivers/dax/bus.c                         | 352 +++++++++--
> >  drivers/dax/bus.h                         |   4 +-
> >  drivers/dax/cxl.c                         |  72 ++-
> >  drivers/dax/dax-private.h                 |  47 +-
> >  drivers/dax/hmem/hmem.c                   |   2 +-
> >  drivers/dax/pmem.c                        |   2 +-
> >  fs/btrfs/ordered-data.c                   |  10 +-
> >  include/acpi/actbl1.h                     |   2 +
> >  include/cxl/event.h                       |  32 +
> >  include/linux/range.h                     |   7 +
> >  lib/test_printf.c                         |  70 +++
> >  lib/vsprintf.c                            |  55 +-
> >  tools/testing/cxl/Kbuild                  |   3 +-
> >  tools/testing/cxl/test/mem.c              | 960 ++++++++++++++++++++++++++----
> >  29 files changed, 3576 insertions(+), 320 deletions(-)
> > ---
> > base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
> > change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
> > 
> > Best regards,
> > -- 
> > Ira Weiny <ira.weiny@xxxxxxxxx>
> > 
> 
> -- 
> Fan Ni

-- 
Fan Ni



