On 2020-08-07 1:06 PM, Cezary Rojewski wrote:
Implement support for Lynxpoint and Wildcat Point AudioDSP. Catpt
solution deprecates existing sound/soc/intel/haswell which is removed in
the following series. This cover-letter is followed by 'Developer's deep
dive' message schedding light on catpt's key concepts and areas
addressed.
Developer's deep dive
=====================
Purpose of this message is explanation of Catpt's key concepts, problems
addressed when compared to /soc/intel/haswell/ (which from now on will
be addressed to as /haswell/) as well as answeing major why(s)
surrounding the subject. Message does not explain every detail of every
process as it's length is already high without doing so.
In case of any question and areas which have not beed mentioned here,
please don't hesitate to send an email.
Following removal of /haswell/ and moving forward
-------------------------------------------------
Catpt is a direct replacement for existing /haswell/ solution. Because
of userspace API being inherited as well as FW binary re-used, /haswell/
is going to be removed entirely in the follow-up series. In consequence
of that action, majority of processing code found in /soc/intel/common
becomes redundant. /common/ was supposed to host code common to Intel
AudioDSP architectures. Unfortunately, it never lived up to the
expecations. Most of code found there is LPT/WPT specific with some
/soc/intel/baytail/ deviates while /soc/intel/skylake/ moved most of
it's stuff away from /common/ - even DSP initialization.
To my knowledge, no device makes use of legacy /baytail/ solution any
longer with all the products enlisting either /soc/intel/atom/ or
/soc/sof/intel/. Following the sanitization of /common/ folder, the
logical next step is removal of /baytrail/.
LPT and catpt
-------------
I'm not aware of any released Haswell-based product with AudioDSP
capability, despite these being planned for some Ultrabooks initially.
Broadwell ADSP on the other hand, was released and is present on the marked.
Decision not to cut LPT's ACPI ID off from /catpt/device.c is
maintainance based. While a piece of most of the models found on the
market is in IGK or Bangalore validation teams hands, the quantity is
very limited (~1 per model). On top of that, production stuff is poor
ContinuousIntegration medium. Our CI, once attached to the platform,
takes basically entire control over it, in exchange allowing to perform
test cycles in rapid fashion. Tests areas include but are not limited to
D3/D0, S3, S4, S5, G3 (power off), individual streaming, concurrent
streaming, cpu overload (so on and so forth..). That comes in cost of
preparation - GPIOs and other pins need to be exposed and available
for our CI and external hw: power switches, hydras for dynamic external
codec connections and disconnections and more. Production stuff, for
obvious reasons does not expose such capabilities. This is different on
RVPs which are made for that very purpose and are very CI-friendly.
There are more of them available too.
Age of the LPT/WPT architecture has its toll, though. Not many working
RVP exemplars are available and even less CPUs - yes, you need a special
one to match the RVP, production CPU won't cut it. TLDR: there are not
enough working WPT RVPs when combined with pre-release CPUs to mark
catpt CI healthy for long-term validation (2-3 years+). To compensate,
holes have been filled with LPT equivalents.
Considering DSP hw capabilities are basically identical between LPT and
WPT with FW code being shared for both of them (one branch) this allows
for high test coverage of DSP functionality regardless of PCH present.
Codecs support is shared too - /soc/intel/boards/haswell.c aka
hsw-rt5640.c and /soc/intel/boards/broadwell.c aka bdw-rt286.c are PCH
agnostic.
Kernel code differences are minimal between the two and will probably be
reduced even more in the future. Given the reasons presented
above, I believe gains from healthy CI and test coverage heavily
outweight the maintainance cost of few lines of code appending LPT
device support.
Device and its components
-------------------------
Device probing has been redesigned to accommodate for:
single platform_device solution
coupling dw_dma dev with actual ADSP device
Starting point was two platform_devices as in
/soc/intel/common/sst-acpi.c. First gets created when specific acpi_id
is found within DSDT ACPI table and then callback is performed on
successful firmware file request which creates yet another device: this
time for PCM operations - haswell-pcm-audio. Having in mind Greg KH's
idea of reduction of number of platform_devices in Linux environment,
decision was made to cut the later off.
This raised a dependency issue as DW DMA Controller - one of LPT/WPT
AudioDsp device components - requires the device to be up and running
before being probed. Moreover, because of said controller making use of
pm_runtime during release, it is paramount pm_runtime is disabled before
invoking it. To address this, catpt's .probe() calls _dsp_power_up() -
which takes device out of D3 and allows for I/O access before proceeding
with DW DMA controller. .remove() starts with pm_runtime_disable()
so postmortem suspend and resumes are prevented.
DMAC plays a ADSP personal memcopier role, there are two engines
available 8 channels each. It's neither entirely owned by HOST nor DSP:
ownership is instead shared. As per spec, to prevent de-synchronization,
the following protocol is obeyed: HOST owns DW DMA controller as long as
FW isn't alive - that is, FW_READY notification has not been received.
Once DSP is unstalled and firmware boots, it is expected that HOST stops
all DMA operations and entire ownership of all 16 channels is taken by
FW. This effectively limits HOST's DMA usage to FW booting procedure.
Device begins its life in D3 state (11b for PCI PMCS::PS register), and
needs to be taken out of it via _dsp_power_up(). That has to be done
before FW image loading. Since LPT/WPT ADSP has no IMR - memory region
for storing firmware image - capability, context is lost on each D0
exit. In consequence image has to be reloaded on each resume. Here,
additional optimization has been added to prevent redundant image
flashing from occurring when module is about to be unloaded:
module_is_live check.
DMAC is not the only component tied to catpt. After allocating all the
necessary resources, probing DMAC and flashing FW, ASoC platform
component and card need to be created. The latter is triggered by
creation of child platform_device - as is the case for all
snd_soc_card(s). As a child device owned by catpt, it's catpt's
responsibility to remove it before solution gets unloaded. This is done
by the devm_add_action hook provided with platform_device_unregister
function.
Compared to /haswell/, catpt allows for core device probing regardless
of snd_soc_acpi_mach being present or not. This is similar to
codec-driver behaviour which probe right after matching device id on I2S
bus and is more logical than complete abort when no machines are present.
Allows for core device debug or tests (e.g. power sequences) in the
codec-less environment.
Resource management
-------------------
In contrast to its younger brothers and sisters from cAVS architecture
(SKL+), it's HOST's responsibility to manage SRAM - memory allocation
and power gating. SRAM is split into two banks: Data SRAM and
Instruction SRAM, subsequently divided into several EBBs with 32kBs
each. IRAM is targeted for fixed (static) data while DRAM holds both,
static and dynamic information.
Both, FW_BASE module and feature modules require persistent memory
allocated to them as well as some temporary one. The temporary block is
called scratch and is shared by all modules and thus only one gets
allocated compared to persistent ones which are module's individual
area. As FW context is lost on each D0 exit, to speed-up the boot
process HOST is expected to store dynamic information regions from DRAM.
Those regions contain module and stream states which subsequently allow
for bringing base FW as well as streams right back to where they were
before leaving D0.
Model seen in /soc/intel/common/sst-firmware.c has been redesigned. Two
simple structs have been enlisted:
catpt_mbank
catpt_mregion
to provide resource-like memory management. 'struct resource' serves
different purpose and its layout does not fit all needs of catpt and
that's why new types have been provided instead. catpt_mbank represents
SRAM bank of one type: IRAM -or- DRAM. It's made of _one_ or more
catpt_regions, never less.
catpt core device requests memory region for either fixed or dynamic
allocation by calling catpt_mbank_request_region or
catpt_mbank_reserve_region. Once allocated, catpt_mregion::busy field
gets flagged to ensure said region is no longer available until freed by
catpt_mbank_release_region.
In the very beginning each mbank is made of singular list of regions -
one element spanning entire SRAM with ::busy=0. On each allocation this
situation changes and more and more blocks are being extracted from the
free space. Banks maintain actual list of regions and perform a 'join'
procedure when a region gets yielded back to pool of free regions. Said
procedure attempts to join adjacent regions as long as they too are
::busy=0.
Presented mechanism allows for keeping lowest possible amount of EBBs
alive while power gating rest of them, saving maximum amount of power.
There are few exceptions, meaning regions which must always be power
un-gated. That goes for 0x200 (FW dump) at the front of DRAM and
everything past the highest module offset - that always goes for
FW_BASE. Everything else is available for dynamic allocation and should
be power gated when possible.
Last but not least is the LPCS - low power clock selection. While clock
selection is granular as per hw spec (6+ configurations), catpt deals
with it in binary fashion only. Clock is either set to low-power when
DSP is idle or high-power when streaming is done. This limitation is
inherited from equivalent Windows solution and in order to eliminate it,
much more testing has be to done. For now catpt sticks to what's stable.
Clock selection itself is guarded by in-progress register and may not be
performed until it's cleared. On top of that, as long as FW is alive,
HOST should await WAITI state before attempting any selection. This is
to ensure work on DSP side is not disrupted by unexpected clock change.
While in D3, HOST bypasses that rule and is free to select clock forcibly.
IPC protocol
------------
Catpt features simple, synchronized '1 message out - 1message in' FW
communication. This deviates from /soc/intel/common/sst-ipc.c as there
are no lists involved and there is no sst_plat_ipc_ops::reply_msg_match.
Vast majority of IPCs are one-shots meaning they flag DSP with busy
status and until response is received, no further messaging is allowed.
There is only one communication channel for request-response called
'downlink' with secondary channel called 'uplink' available for FW alone
to sent notifications to HOST. While most IPCs are one-shots, FW may
choose to delay the response. In such cases status PENDING is returned
back and HOST is expected to await actual replay coming from the
notification channel. Catpt verifies status of incoming response and
yields on success or failure but re-awaits the completion on said
PENDING status to ensure synchronization remains intact. Example of such
delayed reply is RESET_STREAM for low power offload pipe. Until response
is received, stream cannot progress in state machine, through operation
PAUSE and ultimately, RESUME.
Steps have been taken to reduce kmallocs/ kzallocs in IPC messages. This
is done by removal of temporary buffers in requests (/catpt/messages.c)
and instead working with provided ret-pointers directly - that is, only
when reply with SUCCESS status is returned back. Otherwise ret-pointer
is untouched. Moreover, tx buffer has been removed
(/catpt/ipc.c::catpt_ipc_arm) as once request is copied to hardware
registers, it is no longer of use for IPC framework. Said framework
gates the communication - field 'catpt_ipc::ready'. Once FW_READY
notification is received, mailbox is initialized and further messaging
is allowed. On critical failures: COREDUMP notification -or- IPC timeout
ready status is revoked. Things stay that way until DSP is recovered
from failure.
ASoC Platform component
-----------------------
Solution maintains backwards compatibility with previous one, /haswell/.
Kcontrols which were available there - master playback, capture and 2x
offload volumes - make they return in catpt. Differences:
new kcontrol 'Loopback Mute' has been added
volume controls now support quad rather than just stereo
The former is self-explanatory and has been missing in /haswell/.
Targets Loopback stream only. The later is an adjustment to align with
requirements and FW spec. Years ago, during LPT/WPT development change
request has been filed to increase the number of supported channels.
Looks like that information didn't get back to Linux, but in catpt this
has been addressed. As already noted, WAVES and module support in
general, is scheduled for later release.
Both volume and mute controls are stored within kcontrols::private_value
and are applied during pcm prepare() operation. As soc_mixer_control is
featured in every iteration of helpful macros present in
/include/sound/soc.h - which is stereo-configuration biased - decision
has been made to relocate control creation to component's .probe().
In regard to PCM operations alone, some invisible to userspace stuff has
been re-used from /haswell/ e.g.: volume_map and page_table arranging.
As internal FW pipes are governed with state machine, changes have been
made to ensure following in obeyed:
ALLOC -> RESET -> PAUSE -> RESUME
FREE <- RESET <- PAUSE <- RESUME
On existing solution, RESET could have been bypassed and stream moved to
PAUSE directly. In catpt, .trigger() and .prepare() functions do the
majority of stream's preparation and state changing ensuring these are
changed properly.
Another area of interest is substream's private data handling. This has
been modified from static block: struct hsw_priv_data::pcm and struct
hsw_priv_data::dmab to dynamic - on .startup() _dma_data is allocated
and assigned to given DAI. In catpt, said private data is the chest of
solution's PCM: struct catpt_stream_runtime. It stores current state,
template used and memory allocated. As LPT/WPT DSP does not offer
flexible topology, static one is applied. This is manifested via
catpt_topology global which describes the shape of every stream. There
are seven of them, 2 Bluetooth streams for SSP1 and 5 (system playback &
capture, two offloads and loopback) for SSP0.
This data is essential during stream allocation in .hw_params().
On D0 exit SSP device configuration is lost, just like other FW context.
SSP device formats are expected to be resent once FW resumes operations.
Catpt removes the need for /soc/intel/boards to play with IPCs
(sst_hsw_device_set_config) and assigns formats automatically on .pcm_new().
Heavy lifting has also been done for stream-position-update handled by
catpt_stream_update_position and POSITION_CHANGED notification. This
time, payload dumped by the later is always accounted for instead of
being ignored and combined with SET_WRITE_POS ipc, allows for stream
progression for OFFLOAD pins. On Dell XPS 13, all exposed DAIs apart
from the system one were not working correctly. For offload, HOST owns
write-pointer and is expected to send SET_WRITE_POS IPC periodically -
that can be done twice prior to stream's start aka RESUME and from
there, once on every POSITION_CHANGE notification. For loopback stream,
DAPM routes were missing what too has been addressed to ensure stream is
functional.
Thanks for bearing with me. In case of any questions, send me an email.
Kind Regards,
Czarek