On Fri, 08 Oct 2021 10:43:24 +0200, Takashi Iwai wrote:
>
> On Thu, 07 Oct 2021 18:51:58 +0200,
> Rich Felker wrote:
> >
> > On Thu, Oct 07, 2021 at 06:18:52PM +0200, Takashi Iwai wrote:
> > > On Thu, 07 Oct 2021 18:06:36 +0200,
> > > Rich Felker wrote:
> > > >
> > > > On Thu, Oct 07, 2021 at 05:33:19PM +0200, Takashi Iwai wrote:
> > > > > On Thu, 07 Oct 2021 15:11:00 +0200,
> > > > > Arnd Bergmann wrote:
> > > > > >
> > > > > > On Thu, Oct 7, 2021 at 2:43 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > > > > > On Thu, 07 Oct 2021 13:48:44 +0200, Arnd Bergmann wrote:
> > > > > > > > On Thu, Oct 7, 2021 at 12:53 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > > > > > > > On Wed, 06 Oct 2021 19:49:17 +0200, Michael Forney wrote:
> > > > > > > >
> > > > > > > > As far as I can tell, the broken interface will always result in
> > > > > > > > user space seeing a zero value for "avail_min". Can you
> > > > > > > > predict what that would mean for actual
> > > > > > > > applications? Will they have no audio output, run into
> > > > > > > > a crash, or be able to use recover and appear to work normally
> > > > > > > > here?
> > > > > > >
> > > > > > > No, fortunately it's only about control->avail_min, and fiddling with this
> > > > > > > value can't break things severely (otherwise it'd be a security problem ;)
> > > > > > >
> > > > > > > In the buggy condition, it's always zero, and the kernel treats it as if it
> > > > > > > were 1, i.e. it wakes up as soon as data is available, which is OK-ish for
> > > > > > > most applications. Apps usually don't care much about the wake-up
> > > > > > > condition. There are subtle differences that may influence the stability
> > > > > > > of stream processing, but the stability usually depends more strongly
> > > > > > > on the hardware and software configurations.
> > > > > > >
> > > > > > > That being said, the impact of this bug (from the application behavior
> > > > > > > POV) is likely quite small, but the contamination is large; as you
> > > > > > > pointed out, it's much larger than I thought.
> > > > > >
> > > > > > Ok, got it.
> > > > > >
> > > > > > > The definition in uapi/sound/asound.h is a bit cryptic, but IIUC,
> > > > > > > __snd_pcm_mmap_control64 is used for 64bit archs, right? If so, the
> > > > > > > problem actually hits 64bit archs more widely, and silently. Then the
> > > > > > > influence of this bug must be almost negligible, as we've had no bug
> > > > > > > report about the behavior change.
> > > > > >
> > > > > > __snd_pcm_mmap_control64 is only used on 32-bit
> > > > > > architectures when 64-bit time_t is used. At the moment, this
> > > > > > means all users of musl-1.2.x libc, but not glibc.
> > > > > >
> > > > > > On 64-bit architectures, __snd_pcm_mmap_control and
> > > > > > __snd_pcm_mmap_control64 are meant to be identical,
> > > > > > and this is actually true regardless of the bug, since
> > > > > > __pad_before_uframe and __pad_after_uframe both
> > > > > > end up as zero-length arrays here.
> > > > > >
> > > > > > > We may just fix it in the kernel and in a new library, hoping that no
> > > > > > > one sees the actual problem. Or, we may provide a completely new set of
> > > > > > > mmap offsets and ioctls to cover both broken and fixed interfaces...
> > > > > > > The decision depends on how perfectly we'd like to address the bug.
> > > > > > > As of now, I'm inclined to go for the former, but I'm open to more
> > > > > > > opinions.
> > > > > >
> > > > > > Adding the musl list to Cc for additional testers; anyone interested,
> > > > > > please see [1] for the original report.
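Arnd's point, that the 64-bit layouts coincide while the 32-bit ones diverge, can be checked with a small offset model. This is a sketch and not ALSA code. `struct control_layout` and `layout()` are hypothetical names. The model adds up the `__pad_before_uframe`/`__pad_after_uframe` sizes instead of compiling the real header, because zero-length arrays are a GNU extension. `uframe_size` stands for `sizeof(snd_pcm_uframes_t)`. `buggy` selects `__pad_before_uframe` for `__pad2`, as in the 2.0.15 definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offset model of struct __snd_pcm_mmap_control64 (not ALSA code). */
struct control_layout {
	size_t appl_ptr_off;
	size_t avail_min_off;
	size_t size;
};

/*
 * uframe_size: sizeof(snd_pcm_uframes_t), 4 on 32-bit and 8 on 64-bit.
 * big_endian:  selects which side of each 8-byte slot gets the padding.
 * buggy:       use __pad_before_uframe for __pad2 (the 2.0.15 definition)
 *              instead of __pad_after_uframe (the intended one).
 */
static struct control_layout layout(size_t uframe_size, int big_endian, int buggy)
{
	assert(uframe_size == 4 || uframe_size == 8);

	size_t pad = 8 - uframe_size;          /* sizeof(__u64) - sizeof(snd_pcm_uframes_t) */
	size_t before = big_endian ? pad : 0;  /* __pad_before_uframe */
	size_t after = big_endian ? 0 : pad;   /* __pad_after_uframe */
	struct control_layout l;
	size_t off = 0;

	off += before;                         /* __pad1 */
	l.appl_ptr_off = off;
	off += uframe_size;                    /* appl_ptr */
	off += buggy ? before : after;         /* __pad2: the bug picks "before" */
	off += before;                         /* __pad3 */
	l.avail_min_off = off;
	off += uframe_size;                    /* avail_min */
	off += after;                          /* __pad4 */
	l.size = off;
	return l;
}
```

On a 64-bit build (`uframe_size` 8), both pads are zero and `buggy` makes no difference: `avail_min` sits at offset 8 either way. With `uframe_size` 4 on little-endian, the buggy definition puts `avail_min` at offset 4 instead of 8. On big-endian it lands at 16 instead of 12.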
> > > > > >
> > > > > > It would be good to hear from musl users that are already using
> > > > > > audio support with 32-bit applications on 64-bit kernels, which
> > > > > > is the case that has the problem today. Have you noticed any
> > > > > > problems with audio support here? If not, we can probably
> > > > > > "fix" the kernel here and make the existing binaries behave
> > > > > > the same way on 32-bit kernels. If there are applications that
> > > > > > don't work in that environment today, I think we need to instead
> > > > > > change the kernel to accept the currently broken format on
> > > > > > both 32-bit and 64-bit kernels, possibly introducing yet another
> > > > > > format that works as originally intended but requires a newly
> > > > > > built kernel.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > And now, looking more deeply, I feel more desperate.
> > > > >
> > > > > This bug makes the expected padding disappear on little-endian.
> > > > > On LE 32bit, the buggy definition is:
> > > > >
> > > > > 	char __pad1[0];
> > > > > 	u32 appl_ptr;
> > > > > 	char __pad2[0]; // this should have been [4]
> > > > > 	char __pad3[0];
> > > > > 	u32 avail_min;
> > > > > 	char __pad4[4];
> > > > >
> > > > > When an application issues the SYNC_PTR64 ioctl to submit appl_ptr and
> > > > > avail_min updates, a 64bit kernel (in compat mode) reads it directly as:
> > > > >
> > > > > 	u64 appl_ptr;
> > > > > 	u64 avail_min;
> > > > >
> > > > > Hence a bogus appl_ptr would be passed if avail_min != 0, since
> > > > > avail_min lands in the upper 32 bits of the kernel's u64 appl_ptr.
> > > > > And applications usually set a non-zero avail_min.
> > > > > That is, the bug would hit more severely if the new API were really
> > > > > used. It wouldn't crash, but some weird streaming behavior can
> > > > > happen, like noise, jumping or underruns.
> > > > >
> > > > > (Reading back avail_min=0 to user-space is rather harmless. Ditto for
> > > > > the case of BE, where at least there is no appl_ptr corruption.)
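Takashi's compat-mode reading above can be reproduced byte by byte. The sketch below uses hypothetical helpers (`put32()`, `get64()`, `compat_read_buggy()`), not kernel or alsa-lib code. It writes `appl_ptr` and `avail_min` where the buggy 32-bit definition places them. It then reads the buffer back as two u64 values at offsets 0 and 8, the way the description says a 64-bit kernel in compat mode does. It assumes every padding byte is zero. Byte order is handled explicitly, so the result does not depend on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a u32 with an explicit byte order (hypothetical helper). */
static void put32(uint8_t *p, uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++)
		p[big_endian ? 3 - i : i] = (uint8_t)(v >> (8 * i));
}

/* Load a u64 with an explicit byte order (hypothetical helper). */
static uint64_t get64(const uint8_t *p, int big_endian)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[big_endian ? 7 - i : i] << (8 * i);
	return v;
}

/*
 * Place appl_ptr/avail_min as the buggy 32-bit definition does, then decode
 * the bytes as "u64 appl_ptr; u64 avail_min;" like the compat kernel path.
 */
static void compat_read_buggy(uint32_t appl_ptr, uint32_t avail_min, int big_endian,
			      uint64_t *k_appl_ptr, uint64_t *k_avail_min)
{
	uint8_t buf[24];

	assert(k_appl_ptr != NULL && k_avail_min != NULL);
	memset(buf, 0, sizeof(buf));          /* padding assumed to be zero */
	if (big_endian) {
		/* BE: __pad1[4] appl_ptr __pad2[4] __pad3[4] avail_min */
		put32(buf + 4, appl_ptr, 1);
		put32(buf + 16, avail_min, 1);
	} else {
		/* LE: appl_ptr immediately followed by avail_min, then __pad4[4] */
		put32(buf + 0, appl_ptr, 0);
		put32(buf + 4, avail_min, 0);
	}
	*k_appl_ptr = get64(buf + 0, big_endian);
	*k_avail_min = get64(buf + 8, big_endian);
}
```

Take `appl_ptr = 1000` and `avail_min = 256`. On little-endian, the kernel side sees `appl_ptr = (256 << 32) | 1000` and `avail_min = 0`. On big-endian, `appl_ptr` arrives intact as 1000 and `avail_min` is still read as 0. This matches the two cases described above.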
> > > > >
> > > > > This made me wonder which way to go:
> > > > > it's certainly possible to fix the new kernel to handle both buggy and
> > > > > sane formats (disabling compat mmap and redefining ioctls, keeping the
> > > > > code for the old APIs). The problem, however, is the case where the
> > > > > application needs to run on an older kernel that expects the buggy
> > > > > format. Then apps would still have to send in the old buggy format --
> > > > > or maybe better, the older 32bit format that won't hit the bug
> > > > > above. It makes the situation more complicated.
> > > >
> > > > Can't an ioctl number just be redefined so that, on old kernels with
> > > > the buggy one, newly built applications get told that mmap is not
> > > > available and use the unaffected non-mmap fallback?
> > >
> > > The problem is that the SYNC_PTR64 ioctl itself, used for the non-mmap
> > > fallback, is affected by this bug, too. So disabling mmap alone doesn't
> > > help.
> > >
> > > And, yes, we can redefine ioctl numbers. But then applications would
> > > have to be bilingual, as would the kernel; they'd have to switch back
> > > to the old API when running on an older kernel, while the same binary
> > > would need to use the new API on a newer kernel.
> > >
> > > Maybe we can implement it in alsa-lib, if it's really worth it.
> >
> > In musl we already have ioctl struct conversion for running on
> > time32-only kernels. So it may be practical to convert this too if
> > needed.
>
> I guess we can work around it without ioctl renumbering. The PCM API has
> a protocol version handshake, and user-space is supposed to tell the
> kernel its API version. So the kernel can know which API version
> user-space is speaking.
>
> Below is the PoC fix on the kernel side (totally untested).
> The fix for alsa-lib will follow.

And below is the PoC fix for alsa-lib.

Takashi

---
diff --git a/include/sound/uapi/asound.h b/include/sound/uapi/asound.h
index 9fe3943f5fbb..ee91fe1f881f 100644
--- a/include/sound/uapi/asound.h
+++ b/include/sound/uapi/asound.h
@@ -154,7 +154,7 @@ struct snd_hwdep_dsp_image {
  *                                                                           *
  *****************************************************************************/
 
-#define SNDRV_PCM_VERSION		SNDRV_PROTOCOL_VERSION(2, 0, 15)
+#define SNDRV_PCM_VERSION		SNDRV_PROTOCOL_VERSION(2, 0, 16)
 
 typedef unsigned long snd_pcm_uframes_t;
 typedef signed long snd_pcm_sframes_t;
@@ -541,6 +541,7 @@ struct __snd_pcm_sync_ptr {
 	} s;
 	union {
 		struct __snd_pcm_mmap_control control;
+		struct __snd_pcm_mmap_control control_api_2_0_15; /* no bug in 32bit mode */
 		unsigned char reserved[64];
 	} c;
 };
@@ -548,11 +549,15 @@ struct __snd_pcm_sync_ptr {
 #if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
 typedef char __pad_before_uframe[sizeof(__u64) - sizeof(snd_pcm_uframes_t)];
 typedef char __pad_after_uframe[0];
+typedef char __pad_before_u32[4];
+typedef char __pad_after_u32[0];
 #endif
 
 #if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
 typedef char __pad_before_uframe[0];
 typedef char __pad_after_uframe[sizeof(__u64) - sizeof(snd_pcm_uframes_t)];
+typedef char __pad_before_u32[0];
+typedef char __pad_after_u32[4];
 #endif
 
 struct __snd_pcm_mmap_status64 {
@@ -570,13 +575,23 @@ struct __snd_pcm_mmap_status64 {
 struct __snd_pcm_mmap_control64 {
 	__pad_before_uframe __pad1;
 	snd_pcm_uframes_t appl_ptr;	 /* RW: appl ptr (0...boundary-1) */
-	__pad_before_uframe __pad2;
+	__pad_after_uframe __pad2;
 
 	__pad_before_uframe __pad3;
 	snd_pcm_uframes_t avail_min;	 /* RW: min available frames for wakeup */
 	__pad_after_uframe __pad4;
 };
 
+/* buggy mmap control definition for 2.0.15 PCM API on 32bit mode */
+struct __snd_pcm_mmap_control64_api_2_0_15 {
+	__pad_before_u32 __pad1;
+	__u32 appl_ptr;
+	__pad_before_u32 __pad2;	/* sic! here is the bug */
+	__pad_before_u32 __pad3;
+	__u32 avail_min;
+	__pad_after_uframe __pad4;
+};
+
 struct __snd_pcm_sync_ptr64 {
 	__u32 flags;
 	__u32 pad1;
@@ -586,6 +601,7 @@ struct __snd_pcm_sync_ptr64 {
 	} s;
 	union {
 		struct __snd_pcm_mmap_control64 control;
+		struct __snd_pcm_mmap_control64_api_2_0_15 control_api_2_0_15;
 		unsigned char reserved[64];
 	} c;
 };
diff --git a/src/pcm/pcm_hw.c b/src/pcm/pcm_hw.c
index b3f9d1579d29..bb97b7ecf5ca 100644
--- a/src/pcm/pcm_hw.c
+++ b/src/pcm/pcm_hw.c
@@ -94,6 +94,7 @@ typedef struct {
 
 	volatile struct snd_pcm_mmap_status * mmap_status;
 	struct snd_pcm_mmap_control *mmap_control;
+	snd_pcm_uframes_t *avail_min_p;
 	bool mmap_status_fallbacked;
 	bool mmap_control_fallbacked;
 	struct snd_pcm_sync_ptr *sync_ptr;
@@ -507,7 +508,7 @@ static int snd_pcm_hw_sw_params(snd_pcm_t *pcm, snd_pcm_sw_params_t * params)
 	    params->silence_threshold == pcm->silence_threshold &&
 	    params->silence_size == pcm->silence_size &&
 	    old_period_event == hw->period_event) {
-		hw->mmap_control->avail_min = params->avail_min;
+		*hw->avail_min_p = params->avail_min;
 		err = issue_avail_min(hw);
 		goto out;
 	}
@@ -540,7 +541,7 @@ static int snd_pcm_hw_sw_params(snd_pcm_t *pcm, snd_pcm_sw_params_t * params)
 		}
 		pcm->tstamp_type = params->tstamp_type;
 	}
-	hw->mmap_control->avail_min = params->avail_min;
+	*hw->avail_min_p = params->avail_min;
 	if (hw->period_event != old_period_event) {
 		err = snd_pcm_hw_change_timer(pcm, old_period_event);
 		if (err < 0)
@@ -980,6 +981,14 @@ static bool map_control_data(snd_pcm_hw_t *hw,
 	}
 
 	hw->mmap_control = mmap_control;
+	hw->avail_min_p = &mmap_control->avail_min;
+#ifdef __SND_STRUCT_TIME64
+	if (hw->version == SNDRV_PROTOCOL_VERSION(2, 0, 15)) {
+		struct __snd_pcm_mmap_control64_api_2_0_15 *buggy_control =
+			(struct __snd_pcm_mmap_control64_api_2_0_15 *)mmap_control;
+		hw->avail_min_p = (snd_pcm_uframes_t *)&buggy_control->avail_min;
+	}
+#endif
 
 	return fallbacked;
 }
@@ -1015,7 +1024,7 @@ static int map_status_and_control_data(snd_pcm_t *pcm, bool force_fallback)
 	if (!(pcm->mode & SND_PCM_APPEND)) {
 		/* Initialize the data. */
 		hw->mmap_control->appl_ptr = 0;
-		hw->mmap_control->avail_min = 1;
+		*hw->avail_min_p = 1;
 	}
 	snd_pcm_set_hw_ptr(pcm, &hw->mmap_status->hw_ptr, hw->fd,
 			   SNDRV_PCM_MMAP_OFFSET_STATUS +
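The alsa-lib side of the PoC comes down to choosing where `avail_min` lives, based on the protocol version the kernel reports. Below is a minimal sketch of that decision for the 32-bit time64 case. The offsets are derived from the 32-bit struct definitions in the patch. `PROTO_VERSION` copies the packing of `SNDRV_PROTOCOL_VERSION()` from asound.h. `avail_min_offset()` is a hypothetical helper and not part of alsa-lib.

```c
#include <assert.h>
#include <stddef.h>

/* Same packing as SNDRV_PROTOCOL_VERSION() in asound.h. */
#define PROTO_VERSION(major, minor, subminor) \
	(((major) << 16) | ((minor) << 8) | (subminor))

/*
 * Hypothetical helper: byte offset of avail_min in the 32-bit, time64
 * mmap control area. A 2.0.15 kernel expects the buggy
 * __snd_pcm_mmap_control64_api_2_0_15 layout; later ones the fixed one.
 */
static size_t avail_min_offset(int kernel_version, int big_endian)
{
	assert(big_endian == 0 || big_endian == 1);

	if (kernel_version == PROTO_VERSION(2, 0, 15))
		return big_endian ? 16 : 4;  /* __pad2 wrongly uses __pad_before_u32 */
	return big_endian ? 12 : 8;          /* intended: each field in its own 8-byte slot */
}
```

For example, a little-endian build gets offset 4 against a 2.0.15 kernel and 8 against 2.0.16. The patch redirects a pointer (`avail_min_p`) once at mmap time instead of checking the version at every access. That keeps the rest of pcm_hw.c unaware that two layouts exist.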