On Wed, 2024-04-10 at 21:32 +0200, Cyril Brulebois wrote:
> Hi,
>
> Munin uses the following command to get sensor-type information out
> of SMART-aware disks (e.g. temperature):
>
>     /usr/sbin/smartctl -A --nocheck=standby -d ata /dev/sda
>
> This broke following an upgrade from v6.1.76 (as found in Debian 12)
> to v6.1.82 (as currently found in the proposed-updates repository for
> the next point release of Debian 12), with smartctl's now reporting:
>
>     smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-19-amd64] (local build)
>     Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
>
>     Device is in SLEEP mode, exit(2)
>
> This happens on baremetal with 2 pairs of disks:
>  - 2×ST4000VN008-2DR1 (sda, sdb)
>  - 2×ST8000VN004-2M21 (sdc, sdd)
>
> and that's an obvious lie with one pair doing system stuff and the
> other one doing media stuff.
>
> This also happens within a Debian 12 QEMU VM running on a Debian 12
> libvirt host, when using a SATA disk, which is what I've used to test
> various builds from the stable/linux-6.1.y branch and associated
> tags.
>
> Building stable releases, I pinpointed it as a regression between
> v6.1.80 and v6.1.81, then pinpointed it to commit cf33e6ca12d8.
>
> #regzbot introduced: v6.1.80..v6.1.81
> #regzbot introduced: cf33e6ca12d8
>
> This is also affecting v6.1.84 and v6.1.85 (released during my git
> bisect session).
>
> Reported in Debian via: https://bugs.debian.org/1068675 (which
> included a trace with the distribution-provided v6.1.82 package).
>
> Most recent trace, with v6.1.85 (mainline, using the distribution's
> config but without any patches):
>
> [ 30.547027] ------------[ cut here ]------------
> [ 30.547034] WARNING: CPU: 0 PID: 697 at drivers/scsi/scsi_lib.c:214 scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
> [ 30.547082] Modules linked in: tls tun intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi aesni_intel snd_hda_codec crypto_simd cryptd rapl snd_hda_core snd_hwdep bochs drm_vram_helper pcspkr drm_ttm_helper snd_pcm iTCO_wdt snd_timer intel_pmc_bxt ttm iTCO_vendor_support snd watchdog soundcore virtio_console virtio_balloon drm_kms_helper button joydev evdev serio_raw sg binfmt_misc fuse loop drm efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci libahci virtio_scsi virtio_blk virtio_net net_failover failover xhci_pci crct10dif_pclmul crct10dif_common crc32_pclmul libata crc32c_intel xhci_hcd psmouse i2c_i801 i2c_smbus scsi_mod scsi_common lpc_ich virtio_pci
> [ 30.547194] virtio_pci_legacy_dev virtio_pci_modern_dev usbcore usb_common virtio virtio_ring
> [ 30.547205] CPU: 0 PID: 697 Comm: smartctl Not tainted 6.1.85 #1
> [ 30.547210] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 30.547217] RIP: 0010:scsi_execute_cmd+0x42/0x2c0 [scsi_mod]

This is a different manifestation of the same bug in stable that was
introduced by a backport of scsi_execute_cmd. The proposed fix for the
domain validation problem here will also sort out this problem:

https://lore.kernel.org/linux-scsi/yq1frvvpymp.fsf@xxxxxxxxxxxxxxxxxxxx/

James