https://bugzilla.redhat.com/show_bug.cgi?id=1814682

--- Comment #36 from Alaa Hleihel (Mellanox) <ahleihel@xxxxxxxxxx> ---
Hi,

I logged in to the system and found the following issues:

################################################################
1. rshim service start fails:

Apr 12 02:39:06 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[4799]: Probing pcie-01:00.2
Apr 12 02:39:06 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[4799]: create rshim pcie-01:00.2
Apr 12 02:39:06 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[4799]: Failed to map RShim registers

[root@qualcomm-amberwing-rep2-01 ~]# rshim -f
modprobe: FATAL: Module cuse not found in directory /lib/modules/4.18.0-147.el8.aarch64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Probing pcie-01:00.2
create rshim pcie-01:00.2
Failed to map RShim registers

The reason is that a required module is not installed on the system:

[root@qualcomm-amberwing-rep2-01 ~]# modinfo cuse
modinfo: ERROR: Module cuse not found.

The fix is:
# dnf install -y kernel-modules-extra

Then the module will be available:
[root@qualcomm-amberwing-rep2-01 ~]# modinfo cuse
filename: /lib/modules/4.18.0-147.el8.aarch64/kernel/fs/fuse/cuse.ko.xz

################################################################
2. rshim service stop fails:

Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Stopping rshim driver for BlueField SoC...
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[4383]: rshim.service: Failed to execute command: No such file or directory
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[4383]: rshim.service: Failed at step EXEC spawning /usr/bin/killall: No such file or directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: Control process exited, code=exited status=203
Apr 12 02:36:55 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com sshd[4384]: Connection closed by 10.35.206.44 port 60160 [preauth]
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com sshd[4386]: Accepted password for root from 10.35.206.44 port 60162 ssh2
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd-logind[1469]: New session 5 of user root.
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Started Session 5 of user root.
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com sshd[4386]: pam_unix(sshd:session): session opened for user root by (uid=0)
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: State 'stop-sigterm' timed out. Killing.
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: Killing process 4363 (rshim) with signal SIGKILL.
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: Failed with result 'exit-code'.
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Stopped rshim driver for BlueField SoC.

The fix is:
# dnf install -y psmisc

################################################################
3. Even after fixing the above, we still fail to load everything:

[root@qualcomm-amberwing-rep2-01 ~]# rshim -f
Probing pcie-01:00.2
create rshim pcie-01:00.2
Failed to map RShim registers

From strace on "rshim -f":

write(1, "Probing pcie-01:00.2\n", 21Probing pcie-01:00.2
) = 21
write(1, "create rshim pcie-01:00.2\n", 26create rshim pcie-01:00.2
) = 26
openat(AT_FDCWD, "/dev/mem", O_RDWR|O_SYNC) = -1 ENOENT (No such file or directory)
^^^^^^^^^^ ^^^^^^^^^^^^^^
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, -1, 0x80100300000) = -1 EBADF (Bad file descriptor)
write(1, "Failed to map RShim registers\n", 30Failed to map RShim registers
) = 30

That's because CONFIG_DEVMEM is not enabled in the kernel:

[root@qualcomm-amberwing-rep2-01 ~]# grep CONFIG_DEVMEM /boot/config-4.18.0-147.el8.aarch64
# CONFIG_DEVMEM is not set

--> Note: I see that this config is disabled only on aarch64 in RHEL-8.

I created a kernel with this config enabled, and then it worked.

[root@qualcomm-amberwing-rep2-01 ~]# ls -l /dev/mem
crw-r-----. 1 root kmem 1, 1 Apr 12 2020 /dev/mem

[root@qualcomm-amberwing-rep2-01 ~]# systemctl start rshim
[root@qualcomm-amberwing-rep2-01 ~]# systemctl status rshim
● rshim.service - rshim driver for BlueField SoC
   Loaded: loaded (/usr/lib/systemd/system/rshim.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-04-12 05:36:57 EDT; 4s ago
     Docs: man:rshim(8)
  Process: 5783 ExecStart=/usr/sbin/rshim $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 5784 (rshim)
    Tasks: 6 (limit: 37682)
   Memory: 32.5M
   CGroup: /system.slice/rshim.service
           └─5784 /usr/sbin/rshim

Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Starting rshim driver for BlueField SoC...
Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Started rshim driver for BlueField SoC.
Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[5784]: Probing pcie-01:00.2
Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[5784]: create rshim pcie-01:00.2
Apr 12 05:36:58 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[5784]: rshim0 attached

[root@qualcomm-amberwing-rep2-01 ~]# ls -l /dev/rshim*
total 0
crw-------. 1 root root 241, 0 Apr 12 05:36 boot
crw-------. 1 root root 240, 0 Apr 12 05:36 console
crw-------. 1 root root 239, 0 Apr 12 05:36 misc
crw-------. 1 root root 242, 0 Apr 12 05:36 rshim
[root@qualcomm-amberwing-rep2-01 ~]#

################################################################
4. Even after fixing all previous issues, accessing the device always hangs.

E.g., either of these will hang:
# cat /dev/rshim0/misc
# sudo minicom --color on --baudrate 115200 --device /dev/rshim0/console

And dmesg will show something like this:

Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: INFO: task cat:6591 blocked for more than 60 seconds.
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: Not tainted 4.18.0 #1
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: cat D 0 6591 6316 0x00000201
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: Call trace:
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __switch_to+0x6c/0x90
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __schedule+0x270/0x8a8
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: schedule+0x30/0x78
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: request_wait_answer+0x144/0x260 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __fuse_request_send+0xac/0xd0 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: fuse_request_send+0x58/0x68 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: fuse_direct_io+0x358/0x5a0 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: cuse_read_iter+0x78/0xa0 [cuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: new_sync_read+0x108/0x158
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __vfs_read+0x74/0x90
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: vfs_read+0x98/0x150
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: ksys_read+0x6c/0xd0
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __arm64_sys_read+0x24/0x30
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: el0_svc_handler+0xa0/0x128
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: el0_svc+0x8/0xc

Current BlueField version used was:

Mellanox BlueField A0 BL1 V1.0
NOTICE: BL2: v1.5(release):BL2.2
NOTICE: BL2: Built : 15:58:07, Jul 25 2019
NOTICE: BL2 built for hw (ver 0)
NOTICE: Running as MBF1M332A-AS system
NOTICE: Initializing DDR at mss[0]=0x18000000
NOTICE: No SPD detected on MSS0 DIMM0
NOTICE: No SPD detected on MSS0 DIMM1
NOTICE: No memory present on MSS 0
NOTICE: Doing MSS idle operations on MSS 0
NOTICE: Initializing DDR at mss[1]=0x20000000
NOTICE: No SPD detected on MSS1 DIMM0
NOTICE: No SPD detected on MSS1 DIMM1
NOTICE: Finished initializing DDR on MSS 1!
NOTICE: DDR POST passed.
NOTICE: BL1: Booting BL31
NOTICE: BL31: v1.5(release):BL2.2
NOTICE: BL31: Built : 15:58:07, Jul 25 2019
NOTICE: BL31 built for hw (ver 0)
UEFI firmware (version BlueField:2.0-f399628 built at 15:59:48 on Jul 25 2019)

I've updated it to the latest BlueField-2.5.1.11213 (using kernel module rshim version rshim-1.18-0.gb99e894_4.18.0.aarch64 from BlueField-2.5.1.11213 bundle):

Mellanox BlueField A0 BL1 V1.0
NOTICE: Enabled watchdog (120 sec delay)
NOTICE: Next boot will be in swap_emmc mode
NOTICE: BL2: v1.5(release):2.5.1-0-gbe0dd6b
NOTICE: BL2: Built : 23:42:29, Apr 2 2020
NOTICE: BL2 built for hw (ver 0)
NOTICE: Running as MBF1M332A-AS system
NOTICE: Initializing DDR at mss[0]=0x18000000
NOTICE: No SPD detected on MSS0 DIMM0
NOTICE: No SPD detected on MSS0 DIMM1
NOTICE: No memory present on MSS 0
NOTICE: Doing MSS idle operations on MSS 0
NOTICE: Initializing DDR at mss[1]=0x20000000
NOTICE: No SPD detected on MSS1 DIMM0
NOTICE: No SPD detected on MSS1 DIMM1
NOTICE: Finished initializing DDR on MSS 1!
NOTICE: DDR POST passed.
NOTICE: BL1: Booting BL31
NOTICE: BL31: v1.5(release):2.5.1-0-gbe0dd6b
NOTICE: BL31: Built : 23:42:29, Apr 2 2020
NOTICE: BL31 built for hw (ver 0)
UEFI firmware (version BlueField:2.5.1-0-ga9be8ec built at 23:43:44 on Apr 2 2020)

But it still hangs; I will continue checking...
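
For reference, the three prerequisite problems from items 1-3 (missing cuse module, missing /usr/bin/killall, missing /dev/mem) could be checked up front with something like the sketch below. This is only an illustration: the function names config_enabled and check_rshim_prereqs are made up here and are not part of rshim, and the script assumes the kernel config is available as /boot/config-$(uname -r), as it is on the system above. It does not cover the hang from item 4.

```shell
#!/bin/sh
# Illustrative pre-flight check for the prerequisites from items 1-3.
# The function names are hypothetical; nothing here is shipped with rshim.

# Return 0 if OPTION ($2) is built in (=y) or modular (=m) in the
# kernel config FILE ($1); non-zero otherwise (including a missing file).
config_enabled() {
    grep -Eq "^$2=[ym]\$" "$1" 2>/dev/null
}

# Print one hint per missing prerequisite and return non-zero if
# anything is missing.
check_rshim_prereqs() {
    kver=$(uname -r)
    rc=0

    # Item 1: rshim needs the cuse module (kernel-modules-extra).
    if ! modinfo cuse >/dev/null 2>&1; then
        echo "cuse module not found: dnf install -y kernel-modules-extra"
        rc=1
    fi

    # Item 2: stopping the service runs /usr/bin/killall (psmisc).
    if [ ! -x /usr/bin/killall ]; then
        echo "/usr/bin/killall not found: dnf install -y psmisc"
        rc=1
    fi

    # Item 3: rshim maps its registers through /dev/mem.
    if [ ! -c /dev/mem ]; then
        echo "/dev/mem is missing"
        if ! config_enabled "/boot/config-$kver" CONFIG_DEVMEM; then
            echo "CONFIG_DEVMEM does not appear to be enabled in /boot/config-$kver"
        fi
        rc=1
    fi

    return $rc
}
```

To use it, source the file and run check_rshim_prereqs as root before "systemctl start rshim"; it prints nothing and returns 0 when all three prerequisites are present.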