Re: SCSI target logs

Mike Christie <michael.christie@xxxxxxxxxx> · Sat, 20 May 2023 12:56:29 -0500

On 5/11/23 10:40 AM, Konstantin Shelekhin wrote:
> Hi,
> 
> For the last couple of days I've been thinking how to improve the target
> logs. My pain points:
> 
>   1. There is no common prefix for the subsystem
>   2. It's kinda hard to determine which I-T-L nexus was affected
> 
> For example:
> 
>   [Thu May 11 00:00:00 2023] Waiting for running cmds to complete.
> 
> Or something even more sinister:
> 
>   [Thu May 11 00:00:00 2023] bio error: 0xfffffffff, err 10
> 
> Only someone familiar with the kernel code can tell that this is
> actually target_core_iblock discovering an I/O error.
> 
> While all of this is not a blocker per se, because a skilled enough
> developer can connect all the dots, such analysis requires a lot of time
> and patience on large systems with hundreds of ACLs and several TPGs.
> 
> Okay, so what the others do? PCI and USB subsystems clearly enjoy having
> a unique ID:
> 
>   [Thu May 11 00:00:00 2023] iwlwifi 0000:00:14.3: WFPM_UMAC_PD_NOTIFICATION: 0x1f
>   [Thu May 11 00:00:00 2023] usb 3-7: reset high-speed USB device number 2 using xhci_hcd
> 
> Quick look at the lspci/lsusb output is all it takes to understand which
> device was affected.
> 
> Network interfaces enjoy having a user-friendly name:
> 
>   [Thu May 11 00:00:00 2023] wlan0: authenticate with cc:2d:e0:10:8d:b9
> 
> Same goes for file systems and block devices.
> 
> NVMe target uses pr_info(), benefiting from having neatly chosen module
> names:
> 
>   [Thu May 11 00:00:00 2023] nvmet: Foo happened (NVMe core)
>   [Thu May 11 00:00:00 2023] nvmet_fc: Bar happened (NVMe over FC)
>   [Thu May 11 00:00:00 2023] nvmet_tcp: Foo happened (NVMe overTCP)
> 
> However, we don't get to known which NVMe Host was affected unless the
> log line already contains the necessary info.
> 
> Back to the target. The simplest solution that at least gives the user
> some context is to put this in each C file:
> 
>   #define pr_fmt(fmt) KBUILD_MODENAME  ": " fmt
> 
> But since target modules all named differently, one can't just
> 
>   $ dmesg | grep target_
> 
> like with NVMe target. You'll have to do this:
> 
>   $ dmesg | grep -E '_target_|tcm_|isert|cxgbit'
> 
> That's because we have target_core_mod, isert, cxgbit, iscsi_target_mod,
> tcm_qla2xxx and so on. Yikes, but this is still an improvement. However,
> it does not give us any clue about the affected I-T-L nexus.
> 
> So, my proposal. We should give all modules similar names:
> 
>   tcm
>   tcm_iscsi
>   tcm_iser
>   tcm_qla2xxx
>   tcm_iblock
>   tcm_loop
>   tcm_cxgb

Do you mean you will rename the actual modules, or just have each module
have some define with a name like "tcm_$WHATEVER" that's used in the log
message?

I don't think we can change the actual name because scripts might do

modprobe isert

Unless there's a way to do some sort of automatic/magic alias then it
would be ok.

> 
> I would prefer scsit_ instead of tcm_, but I guess this ship has sailed.
> 
> Next, let's introduce some macros: tcm_LEVEL(), tcm_sess_LEVEL() and

We've been switching new code to the "target_" prefix.

> tcm_lun_LEVEL():
> 
>   tcm_err("This module blew up\n")
>   [Thu May 11 00:00:00 2023] tcm_iser: This module blew up
>                              ^^^^^^^^
>                              |
>                              Module name
> 
>   tcm_sess_err(&se_sess, "Waiting for running cmds to complete.\n")
>   [Thu May 11 00:00:00 2023] tcm_iscsi [iqn.2023-01.com.example:blah-blah -> 10]: Waiting for running cmds to complete.
>                              ^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^
>                              |          |                                    |
>                              |          |                                    RTPI
>                              |          Initiator port
>                              Module name
> 
>   tcm_lun_err(&se_lun, "bio error %p, err: %d\n", bio, bio_status)
>   [Thu May 11 00:00:00 2023] tcm_iblock [iqn.2023-01.com.example:blah-blah -> 10/5]: bio error: 0xfffffffff, err 10
>                              ^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^ ^
>                              |           |                                    |  |
>                              |           |                                    |  MAPPED LUN
>                              |           |                                    RTPI
>                              |           Initiator port
>                              Module name

Nice.