On 2/27/2025 3:25 PM, Easwar Hariharan wrote:
On 2/27/2025 3:08 PM, Roman Kisel wrote:
[...]
It would be great to learn the details to understand how this function
is going to improve the situation:
1. Why was the hex error code useless? What does it fail to match in
the Linux headers?
It doesn't match anything in the Linux headers; it's an NTSTATUS, not an HVSTATUS.
That is what it looks like from the code; I posted the details in the
parallel thread.
Here is a fix:
https://lore.kernel.org/linux-hyperv/20250227233110.36596-1-romank@xxxxxxxxxxxxxxxxxxx/
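For illustration, a minimal sketch of why 0xc0000001 cannot be a
hypervisor status code. It assumes two things: HV_STATUS values occupy
only the low 16 bits of the hypercall result (the Linux headers mask the
result with HV_HYPERCALL_RESULT_MASK), and NTSTATUS follows the layout
documented in MS-ERREF (2-bit severity, customer bit, reserved bit,
12-bit facility, 16-bit code). The helpers ntstatus_decode() and
fits_in_hv_status() are hypothetical, not kernel APIs.

```c
#include <stdint.h>

/* Fields of an NTSTATUS value, assuming the MS-ERREF layout:
 * bits 31-30 severity, bit 29 customer flag, bit 28 reserved,
 * bits 27-16 facility, bits 15-0 code. */
struct ntstatus_fields {
	unsigned int severity;	/* 0 success, 1 info, 2 warning, 3 error */
	unsigned int customer;	/* 1 if customer-defined, 0 if Microsoft-defined */
	unsigned int facility;
	unsigned int code;
};

/* Hypothetical helper: split a 32-bit NTSTATUS into its fields. */
static struct ntstatus_fields ntstatus_decode(uint32_t status)
{
	struct ntstatus_fields f;

	f.severity = (status >> 30) & 0x3;
	f.customer = (status >> 29) & 0x1;
	f.facility = (status >> 16) & 0xfff;
	f.code     = status & 0xffff;
	return f;
}

/* Hypothetical check: an HV_STATUS lives in the low 16 bits only,
 * so any value with higher bits set cannot be one. */
static int fits_in_hv_status(uint32_t status)
{
	return (status & ~(uint32_t)0xffff) == 0;
}
```

Decoded this way, 0xc0000001 has severity 3 (error), facility 0 and
code 1, i.e. STATUS_UNSUCCESSFUL, and it is too wide to be an HV_STATUS.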
Also, I think the commit description in your patch
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2138eab8cde61e0e6f62d0713e45202e8457d6d
conflates the hypervisor (ours runs bare-metal, Type 1) and the VMMs
(Virtual Machine Monitors)+VSPs (Virtual Service Providers, e.g. StorVSP
that implements SCSI) running in the host/root/dom0 partition.
Coming from the PoV of a user, it would be a much more useful message to see:
[ 249.512760] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: tag#683 cmd 0x28 status: scsi 0x2 srb 0x4 hv STATUS_UNSUCCESSFUL
than
[ 249.512760] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: tag#683 cmd 0x28 status: scsi 0x2 srb 0x4 hv 0xc0000001
It is likely that the PoV of a user that you've mentioned is actually
the PoV of a (kernel) developer. It is hard to imagine that folks running
web sites, DB servers, LoBs, LLMs, etc. in Hyper-V VMs care whether the
lowest software level of the virt stack reports a symbolic name or a hex
code. They need their VMs to be reliable or, when a configuration error
is suspected, a hint about what they could try.
To make the error log message useful to the user, the message should
mention ways of remediation or at least hint at what might've gotten
wedged. Without that, the message is only useful to the people who work
on the kernel code proper or on the kernel interface to user land.
So I'd think that the hex error codes from the hypervisor give the user
exactly as much help as the symbolic error names do in getting the
system to the desired state: none. Even less when the error is reported
as "Unknown" :)
2. How having "Unknown" in the log can possibly be better?
IMHO, seeing "Unknown" in an error report means that there's a new return value
that needs to be mapped to errno in hv_status_to_errno() and updated here as well.
It means that to the developer. To the user, it means the developers
messed something up and, to make matters worse, didn't leave any
breadcrumbs (e.g. the hex code) that would help the user and themselves
see what's wrong: there is just that "Unknown" thing in the log.
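One way to keep both the symbolic name and the breadcrumb is to always
print the raw value next to the name, falling back to "Unknown" plus the
hex code. A minimal sketch, assuming a simple lookup table;
hv_status_format() and hv_status_names[] are hypothetical names, and the
table holds only a few HV_STATUS codes from the TLFS, not the full list.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table: a few HV_STATUS codes, not a complete list. */
static const struct {
	unsigned int code;
	const char *name;
} hv_status_names[] = {
	{ 0x0000, "HV_STATUS_SUCCESS" },
	{ 0x0002, "HV_STATUS_INVALID_HYPERCALL_CODE" },
	{ 0x0005, "HV_STATUS_INVALID_PARAMETER" },
	{ 0x000B, "HV_STATUS_INSUFFICIENT_MEMORY" },
};

/* Hypothetical helper: writes "NAME (0x....)" for known codes and
 * "Unknown (0x....)" otherwise, so the raw value is never lost.
 * The hex value is printed with at least four digits.
 * Returns what snprintf() returns. */
static int hv_status_format(char *buf, size_t len, unsigned int status)
{
	const char *name = "Unknown";
	size_t i;

	for (i = 0; i < sizeof(hv_status_names) / sizeof(hv_status_names[0]); i++) {
		if (hv_status_names[i].code == status) {
			name = hv_status_names[i].name;
			break;
		}
	}
	return snprintf(buf, len, "%s (0x%04x)", name, status);
}
```

With this, an unmapped code shows up as "Unknown (0x1234)" rather than a
bare "Unknown", so a new return value is still visible in the log.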
3. Given that the select hv status codes and the proposed strings have
1:1 correspondence, and there is the 1:N catch-all case for the
"Unknown", how's that better?
I didn't really follow this question, but I suppose the answer to Q2 answers this as
well. If not, please expand and I'll try to answer.
Sorry about that chunk; I hit "Send" without reading the e-mail over
once more. Appreciate the discussion very much!
Thanks,
Easwar (he/him)
--
Thank you,
Roman