Re: [PATCH v5 01/10] hyperv: Convert Hyper-V status codes to strings

On 2/28/2025 9:20 AM, Roman Kisel wrote:
> 
> 
> On 2/27/2025 3:25 PM, Easwar Hariharan wrote:
>> On 2/27/2025 3:08 PM, Roman Kisel wrote:
> 
> [...]
> 
>>> Would be great to learn the details to understand how this function is
>>> going to improve the situation:
>>>
>>> 1. Why was the hex error code useless? What is it that does not match
>>>     anything in the Linux headers?
>>
>> It doesn't match anything in the Linux headers; it's an NTSTATUS, not an HV status.
>>
> 
> That is what it looks like from the code; I posted the details in the
> parallel thread.
> 
> Here is a fix:
> https://lore.kernel.org/linux-hyperv/20250227233110.36596-1-romank@xxxxxxxxxxxxxxxxxxx/
> 
> Also I think the commit description in your patch
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2138eab8cde61e0e6f62d0713e45202e8457d6d
> 
> conflates the hypervisor (ours runs bare-metal, Type 1) and the VMMs
> (Virtual Machine Monitors)+VSPs (Virtual Service Providers, e.g. StorVSP
> that implements SCSI) running in the host/root/dom0 partition.

Agreed, that is what I was led to believe. Your patch would help with that
miscommunication, though not in its current form; see my review comment in that
thread.

> 
>> Coming from the PoV of a user, it would be a much more useful message to see:
>>
>> [  249.512760] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: tag#683 cmd 0x28 status: scsi 0x2 srb 0x4 hv STATUS_UNSUCCESSFUL
>>
>> than
>>
>> [  249.512760] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: tag#683 cmd 0x28 status: scsi 0x2 srb 0x4 hv 0xc0000001
>>
> 
> It is likely that the PoV of a user that you've mentioned is actually
> a PoV of a (kernel) developer.

Actually, no, it's the PoV of the WSL users who are having the discussion in
the linked GitHub issue. FWIW, the same issue also occurred in Azure, with multiple
incidents coming into our queue because of the unusable flood of error messages.

> It is hard to imagine that folks running
> web sites, DB servers, LoBs, LLMs, etc. in Hyper-V VMs care about the
> lowest software level of the virt stack in the form of the symbolic
> name or the hex code. They need their VMs to be reliable, or to get a
> suggestion of what to try if a configuration error is suspected.
> 
> To make the error log message useful to the user, the message should
> mention ways of remediation or at least hint what might've gotten
> wedged. Without that, the message is only useful to the people who work
> on the kernel code proper or on the kernel interface to user land.

There's a step between seeing the issue and fixing it that you're missing,
i.e. the reporting.

An issue that says "flood of hv_storvsc errors reporting status
unsuccessful" is better than the same without that status information:
https://github.com/microsoft/WSL/issues/9173
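As a sketch of what better reporting could build on: the 0xc0000001 in the log above is an NTSTATUS (STATUS_UNSUCCESSFUL) rather than a hypervisor status, so a logging helper might first decide which namespace a value belongs to. This assumes the standard NTSTATUS layout (severity in the top two bits) and that hypercall status codes fit in 16 bits; `ntstatus_severity()` and `status_looks_like_ntstatus()` are hypothetical helpers, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The NTSTATUS value seen in the storvsc log line quoted above. */
#define STATUS_UNSUCCESSFUL 0xC0000001u

/* NTSTATUS keeps its severity in bits 31..30:
 * 0 = success, 1 = informational, 2 = warning, 3 = error. */
static unsigned int ntstatus_severity(uint32_t status)
{
	return (status >> 30) & 0x3u;
}

/* Compile-time check that STATUS_UNSUCCESSFUL carries error severity. */
static_assert((STATUS_UNSUCCESSFUL >> 30) == 3,
	      "STATUS_UNSUCCESSFUL is an error-severity NTSTATUS");

/* Assumed heuristic: hypercall status codes occupy only the low 16 bits,
 * so a value with any higher bit set cannot be an HV status and is more
 * likely an NTSTATUS relayed from a host-side VSP such as StorVSP.
 * It does not help for NTSTATUS success codes such as 0. */
static bool status_looks_like_ntstatus(uint32_t status)
{
	return (status & 0xFFFF0000u) != 0;
}
```

With such a check, a driver could pick the right name table (or at least the right prefix) before printing, instead of treating every value as an HV status.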

> 
> So I'd think that the hex error codes from the hypervisor give the user
> exactly as much as the symbolic error names do to get the system to the
> desired state: nothing.

I continue to disagree: seeing HV_STATUS_NO_RESOURCES is better than 0x1D,
because it may prompt the user to look at `top`, `free -h`, or similar to see
what could be killed to improve the situation.
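To make the 0x1D vs HV_STATUS_NO_RESOURCES comparison concrete, a string table with an "Unknown" fallback might look like the following. It is a hypothetical illustration, not the code from the patch: `hv_status_to_string()` and the table layout are assumptions, and the numeric values are believed to match the Hyper-V TLFS definitions in the Linux hyperv headers but should be checked against the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypercall status values (verify against the Linux hyperv headers). */
#define HV_STATUS_SUCCESS                0x0
#define HV_STATUS_INVALID_HYPERCALL_CODE 0x2
#define HV_STATUS_INVALID_PARAMETER      0x5
#define HV_STATUS_ACCESS_DENIED          0x6
#define HV_STATUS_INSUFFICIENT_MEMORY    0xB
#define HV_STATUS_NO_RESOURCES           0x1D

struct hv_status_name {
	uint16_t code;
	const char *name;
};

/* #s stringifies the macro name itself (it is not macro-expanded), so the
 * logged text always matches the identifier a developer would grep for. */
#define HV_STATUS_ENTRY(s) { s, #s }

static const struct hv_status_name hv_status_names[] = {
	HV_STATUS_ENTRY(HV_STATUS_SUCCESS),
	HV_STATUS_ENTRY(HV_STATUS_INVALID_HYPERCALL_CODE),
	HV_STATUS_ENTRY(HV_STATUS_INVALID_PARAMETER),
	HV_STATUS_ENTRY(HV_STATUS_ACCESS_DENIED),
	HV_STATUS_ENTRY(HV_STATUS_INSUFFICIENT_MEMORY),
	HV_STATUS_ENTRY(HV_STATUS_NO_RESOURCES),
};

/* Hypothetical lookup: returns the symbolic name, or "Unknown" for a code
 * that has not been added to the table yet. */
static const char *hv_status_to_string(uint16_t status)
{
	size_t i;

	for (i = 0; i < sizeof(hv_status_names) / sizeof(hv_status_names[0]); i++) {
		if (hv_status_names[i].code == status)
			return hv_status_names[i].name;
	}
	return "Unknown";
}
```

A linear scan is adequate here because the lookup only runs on error paths; the stringify macro keeps the table from drifting out of sync with the identifiers.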

> Even less so when the error is reported as "Unknown" :)

I agree that "Unknown" is useless to the user, except, as already mentioned
below, as a prompt for the code to be updated.
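The same "prompt" exists on the errno side. Below is a simplified, hypothetical mapping in the spirit of hv_status_to_errno(); the name `demo_hv_status_to_errno()` and the exact case grouping are assumptions, not the kernel's table. A code nobody has mapped yet falls through to the default (-EIO here), just as it would print as "Unknown" in a string table, so a new status needs adding in both places.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypercall status values (verify against the Linux hyperv headers). */
#define HV_STATUS_SUCCESS                 0x0
#define HV_STATUS_INVALID_HYPERCALL_INPUT 0x3
#define HV_STATUS_INVALID_PARAMETER       0x5
#define HV_STATUS_ACCESS_DENIED           0x6
#define HV_STATUS_INSUFFICIENT_MEMORY     0xB
#define HV_STATUS_NO_RESOURCES            0x1D

/* Hypothetical, simplified mapping; the real hv_status_to_errno() may
 * group codes differently. Unmapped codes land in the default branch. */
static int demo_hv_status_to_errno(uint16_t status)
{
	switch (status) {
	case HV_STATUS_SUCCESS:
		return 0;
	case HV_STATUS_INVALID_HYPERCALL_INPUT:
	case HV_STATUS_INVALID_PARAMETER:
		return -EINVAL;
	case HV_STATUS_ACCESS_DENIED:
		return -EACCES;
	case HV_STATUS_INSUFFICIENT_MEMORY:
	case HV_STATUS_NO_RESOURCES:
		return -ENOMEM;
	default:
		return -EIO;
	}
}
```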

> 
>>> 2. How having "Unknown" in the log can possibly be better?
>>
>> IMHO, seeing "Unknown" in an error report means that there's a new return value
>> that needs to be mapped to errno in hv_status_to_errno() and updated here as well.
>>
> 
> It means that to the developer. To the user, it means the developers
> messed something up and to make matters even worse they didn't leave any
> breadcrumbs (e.g. the hex code) to see what's wrong to help the user and
> themselves: there is just that "Unknown" thing in the log.

I think Nuno's compromise of also printing the hex code addresses this very well.
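A minimal sketch of that compromise, assuming the name comes from whatever status-to-string lookup the patch ends up with (NULL standing in for "not found"); `hv_status_format()` is a hypothetical helper, not an existing kernel function. Even when the name is "Unknown", the raw code survives into the bug report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the symbolic name and the raw hex value side
 * by side, e.g. "HV_STATUS_NO_RESOURCES (0x1d)" or "Unknown (0x7f)".
 * Returns snprintf()'s result: the length the full string needs. */
static int hv_status_format(char *buf, size_t len, const char *name,
			    uint16_t status)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s (0x%x)", name ? name : "Unknown",
			(unsigned int)status);
}
```

In a driver the same idea would be a single format string along the lines of `"hv %s (0x%x)"` in the existing error message, so no separate buffer is needed there.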

> 
>>> 3. Given that the select hv status codes and the proposed strings have
>>>     1:1 correspondence, and there is the 1:N catch-all case for the
>>>     "Unknown", how's that better?
>>>
>>
>> I didn't really follow this question, but I suppose the answer to Q2 answers this as
>> well. If not, please expand and I'll try to answer.
>>
> 
> Sorry about that chunk, hit "Send" without looking the e-mail over
> another time. Appreciate the discussion very much!
> 
> 
>> Thanks,
>> Easwar (he/him)
> 
