Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 9/8/2021 3:08 PM, Christian König wrote:
Am 08.09.21 um 11:29 schrieb Lazar, Lijo:


On 9/8/2021 2:32 PM, Yu, Lang wrote:
[AMD Official Use Only]



-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Wednesday, September 8, 2021 4:55 PM
To: Yu, Lang <Lang.Yu@xxxxxxx>; Christian König
<ckoenig.leichtzumerken@xxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Huang, Ray
<Ray.Huang@xxxxxxx>; Tian Tao <tiantao6@xxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings



On 9/8/2021 1:14 PM, Yu, Lang wrote:
[AMD Official Use Only]



-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Wednesday, September 8, 2021 3:36 PM
To: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>; Yu, Lang
<Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Huang, Ray
<Ray.Huang@xxxxxxx>; Tian Tao <tiantao6@xxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at
warnings



On 9/8/2021 12:07 PM, Christian König wrote:
Am 08.09.21 um 07:56 schrieb Lang Yu:
sysfs_emit and sysfs_emit_at requrie a page boundary aligned buf
address. Make them happy!

Warning Log:
[  492.545174] invalid sysfs_emit_at: buf:00000000f19bdfde at:0 [
492.546416] WARNING: CPU: 7 PID: 1304 at fs/sysfs/file.c:765
sysfs_emit_at+0x4a/0xa0
[  492.654805] Call Trace:
[  492.655353]  ? smu_cmn_get_metrics_table+0x40/0x50 [amdgpu] [
492.656780]  vangogh_print_clk_levels+0x369/0x410 [amdgpu] [
492.658245]  vangogh_common_print_clk_levels+0x77/0x80 [amdgpu] [
492.659733]  ? preempt_schedule_common+0x18/0x30 [ 492.660713]
smu_print_ppclk_levels+0x65/0x90 [amdgpu] [ 492.662107]
amdgpu_get_pp_od_clk_voltage+0x13d/0x190 [amdgpu] [ 492.663620]
dev_attr_show+0x1d/0x40

Mhm, that at least partially doesn't looks like the right approach to me.

Why do we have string printing and sysfs code in the hardware
version specific backend in the first place?


This is a callback meant for printing ASIC specific information to
sysfs node. The buffer passed in sysfs read is passed as it is to the callback API.

That stuff needs to be implemented for each hardware generation and
is now cluttered with sysfs buffer offset calculations.


Looks like the warning happened because of this usage.

                  size = amdgpu_dpm_print_clock_levels(adev, OD_SCLK, buf);                   size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK,
buf+size);
                  size += amdgpu_dpm_print_clock_levels(adev,
OD_VDDC_CURVE, buf+size);
                  size += amdgpu_dpm_print_clock_levels(adev,
OD_VDDGFX_OFFSET, buf+size);
                  size += amdgpu_dpm_print_clock_levels(adev,
OD_RANGE,
buf+size);
                  size += amdgpu_dpm_print_clock_levels(adev, OD_CCLK,
buf+size);


[Yu, Lang]
Yes. So it is fine we just fix the caller amdgpu_get_pp_od_clk_voltage like
following:

static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
        struct device_attribute *attr,
        char *buf)
{
    struct drm_device *ddev = dev_get_drvdata(dev);
    struct amdgpu_device *adev = drm_to_adev(ddev);
    ssize_t size, offset;
    int ret, i;
    char temp_buf[512];
    char clock_type[] = {OD_SCLK, OD_MCLK, OD_VDDC_CURVE,
                         OD_VDDGFX_OFFSET, OD_RANGE, OD_CCLK};

    if (amdgpu_in_reset(adev))
        return -EPERM;
    if (adev->in_suspend && !adev->in_runpm)
        return -EPERM;

    ret = pm_runtime_get_sync(ddev->dev);
    if (ret < 0) {
        pm_runtime_put_autosuspend(ddev->dev);
        return ret;
    }

    offset = 0;

    if (adev->powerplay.pp_funcs->print_clock_levels) {
        for (i = 0; i < ARRAY_SIZE(clock_type); i++) {
            size = amdgpu_dpm_print_clock_levels(adev,
clock_type[i], buf);
            if (offset + size > PAGE_SIZE)
                break;
            memcpy(temp_buf + offset, buf, size);
            offset += size;
        }
        memcpy(buf, temp_buf, offset);
        size = offset;
    } else {
        size = sysfs_emit(buf, "\n");
    }
    pm_runtime_mark_last_busy(ddev->dev);
    pm_runtime_put_autosuspend(ddev->dev);

    return size;
}

Prefer to avoid any extra stack or heap usage for buffer. Maybe another arg to
pass offset along with buf?

[Yu, Lang]
Actually, the buf address contains the offset(offset_in_page(buf)) .

Though it's not a problem based on codeflow, static analysis tools might complain.

Or we just rollback to sprintf/snprintf.


snprintf with (PAGE_SIZE-size) may be simpler. I think Darren took the effort to convert these, he may have some other ideas.

This is not what I meant. See from the design point of view the print_clock_levels() callback is the bad idea to begin with.

What we should have instead is a callback which returns the clock as a value which is then printed in the amdgpu_get_pp_od_clk_voltage() function.

This avoids passing around the buffer and remaining size everywhere and also guarantees that the sysfs have unified printing over all hardware generations.


The scenario is one node used for multiple parameters - OD_SCLK, OD_CCLK, OD_VDDGFX_OFFSET etc.(mostly to avoid cluttering sysfs with lots of nodes). On top of it, the parameters supported (for ex: CCLK is not valid on dGPUs), the number of levels supported etc. vary across ASICs. There has to be multiple calls or the call needs to return multiple values for a single parameter (for ex: up to 4, 8 or 16 levels of GFXCLK depending on ASIC).

I don't know the history of the callback, mostly it was considered more efficient to print it directly rather than fetch and print. Alex/Evan may know the details.

Thanks,
Lijo

Regards,
Christian.




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux