On Fri, Aug 15, 2014 at 10:11 AM, Christian König <christian.koenig@xxxxxxx> wrote: > Hi Marco, > > the problem with an CS ioctl flag is that we sometimes don't know how much > SCLK/MCLK boost is needed, for example when we do post processing in the > player using OpenGL and UVD decoding with VDPAU. In this case VDPAU don't > has the slightest idea how high SCLK/MCLK must be and so can't give that > info to the kernel either. It could be as simple as just setting a flag when any kind of post processing is enabled so that the kernel selects the highest UVD performance state, otherwise, let the kernel decide based on streams. Alex > > Regards, > Christian. > > Am 15.08.2014 um 15:21 schrieb Marco Benatto: > > Hey all, > > I also had a talk with Alex yesterday about post-processing issues when > using dynamic UVD profiles and a chamge on CS ioctl > including a flag to let user mode driver tell to the kernel which > performance requirement it wants for post processing. A commom > point for both discussion is to stablish the default values for these > profiles, but probably this ioctl change would be more impacting/complex > to implement than a sysfs entry. > > If a sysfs entry is anough for now I can handle the code to create it and, > with your help, the code to setup the UVD profile requested through it. > > Is there any suggestion? > > Thanks all for your help, > > > On Fri, Aug 15, 2014 at 5:48 AM, Christian König <christian.koenig@xxxxxxx> > wrote: >> >> Hi guys, >> >> to make a long story short every time I watch a movie my laptop start to >> heat up because we always select the standard UVD power profile without >> actually measuring if that is necessary. >> >> Marco came up with a patch that seems to reliable measure the fps send >> down to the kernel and so together with knowing the frame size of the video >> should allow us to select the right UVD power profile. >> >> The problem is that Alex (unnoticed by me) completely disabled selecting >> the UVD profiles because of some issues with advanced post processing >> discussed on IRC. The problem seems to be that the lower UVD profiles have a >> to low SCLK/MCLK to handle the 3D load that comes with scaling, >> deinterlacing etc... >> >> I unfortunately don't have time for it, cause this only affects the >> hardware generations R600-SI and not the newest one CIK. So could you guys >> stick together and come up with a solution? Something like a sysfs entry >> that let's us select the minimum UVD power level allowed? >> >> I think Marco is happy to come up with a patch, we just need to know >> what's really needed and what should be the default values. I'm happy to >> review everything that comes out of it, just don't have time to do it >> myself. >> >> Happy discussion and thanks in advance, >> Christian. >> >> Am 12.08.2014 um 15:05 schrieb Alex Deucher: >> >>> On Tue, Aug 12, 2014 at 6:00 AM, Christian König >>> <deathsimple@xxxxxxxxxxx> wrote: >>>> >>>> Am 11.08.2014 um 16:52 schrieb Alex Deucher: >>>> >>>>> On Mon, Aug 11, 2014 at 5:08 AM, Christian König >>>>> <deathsimple@xxxxxxxxxxx> wrote: >>>>>> >>>>>> Am 07.08.2014 um 21:43 schrieb Alex Deucher: >>>>>> >>>>>>> On Thu, Aug 7, 2014 at 11:32 AM, Christian König >>>>>>> <deathsimple@xxxxxxxxxxx> wrote: >>>>>>>> >>>>>>>> Am 07.08.2014 um 16:32 schrieb Alex Deucher: >>>>>>>> >>>>>>>>> On Thu, Aug 7, 2014 at 7:33 AM, Christian König >>>>>>>>> <deathsimple@xxxxxxxxxxx> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> From: Marco A Benatto <marco.antonio.780@xxxxxxxxx> >>>>>>>>>> >>>>>>>>>> Adding a Frames Per Second estimation logic on UVD handles >>>>>>>>>> when it has being used. This estimation is per handle basis >>>>>>>>>> and will help on DPM profile calculation. >>>>>>>>>> >>>>>>>>>> v2 (chk): fix timestamp type, move functions around and >>>>>>>>>> cleanup code a bit. >>>>>>>>> >>>>>>>>> Will this really help much? I thought the problem was mainly due >>>>>>>>> to >>>>>>>>> sclk and mclk for post processing. >>>>>>>> >>>>>>>> >>>>>>>> It should at least handle the UVD side for upclocking when you get a >>>>>>>> lot >>>>>>>> of >>>>>>>> streams / fps. And at on my NI the patch seems to do exactly that. >>>>>>>> >>>>>>>> Switching sclk and mclk for post processing is a different task, and >>>>>>>> I >>>>>>>> actually have no idea what to do with them. >>>>>>> >>>>>>> At this point we always choose the plain UVD state anyway so this >>>>>>> patch would only take effect if we re-enabled the dynamic UVD state >>>>>>> selection. >>>>>> >>>>>> >>>>>> Hui? I thought we already re-enabled the dynamic UVD state selection, >>>>>> but >>>>>> double checking this I found it disabled again. >>>>>> >>>>>> What was the problem with that? Looks like I somehow missed the >>>>>> discussion >>>>>> around it. >>>>> >>>>> We did, but after doing so a number of people complained about a >>>>> regression on IRC because when apps like xmbc enabled post processing, >>>>> performance went down. >>>> >>>> >>>> That's strange, from my experience the different UVD performance states >>>> only >>>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the DPM dumps to >>>> confirms this. >>>> >>> The sclks and mclks are usually different as well, especially on APUs. >>> I can send you some examples. >>> >>>> You not off hand remember who complained on IRC? Finding something in >>>> the >>>> IRC logs is like searching for a needle in a haystack. >>> >>> I don't remember off hand. I think zgreg was involved in some of the >>> discussions. >>> >>> Alex >>> >>>> Thanks, >>>> Christian. >>>> >>>> >>>>> Alex >>>>> >>>>> >>>>>> Christian. >>>>>> >>>>>> >>>>>>> For the post processing, we probably need a hint we can >>>>>>> pass to the driver in the CS ioctl to denote what state we need. >>>>>>> Although if we did that, this could would largely be moot. That >>>>>>> said, >>>>>>> newer asics support dynamic UVD clocks so we really only need >>>>>>> something like that for older asics and I guess VCE. >>>>>>> >>>>>>> Alex >>>>>>> >>>>>>>> Christian. >>>>>>>> >>>>>>>> >>>>>>>>> Alex >>>>>>>>> >>>>>>>>>> Signed-off-by: Marco A Benatto <marco.antonio.780@xxxxxxxxx> >>>>>>>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx> >>>>>>>>>> --- >>>>>>>>>> drivers/gpu/drm/radeon/radeon.h | 10 ++++++ >>>>>>>>>> drivers/gpu/drm/radeon/radeon_uvd.c | 64 >>>>>>>>>> +++++++++++++++++++++++++++++++++---- >>>>>>>>>> 2 files changed, 68 insertions(+), 6 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h >>>>>>>>>> b/drivers/gpu/drm/radeon/radeon.h >>>>>>>>>> index 9e1732e..e92f6cb 100644 >>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon.h >>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon.h >>>>>>>>>> @@ -1617,6 +1617,15 @@ int radeon_pm_get_type_index(struct >>>>>>>>>> radeon_device >>>>>>>>>> *rdev, >>>>>>>>>> #define RADEON_UVD_STACK_SIZE (1024*1024) >>>>>>>>>> #define RADEON_UVD_HEAP_SIZE (1024*1024) >>>>>>>>>> >>>>>>>>>> +#define RADEON_UVD_FPS_EVENTS_MAX 8 >>>>>>>>>> +#define RADEON_UVD_DEFAULT_FPS 60 >>>>>>>>>> + >>>>>>>>>> +struct radeon_uvd_fps { >>>>>>>>>> + uint64_t timestamp; >>>>>>>>>> + uint8_t event_index; >>>>>>>>>> + uint8_t events[RADEON_UVD_FPS_EVENTS_MAX]; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> struct radeon_uvd { >>>>>>>>>> struct radeon_bo *vcpu_bo; >>>>>>>>>> void *cpu_addr; >>>>>>>>>> @@ -1626,6 +1635,7 @@ struct radeon_uvd { >>>>>>>>>> struct drm_file *filp[RADEON_MAX_UVD_HANDLES]; >>>>>>>>>> unsigned >>>>>>>>>> img_size[RADEON_MAX_UVD_HANDLES]; >>>>>>>>>> struct delayed_work idle_work; >>>>>>>>>> + struct radeon_uvd_fps fps_info[RADEON_MAX_UVD_HANDLES]; >>>>>>>>>> }; >>>>>>>>>> >>>>>>>>>> int radeon_uvd_init(struct radeon_device *rdev); >>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c >>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_uvd.c >>>>>>>>>> index 6bf55ec..ef5667a 100644 >>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_uvd.c >>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_uvd.c >>>>>>>>>> @@ -237,6 +237,51 @@ void radeon_uvd_force_into_uvd_segment(struct >>>>>>>>>> radeon_bo *rbo) >>>>>>>>>> rbo->placement.lpfn = (256 * 1024 * 1024) >> >>>>>>>>>> PAGE_SHIFT; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> +static void radeon_uvd_fps_clear_events(struct radeon_device >>>>>>>>>> *rdev, >>>>>>>>>> int >>>>>>>>>> idx) >>>>>>>>>> +{ >>>>>>>>>> + struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx]; >>>>>>>>>> + unsigned i; >>>>>>>>>> + >>>>>>>>>> + fps->timestamp = jiffies_64; >>>>>>>>>> + fps->event_index = 0; >>>>>>>>>> + for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++) >>>>>>>>>> + fps->events[i] = 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static void radeon_uvd_fps_note_event(struct radeon_device *rdev, >>>>>>>>>> int >>>>>>>>>> idx) >>>>>>>>>> +{ >>>>>>>>>> + struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx]; >>>>>>>>>> + uint64_t timestamp = jiffies_64; >>>>>>>>>> + unsigned rate = 0; >>>>>>>>>> + >>>>>>>>>> + uint8_t index = fps->event_index++; >>>>>>>>>> + fps->event_index %= RADEON_UVD_FPS_EVENTS_MAX; >>>>>>>>>> + >>>>>>>>>> + rate = div64_u64(HZ, max(timestamp - fps->timestamp, >>>>>>>>>> 1ULL)); >>>>>>>>>> + >>>>>>>>>> + fps->timestamp = timestamp; >>>>>>>>>> + fps->events[index] = min(rate, 120u); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static unsigned radeon_uvd_estimate_fps(struct radeon_device >>>>>>>>>> *rdev, >>>>>>>>>> int >>>>>>>>>> idx) >>>>>>>>>> +{ >>>>>>>>>> + struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx]; >>>>>>>>>> + unsigned i, valid = 0, count = 0; >>>>>>>>>> + >>>>>>>>>> + for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++) { >>>>>>>>>> + /* We should ignore zero values */ >>>>>>>>>> + if (fps->events[i] != 0) { >>>>>>>>>> + count += fps->events[i]; >>>>>>>>>> + valid++; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + if (valid > 0) >>>>>>>>>> + return count / valid; >>>>>>>>>> + else >>>>>>>>>> + return RADEON_UVD_DEFAULT_FPS; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> void radeon_uvd_free_handles(struct radeon_device *rdev, >>>>>>>>>> struct >>>>>>>>>> drm_file *filp) >>>>>>>>>> { >>>>>>>>>> int i, r; >>>>>>>>>> @@ -419,8 +464,10 @@ static int radeon_uvd_cs_msg(struct >>>>>>>>>> radeon_cs_parser >>>>>>>>>> *p, struct radeon_bo *bo, >>>>>>>>>> >>>>>>>>>> /* create or decode, validate the handle */ >>>>>>>>>> for (i = 0; i < RADEON_MAX_UVD_HANDLES; ++i) { >>>>>>>>>> - if (atomic_read(&p->rdev->uvd.handles[i]) == >>>>>>>>>> handle) >>>>>>>>>> + if (atomic_read(&p->rdev->uvd.handles[i]) == >>>>>>>>>> handle) >>>>>>>>>> { >>>>>>>>>> + radeon_uvd_fps_note_event(p->rdev, i); >>>>>>>>>> return 0; >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> /* handle not found try to alloc a new one */ >>>>>>>>>> @@ -428,6 +475,7 @@ static int radeon_uvd_cs_msg(struct >>>>>>>>>> radeon_cs_parser >>>>>>>>>> *p, struct radeon_bo *bo, >>>>>>>>>> if (!atomic_cmpxchg(&p->rdev->uvd.handles[i], >>>>>>>>>> 0, >>>>>>>>>> handle)) { >>>>>>>>>> p->rdev->uvd.filp[i] = p->filp; >>>>>>>>>> p->rdev->uvd.img_size[i] = img_size; >>>>>>>>>> + radeon_uvd_fps_clear_events(p->rdev, i); >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> @@ -763,7 +811,7 @@ int radeon_uvd_get_destroy_msg(struct >>>>>>>>>> radeon_device >>>>>>>>>> *rdev, int ring, >>>>>>>>>> static void radeon_uvd_count_handles(struct radeon_device >>>>>>>>>> *rdev, >>>>>>>>>> unsigned *sd, unsigned >>>>>>>>>> *hd) >>>>>>>>>> { >>>>>>>>>> - unsigned i; >>>>>>>>>> + unsigned i, fps_rate = 0; >>>>>>>>>> >>>>>>>>>> *sd = 0; >>>>>>>>>> *hd = 0; >>>>>>>>>> @@ -772,10 +820,13 @@ static void radeon_uvd_count_handles(struct >>>>>>>>>> radeon_device *rdev, >>>>>>>>>> if (!atomic_read(&rdev->uvd.handles[i])) >>>>>>>>>> continue; >>>>>>>>>> >>>>>>>>>> - if (rdev->uvd.img_size[i] >= 720*576) >>>>>>>>>> - ++(*hd); >>>>>>>>>> - else >>>>>>>>>> - ++(*sd); >>>>>>>>>> + fps_rate = radeon_uvd_estimate_fps(rdev, i); >>>>>>>>>> + >>>>>>>>>> + if (rdev->uvd.img_size[i] >= 720*576) { >>>>>>>>>> + (*hd) += fps_rate > 30 ? 1 : 2; >>>>>>>>>> + } else { >>>>>>>>>> + (*sd) += fps_rate > 30 ? 1 : 2; >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> @@ -805,6 +856,7 @@ void radeon_uvd_note_usage(struct >>>>>>>>>> radeon_device >>>>>>>>>> *rdev) >>>>>>>>>> set_clocks &= >>>>>>>>>> schedule_delayed_work(&rdev->uvd.idle_work, >>>>>>>>>> >>>>>>>>>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS)); >>>>>>>>>> >>>>>>>>>> + >>>>>>>>>> if ((rdev->pm.pm_method == PM_METHOD_DPM) && >>>>>>>>>> rdev->pm.dpm_enabled) { >>>>>>>>>> unsigned hd = 0, sd = 0; >>>>>>>>>> radeon_uvd_count_handles(rdev, &sd, &hd); >>>>>>>>>> -- >>>>>>>>>> 1.9.1 >>>>>>>>>> >> > > > > -- > Marco Antonio Benatto > Linux user ID: #506236 > > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel