On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <greg@xxxxxxxxxxxx> wrote: > On 15.08.2014 16:11, Christian König wrote: >> Hi Marco, >> >> the problem with an CS ioctl flag is that we sometimes don't know how >> much SCLK/MCLK boost is needed, for example when we do post processing >> in the player using OpenGL and UVD decoding with VDPAU. In this case >> VDPAU don't has the slightest idea how high SCLK/MCLK must be and so >> can't give that info to the kernel either. >> > > Maybe it's an acceptable workaround to simply disable dynamic UVD state > selection in case the UVD states only have a single power level. That > will avoid the performance issues on affected systems, while still > allowing dynamic UVD states on systems that have a saner DPM table > setup. I think it is mosly older systems that suffer from this. > That is exactly what we do now. Alex > Best regards > Grigori > >> Regards, >> Christian. >> >> Am 15.08.2014 um 15:21 schrieb Marco Benatto: >>> Hey all, >>> >>> I also had a talk with Alex yesterday about post-processing issues >>> when using dynamic UVD profiles and a chamge on CS ioctl >>> including a flag to let user mode driver tell to the kernel which >>> performance requirement it wants for post processing. A commom >>> point for both discussion is to stablish the default values for these >>> profiles, but probably this ioctl change would be more impacting/complex >>> to implement than a sysfs entry. >>> >>> If a sysfs entry is anough for now I can handle the code to create it >>> and, with your help, the code to setup the UVD profile requested >>> through it. >>> >>> Is there any suggestion? >>> >>> Thanks all for your help, >>> >>> >>> On Fri, Aug 15, 2014 at 5:48 AM, Christian König >>> <christian.koenig@xxxxxxx <mailto:christian.koenig@xxxxxxx>> wrote: >>> >>> Hi guys, >>> >>> to make a long story short every time I watch a movie my laptop >>> start to heat up because we always select the standard UVD power >>> profile without actually measuring if that is necessary. >>> >>> Marco came up with a patch that seems to reliable measure the fps >>> send down to the kernel and so together with knowing the frame >>> size of the video should allow us to select the right UVD power >>> profile. >>> >>> The problem is that Alex (unnoticed by me) completely disabled >>> selecting the UVD profiles because of some issues with advanced >>> post processing discussed on IRC. The problem seems to be that the >>> lower UVD profiles have a to low SCLK/MCLK to handle the 3D load >>> that comes with scaling, deinterlacing etc... >>> >>> I unfortunately don't have time for it, cause this only affects >>> the hardware generations R600-SI and not the newest one CIK. So >>> could you guys stick together and come up with a solution? >>> Something like a sysfs entry that let's us select the minimum UVD >>> power level allowed? >>> >>> I think Marco is happy to come up with a patch, we just need to >>> know what's really needed and what should be the default values. >>> I'm happy to review everything that comes out of it, just don't >>> have time to do it myself. >>> >>> Happy discussion and thanks in advance, >>> Christian. >>> >>> Am 12.08.2014 um 15:05 schrieb Alex Deucher: >>> >>> On Tue, Aug 12, 2014 at 6:00 AM, Christian König >>> <deathsimple@xxxxxxxxxxx <mailto:deathsimple@xxxxxxxxxxx>> wrote: >>> >>> Am 11.08.2014 um 16:52 schrieb Alex Deucher: >>> >>> On Mon, Aug 11, 2014 at 5:08 AM, Christian König >>> <deathsimple@xxxxxxxxxxx >>> <mailto:deathsimple@xxxxxxxxxxx>> wrote: >>> >>> Am 07.08.2014 um 21:43 schrieb Alex Deucher: >>> >>> On Thu, Aug 7, 2014 at 11:32 AM, Christian König >>> <deathsimple@xxxxxxxxxxx >>> <mailto:deathsimple@xxxxxxxxxxx>> wrote: >>> >>> Am 07.08.2014 um 16:32 schrieb Alex Deucher: >>> >>> On Thu, Aug 7, 2014 at 7:33 AM, >>> Christian König >>> <deathsimple@xxxxxxxxxxx >>> <mailto:deathsimple@xxxxxxxxxxx>> >>> wrote: >>> >>> From: Marco A Benatto >>> <marco.antonio.780@xxxxxxxxx >>> <mailto:marco.antonio.780@xxxxxxxxx>> >>> >>> Adding a Frames Per Second >>> estimation logic on UVD handles >>> when it has being used. This >>> estimation is per handle basis >>> and will help on DPM profile >>> calculation. >>> >>> v2 (chk): fix timestamp type, move >>> functions around and >>> cleanup code a bit. >>> >>> Will this really help much? I thought >>> the problem was mainly due to >>> sclk and mclk for post processing. >>> >>> >>> It should at least handle the UVD side for >>> upclocking when you get a >>> lot >>> of >>> streams / fps. And at on my NI the patch >>> seems to do exactly that. >>> >>> Switching sclk and mclk for post >>> processing is a different task, and I >>> actually have no idea what to do with them. >>> >>> At this point we always choose the plain UVD >>> state anyway so this >>> patch would only take effect if we re-enabled >>> the dynamic UVD state >>> selection. >>> >>> >>> Hui? I thought we already re-enabled the dynamic >>> UVD state selection, but >>> double checking this I found it disabled again. >>> >>> What was the problem with that? Looks like I >>> somehow missed the >>> discussion >>> around it. >>> >>> We did, but after doing so a number of people >>> complained about a >>> regression on IRC because when apps like xmbc enabled >>> post processing, >>> performance went down. >>> >>> >>> That's strange, from my experience the different UVD >>> performance states only >>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the >>> DPM dumps to >>> confirms this. >>> >>> The sclks and mclks are usually different as well, especially >>> on APUs. >>> I can send you some examples. >>> >>> You not off hand remember who complained on IRC? Finding >>> something in the >>> IRC logs is like searching for a needle in a haystack. >>> >>> I don't remember off hand. I think zgreg was involved in some >>> of the >>> discussions. >>> >>> Alex >>> >>> Thanks, >>> Christian. >>> >>> >>> Alex >>> >>> >>> Christian. >>> >>> >>> For the post processing, we probably need a >>> hint we can >>> pass to the driver in the CS ioctl to denote >>> what state we need. >>> Although if we did that, this could would >>> largely be moot. That said, >>> newer asics support dynamic UVD clocks so we >>> really only need >>> something like that for older asics and I >>> guess VCE. >>> >>> Alex >>> >>> Christian. >>> >>> >>> Alex >>> >>> Signed-off-by: Marco A Benatto >>> <marco.antonio.780@xxxxxxxxx >>> <mailto:marco.antonio.780@xxxxxxxxx>> >>> Signed-off-by: Christian König >>> <christian.koenig@xxxxxxx >>> <mailto:christian.koenig@xxxxxxx>> >>> --- >>> >>> drivers/gpu/drm/radeon/radeon.h >>> | 10 ++++++ >>> >>> drivers/gpu/drm/radeon/radeon_uvd.c >>> | 64 >>> +++++++++++++++++++++++++++++++++---- >>> 2 files changed, 68 >>> insertions(+), 6 deletions(-) >>> >>> diff --git >>> a/drivers/gpu/drm/radeon/radeon.h >>> b/drivers/gpu/drm/radeon/radeon.h >>> index 9e1732e..e92f6cb 100644 >>> --- a/drivers/gpu/drm/radeon/radeon.h >>> +++ b/drivers/gpu/drm/radeon/radeon.h >>> @@ -1617,6 +1617,15 @@ int >>> radeon_pm_get_type_index(struct >>> radeon_device >>> *rdev, >>> #define >>> RADEON_UVD_STACK_SIZE (1024*1024) >>> #define RADEON_UVD_HEAP_SIZE >>> (1024*1024) >>> >>> +#define RADEON_UVD_FPS_EVENTS_MAX 8 >>> +#define RADEON_UVD_DEFAULT_FPS 60 >>> + >>> +struct radeon_uvd_fps { >>> + uint64_t timestamp; >>> + uint8_t event_index; >>> + uint8_t >>> events[RADEON_UVD_FPS_EVENTS_MAX]; >>> +}; >>> + >>> struct radeon_uvd { >>> struct radeon_bo >>> *vcpu_bo; >>> void >>> *cpu_addr; >>> @@ -1626,6 +1635,7 @@ struct >>> radeon_uvd { >>> struct drm_file >>> *filp[RADEON_MAX_UVD_HANDLES]; >>> unsigned >>> img_size[RADEON_MAX_UVD_HANDLES]; >>> struct delayed_work >>> idle_work; >>> + struct radeon_uvd_fps >>> fps_info[RADEON_MAX_UVD_HANDLES]; >>> }; >>> >>> int radeon_uvd_init(struct >>> radeon_device *rdev); >>> diff --git >>> a/drivers/gpu/drm/radeon/radeon_uvd.c >>> b/drivers/gpu/drm/radeon/radeon_uvd.c >>> index 6bf55ec..ef5667a 100644 >>> --- >>> a/drivers/gpu/drm/radeon/radeon_uvd.c >>> +++ >>> b/drivers/gpu/drm/radeon/radeon_uvd.c >>> @@ -237,6 +237,51 @@ void >>> radeon_uvd_force_into_uvd_segment(struct >>> radeon_bo *rbo) >>> rbo->placement.lpfn = >>> (256 * 1024 * 1024) >> PAGE_SHIFT; >>> } >>> >>> +static void >>> radeon_uvd_fps_clear_events(struct >>> radeon_device *rdev, >>> int >>> idx) >>> +{ >>> + struct radeon_uvd_fps *fps >>> = &rdev->uvd.fps_info[idx]; >>> + unsigned i; >>> + >>> + fps->timestamp = jiffies_64; >>> + fps->event_index = 0; >>> + for (i = 0; i < >>> RADEON_UVD_FPS_EVENTS_MAX; i++) >>> + fps->events[i] = 0; >>> +} >>> + >>> +static void >>> radeon_uvd_fps_note_event(struct >>> radeon_device *rdev, >>> int >>> idx) >>> +{ >>> + struct radeon_uvd_fps *fps >>> = &rdev->uvd.fps_info[idx]; >>> + uint64_t timestamp = >>> jiffies_64; >>> + unsigned rate = 0; >>> + >>> + uint8_t index = >>> fps->event_index++; >>> + fps->event_index %= >>> RADEON_UVD_FPS_EVENTS_MAX; >>> + >>> + rate = div64_u64(HZ, >>> max(timestamp - fps->timestamp, >>> 1ULL)); >>> + >>> + fps->timestamp = timestamp; >>> + fps->events[index] = >>> min(rate, 120u); >>> +} >>> + >>> +static unsigned >>> radeon_uvd_estimate_fps(struct >>> radeon_device *rdev, >>> int >>> idx) >>> +{ >>> + struct radeon_uvd_fps *fps >>> = &rdev->uvd.fps_info[idx]; >>> + unsigned i, valid = 0, >>> count = 0; >>> + >>> + for (i = 0; i < >>> RADEON_UVD_FPS_EVENTS_MAX; i++) { >>> + /* We should >>> ignore zero values */ >>> + if (fps->events[i] >>> != 0) { >>> + count += >>> fps->events[i]; >>> + valid++; >>> + } >>> + } >>> + >>> + if (valid > 0) >>> + return count / valid; >>> + else >>> + return >>> RADEON_UVD_DEFAULT_FPS; >>> +} >>> + >>> void >>> radeon_uvd_free_handles(struct >>> radeon_device *rdev, struct >>> drm_file *filp) >>> { >>> int i, r; >>> @@ -419,8 +464,10 @@ static int >>> radeon_uvd_cs_msg(struct >>> radeon_cs_parser >>> *p, struct radeon_bo *bo, >>> >>> /* create or decode, >>> validate the handle */ >>> for (i = 0; i < >>> RADEON_MAX_UVD_HANDLES; ++i) { >>> - if >>> (atomic_read(&p->rdev->uvd.handles[i]) >>> == handle) >>> + if >>> (atomic_read(&p->rdev->uvd.handles[i]) >>> == handle) >>> { >>> + >>> radeon_uvd_fps_note_event(p->rdev, i); >>> return 0; >>> + } >>> } >>> >>> /* handle not found >>> try to alloc a new one */ >>> @@ -428,6 +475,7 @@ static int >>> radeon_uvd_cs_msg(struct >>> radeon_cs_parser >>> *p, struct radeon_bo *bo, >>> if >>> (!atomic_cmpxchg(&p->rdev->uvd.handles[i], >>> 0, >>> handle)) { >>> >>> p->rdev->uvd.filp[i] = p->filp; >>> >>> p->rdev->uvd.img_size[i] = img_size; >>> + >>> radeon_uvd_fps_clear_events(p->rdev, >>> i); >>> return 0; >>> } >>> } >>> @@ -763,7 +811,7 @@ int >>> radeon_uvd_get_destroy_msg(struct >>> radeon_device >>> *rdev, int ring, >>> static void >>> radeon_uvd_count_handles(struct >>> radeon_device *rdev, >>> >>> unsigned *sd, unsigned *hd) >>> { >>> - unsigned i; >>> + unsigned i, fps_rate = 0; >>> >>> *sd = 0; >>> *hd = 0; >>> @@ -772,10 +820,13 @@ static void >>> radeon_uvd_count_handles(struct >>> radeon_device *rdev, >>> if >>> (!atomic_read(&rdev->uvd.handles[i])) >>> continue; >>> >>> - if >>> (rdev->uvd.img_size[i] >= 720*576) >>> - ++(*hd); >>> - else >>> - ++(*sd); >>> + fps_rate = >>> radeon_uvd_estimate_fps(rdev, i); >>> + >>> + if >>> (rdev->uvd.img_size[i] >= 720*576) { >>> + (*hd) += >>> fps_rate > 30 ? 1 : 2; >>> + } else { >>> + (*sd) += >>> fps_rate > 30 ? 1 : 2; >>> + } >>> } >>> } >>> >>> @@ -805,6 +856,7 @@ void >>> radeon_uvd_note_usage(struct >>> radeon_device >>> *rdev) >>> set_clocks &= >>> schedule_delayed_work(&rdev->uvd.idle_work, >>> >>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS)); >>> >>> + >>> if >>> ((rdev->pm.pm_method == >>> PM_METHOD_DPM) && >>> rdev->pm.dpm_enabled) { >>> unsigned hd = >>> 0, sd = 0; >>> >>> radeon_uvd_count_handles(rdev, >>> &sd, &hd); >>> -- >>> 1.9.1 >>> >>> >>> >>> >>> >>> -- >>> Marco Antonio Benatto >>> Linux user ID:#506236 >> > > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel