On Thu, Apr 20, 2023 at 12:00:41AM +0800, Sui Jingfeng wrote: > Hi, > > Sorry about reply to you so late, > > our downstream (product kernel side) userspace GPU/DC driver > > has been tested out a few bugs, I'm asking to fulfill my duty to that part > all days. > > I may slow to reply, but I really love to reply. > > > On 2023/4/19 23:09, Daniel Vetter wrote: > > On Tue, 18 Apr 2023 at 20:16, Sui Jingfeng <15330273260@xxxxxx> wrote: > > > Hi, > > > > > > On 2023/4/19 01:52, Sui Jingfeng wrote: > > > > Hi, > > > > > > > > On 2023/4/18 16:32, Daniel Vetter wrote: > > > > > On Mon, Apr 17, 2023 at 07:32:19PM +0800, Sui Jingfeng wrote: > > > > > > The fbdev test of IGT may write after EOF, which lead to out-of-bound > > > > > > access for the drm drivers using fbdev-generic. For example, on a x86 > > > > > > + aspeed bmc card platform, with a 1680x1050 resolution display, > > > > > > running > > > > > > fbdev test if IGT will cause the linux kernel hang with the following > > > > > > call trace: > > > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP PTI > > > > > > [IGT] fbdev: starting subtest eof > > > > > > Workqueue: events drm_fb_helper_damage_work [drm_kms_helper] > > > > > > [IGT] fbdev: starting subtest nullptr > > > > > > > > > > > > RIP: 0010:memcpy_erms+0xa/0x20 > > > > > > RSP: 0018:ffffa17d40167d98 EFLAGS: 00010246 > > > > > > RAX: ffffa17d4eb7fa80 RBX: ffffa17d40e0aa80 RCX: 00000000000014c0 > > > > > > RDX: 0000000000001a40 RSI: ffffa17d40e0b000 RDI: ffffa17d4eb80000 > > > > > > RBP: ffffa17d40167e20 R08: 0000000000000000 R09: ffff89522ecff8c0 > > > > > > R10: ffffa17d4e4c5000 R11: 0000000000000000 R12: ffffa17d4eb7fa80 > > > > > > R13: 0000000000001a40 R14: 000000000000041a R15: ffffa17d40167e30 > > > > > > FS: 0000000000000000(0000) GS:ffff895257380000(0000) > > > > > > knlGS:0000000000000000 > > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > > CR2: ffffa17d40e0b000 CR3: 00000001eaeca006 CR4: 00000000001706e0 > > > > > > Call Trace: > > > > > > <TASK> > > > > > > ? drm_fbdev_generic_helper_fb_dirty+0x207/0x330 [drm_kms_helper] > > > > > > drm_fb_helper_damage_work+0x8f/0x170 [drm_kms_helper] > > > > > > process_one_work+0x21f/0x430 > > > > > > worker_thread+0x4e/0x3c0 > > > > > > ? __pfx_worker_thread+0x10/0x10 > > > > > > kthread+0xf4/0x120 > > > > > > ? __pfx_kthread+0x10/0x10 > > > > > > ret_from_fork+0x2c/0x50 > > > > > > </TASK> > > > > > > CR2: ffffa17d40e0b000 > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > > > > > > > > The direct reason is that damage rectange computed by > > > > > > drm_fb_helper_memory_range_to_clip() does not guaranteed to be > > > > > > in-bound. > > > > > > It is already results in workaround code populate to elsewhere. Another > > > > > > reason is that exposing a larger buffer size than the actual needed > > > > > > help > > > > > > to trigger this bug intrinsic in drm_fb_helper_memory_range_to_clip(). > > > > > > > > > > > > Others fbdev emulation solutions write to the GEM buffer directly, they > > > > > > won't reproduce this bug because the .fb_dirty function callback do not > > > > > > being hooked, so no chance is given to > > > > > > drm_fb_helper_memory_range_to_clip() > > > > > > to generate a out-of-bound when drm_fb_helper_sys_write() is called. > > > > > > > > > > > > This patch break the trigger condition of this bug by shrinking the > > > > > > shadow > > > > > > buffer size to sizes->surface_height * buffer->fb->pitches[0]. > > > > > > > > > > > > Fixes: '8fbc9af55de0 ("drm/fbdev-generic: Set screen size to size of > > > > > > GEM > > > > > > buffer")' > > > > > > > > > > > > Signed-off-by: Sui Jingfeng <suijingfeng@xxxxxxxxxxx> > > > > > > --- > > > > > > drivers/gpu/drm/drm_fbdev_generic.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_fbdev_generic.c > > > > > > b/drivers/gpu/drm/drm_fbdev_generic.c > > > > > > index 8e5148bf40bb..b057cfbba938 100644 > > > > > > --- a/drivers/gpu/drm/drm_fbdev_generic.c > > > > > > +++ b/drivers/gpu/drm/drm_fbdev_generic.c > > > > > > @@ -94,7 +94,7 @@ static int > > > > > > drm_fbdev_generic_helper_fb_probe(struct drm_fb_helper *fb_helper, > > > > > > fb_helper->buffer = buffer; > > > > > > fb_helper->fb = buffer->fb; > > > > > > - screen_size = buffer->gem->size; > > > > > > + screen_size = sizes->surface_height * buffer->fb->pitches[0]; > > > > > So I read core some more and stumbled over drm_fb_helper_deferred_io(). > > > > > Which has all the code and comments about this, including limiting. > > > > > > > > > > I think it would be clearer if we fix the issue there, instead of > > > > > passing > > > > > limits around in obscure places that then again get broken? > > > > No, it is more obscure doing that way... > > > > > > > > > > > > As the size of the shadow screen buffer will be exposed to userspace. > > > > > > > > The size 'helper->fb->height * helper->fb->pitches[0]' is a > > > > exactly(best) fit, > > > > > > > > You are guaranteed to waste at lease one byte by increasing one byte, > > > > > > > > and can not store all pixels by decreasing one byte (In the case where > > > > `helper->fb->pitches[0] = helper->fb->width * 4`). > > > > > > > > It implicitly tell the userspace do not go beyond that boundary. > > > > > > > > although userspace program can still choose to write after EOF, > > > > > > > > But it is for test purpose, to test the kernel if it can return a > > > > -EFBIG or not. > > > > > > > > > The thing is, > > > > > Thomas both authored the limit checks in drm_fb_helper_deferred_io() and > > > > > the patch which broken them again, so clearly this isn't very > > > > > obvious. I'm > > > > > thinking of something like this: > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_fb_helper.c > > > > > b/drivers/gpu/drm/drm_fb_helper.c > > > > > index ef4eb8b12766..726dab67c359 100644 > > > > > --- a/drivers/gpu/drm/drm_fb_helper.c > > > > > +++ b/drivers/gpu/drm/drm_fb_helper.c > > > > > @@ -697,10 +697,7 @@ void drm_fb_helper_deferred_io(struct fb_info > > > > > *info, struct list_head *pagerefli > > > > > * of the screen and account for non-existing scanlines. Hence, > > > > > * keep the covered memory area within the screen buffer. > > > > > */ > > > > > - if (info->screen_size) > > > > > - total_size = info->screen_size; > > > > > - else > > > > > - total_size = info->fix.smem_len; > > > > > + total_size = helper->fb->height * helper->fb->pitches[0]; > > > > This is just to mitigate the mistakes already has been made, > > > > > > > > because it do not do a good splitting between the *clip* part and the > > > > *damage update* part. > > > > > > > > An ideal clipping do not obscure its updating backend with a > > > > out-of-bound damage rectangle. > > > > > > > > Why did the drm_fb_helper_memory_range_to_clip() can not do a good job > > > > in all case > > > > > > > > to pass its backend a always meaningful damage rect ? > > > > > > > > > max_off = min(max_off, total_size); > > > > > if (min_off < max_off) { > > > > > > > > > > > > > > > I think that would make it utmost clear on what we're doing and why. > > > > > Otherwise we're just going to re-create the same bug again, like we've > > > > > done already :-) > > > > No, we create no bugs, we fix one. > > > > > > > > Thanks. > > > > > > > But honestly I do not have strong feel toward this, I just type what I'm > > > understand without seeing you resend a V3. > > > > > > It's OK in overall, I will help to test this tomorrow. :-) > > Apologies for making you jump around all the time and doing different > > versions of the same bugfix :-/ > > No, I do not mind. I'm wondering if you are testing me. Nah I'm really not any clearer on this than you :-/ > > I think this one here is ok to merge, > > NO, to be honest, this version is not ok. > > I have just tested it on LoongArch . It does not prevent out-of-bound on > LoongArch. > > bellow is the call trace when running the fbdev test of IGT. > > > [ 369.628841] Console: switching to colour dummy device 80x25 > [ 369.634440] [IGT] fbdev: executing > [ 369.654684] [IGT] fbdev: starting subtest info > [ 369.659173] [IGT] fbdev: starting subtest pan > [ 369.722093] [IGT] fbdev: starting subtest read > [ 369.737795] [IGT] fbdev: starting subtest unaligned-read > [ 369.745695] [IGT] fbdev: starting subtest write > [ 369.753154] CPU 3 Unable to handle kernel paging request at virtual address ffff800034bd0000, era == 9000000000223d5c, ra == ffff8000020d75a8 > [ 369.774570] [IGT] fbdev: starting subtest unaligned-write > [ 369.779960] Oops[#1]: > [ 369.782215] CPU: 3 PID: 504 Comm: kworker/3:3 Not tainted 6.3.0-rc5+ #377 > [ 369.782219] Hardware name: Loongson Loongson-3A5000-HV-7A2000-1w-V0.1-EVB/Loongson-LS3A5000-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05383-beta10 1 > [ 369.782223] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper] > [ 369.802706] $ 0 : 0000000000000000 ffff8000020d75a8 90000001367fc000 90000001367ffcf0 > [ 369.809368] $ 4 : ffff80003a3ec000 ffff800034bcee00 0000000000001e00 9000000001160000 > [ 369.825275] $ 8 : ffff80003a3ebfff 0000000000000040 0000000000000000 ffff80000214f20c > [ 369.825280] $12 : ffff80003a3ecc00 ffff800034bd0000 0000000000000000 0000000000001e00 > [ 369.841188] $16 : 9000000107490800 9000000001a24000 9000000001a28000 00000e00207ec000 > [ 369.849152] $20 : fffffffffe000000 900000010684cec0 0000000000000439 90000001367ffd90 > [ 369.849165] $24 : 9000000106bc5200 ffff8000020de000 ffff800034bcee00 0000000000001e00 > [ 369.865072] $28 : 90000001076d4400 0000000000000001 900000010553c000 ffff80003a3eae00 > [ 369.865078] era : 9000000000223d5c __memcpy_toio+0x4c/0x90 > [ 369.878651] ra : ffff8000020d75a8 drm_fbdev_generic_helper_fb_dirty+0x1cc/0x300 [drm_kms_helper] > [ 369.878762] [IGT] fbdev: starting subtest eof > [ 369.887679] CSR crmd: 000000b0 > [ 369.887680] CSR prmd: 00000004 > [ 369.887682] CSR euen: 00000000 > [ 369.887683] CSR ecfg: 00071c1c > [ 369.901340] CSR estat: 00010000 > [ 369.901342] ExcCode : 1 (SubCode 0) > [ 369.901344] BadVA : ffff800034bd0000 > [ 369.914659] PrId : 0014c011 (Loongson-64bit) > [ 369.914660] Modules linked in: uas usb_storage snd_seq_dummy snd_seq snd_seq_device ip_set rfkill nf_tables nfnetlink vfat fat loongson ttm acpi_ipmi drm_kms_helper syscopyarea sysfillrect ipmi_si ipmi_devintf sysimgblt ipmi_msghandler fuse efivarfs > [ 369.919013] Process kworker/3:3 (pid: 504, threadinfo=00000000a1234af0, task=000000004e2cde6f) > [ 369.949519] Stack : 9000000106bc5318 900000010684cec0 9000000107afd470 ffff800039c00000 > [ 369.949535] [IGT] fbdev: starting subtest nullptr > [ 369.957476] 0000000000000001 0000000000000000 0000000000000000 76e0ff420f8eaeab > [ 369.957481] 9000000001712000 fffffffffffffffb 9000000100161080 0000000000000003 > [ 369.970105] 900000000980ba05 900000000980ba00 9000000106bc5200 9000000106bc52d4 > [ 369.978062] ffff8000020de000 9000000106bc52d8 9000000110664b40 ffff8000020d78e4 > [ 369.986018] 043a078000000000 76e0ff420f8eaeab 900000000980be00 9000000009807400 > [ 370.001926] 0000000000000000 9000000110664b40 9000000106bc52d8 9000000000256dbc > [ 370.001931] 900000000170e000 9000000107afce00 0000000000000001 900000000170e000 > [ 370.009888] 9000000009807428 9000000110664b70 9000000009807400 900000000025737c > [ 370.025797] 9000000100161080 9000000001148080 9000000107afce00 0000000000000003 > [ 370.026543] [IGT] fbdev: exiting, ret=0 > [ 370.033753] ... > [ 370.033756] Call Trace: > [ 370.033757] [<9000000000223d5c>] __memcpy_toio+0x4c/0x90 > [ 370.047681] [<ffff8000020d75a8>] drm_fbdev_generic_helper_fb_dirty+0x1cc/0x300 [drm_kms_helper] > [ 370.056354] [<ffff8000020d78e4>] drm_fb_helper_damage_work+0xa4/0x1d0 [drm_kms_helper] > [ 370.056381] [<9000000000256dbc>] process_one_work+0x1ec/0x35c > [ 370.056385] [<900000000025737c>] worker_thread+0x88/0x428 > [ 370.056387] [<900000000025f4bc>] kthread+0x114/0x120 > [ 370.056392] [<90000000002215a8>] ret_from_kernel_thread+0xc/0xa4 > [ 370.056395] > [ 370.056396] Code: 00410def 0010bc8c 001500ad <260001ae> 02c02084 02c021ad 29ffe08e 5ffff184 03401cc6 > [ 370.056406] > [ 370.056421] fbcon_init: detected unhandled fb_set_par error, error code -16 > [ 370.056482] ---[ end trace 0000000000000000 ]--- > [ 370.066620] Console: switching to colour frame buffer device 240x67 > > > > I just thought when looking at > > the history that we revert the exact patch without any other changes > > or comments, > Other part of that patch(except this line) may still useful, at least for > cleanup purpose. > > and usually that means someone will come up with the same > > cleanup idea again, and then we'll have a bug again. So maybe a > > comment or a WARN_ON or something else would be good. > > A WARN_ON is acceptable. > > > I guess we could also do your patch, but put a WARN_ON that the > > computed total_size is never bigger than the drm_fb size into > > drm_fb_helper_deferred_io()? That would also make sure that this bug > > doesn't get resurrected again. > > Best to merge V2 [1] of this series, that is what I am really fixed. > > Maybe somebody can help to refine it, to add a better description about this > question and so on. > > [1] https://patchwork.freedesktop.org/patch/532143/?series=116454&rev=1 Ok, I guess this really is the safest one. For that patch, do we need the change to screen_size in drm_fbdev_generic_helper_fb_probe()? I'm still not entirely clear. If it works without that change I think that's clearer for a minimal bugfix, if so can you send that out as v4 please? Also please Cc: Geert on whatever you're resubmitting, so he can test too and we can make sure it's still fixing the shmob issue he's seeing. And finally please include a link to this discussion here with a note that just reverting the screen_size changes is not enough: https://lore.kernel.org/dri-devel/ad44df29-3241-0d9e-e708-b0338bf3c623@xxxxxx/ Thanks a lot! -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch