On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <airlied@xxxxxxxxx> wrote:
>
> On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <daniel@xxxxxxxx> wrote:
> >
> > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <feng.tang@xxxxxxxxx> wrote:
> > >
> > > Hi Daniel,
> > >
> > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > On 04.09.19 at 08:27, Feng Tang wrote:
> > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > >> reason why it's faster.
> > > > > >
> > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > work inside atomic context on purpose. Is there any room to optimise it?
> > > > > > If two scheduled update workers are handled at almost the same time, can
> > > > > > one be skipped?
> > > > >
> > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > the worker before a previous instance has started will not create a second
> > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > some way, skipping workers already happens.
> > > >
> > > > So I think that the most frequent fbcon update from atomic context is the
> > > > blinking cursor. If you disable that one you should be back to the old
> > > > performance level I think, since just writing to dmesg is from process
> > > > context, so shouldn't change.
> > >
> > > Hmm, then the old driver should also do most updates in non-atomic
> > > context?
> > >
> > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > 20 ms, which translates to 150 MB/s bandwidth. Could it be related to
> > > the cache setting of the DRM shadow buffer? Say, the original code used
> > > a cacheable buffer?
> >
> > Hm, that would indicate the write-combining got broken somewhere. This
> > should definitely be faster. Also we shouldn't transfer the whole
> > thing, except when scrolling ...
>
> First rule of fbcon usage, you are always effectively scrolling.
>
> Also these devices might be on a PCIE 1x piece of wet string, not sure
> if the numbers reflect that.

pcie 1x 1.0 is 250MB/s, so yeah, with a bit of inefficiency and overhead
it's not entirely out of the question that 150MB/s is actually the hw
limit. That's if it's really pcie 1x 1.0; no idea where to check that.

Also might be worth double-checking that the gpu pci bar is listed as wc
(write-combining) in debugfs/x86/pat_memtype_list.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
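
For reference, the numbers in this thread can be reproduced with a small Python sketch. It is illustrative only, not part of any driver or tool. The first two helpers redo the arithmetic: 3 MB in 20 ms, and the per-lane rate of PCIe 1.0 (2.5 GT/s with 8b/10b encoding). The third helper looks up a memory type in text shaped like debugfs/x86/pat_memtype_list. All function names are made up, the sample lines and addresses are invented, the exact line format differs between kernel versions, and treating each range as half-open is an assumption.

```python
import re

def throughput_mb_s(size_mb, milliseconds):
    """Observed bandwidth in MB/s for a copy of size_mb taking milliseconds."""
    return size_mb * 1000 / milliseconds

def pcie_lane_mb_s(gigatransfers_per_s=2.5, payload_bits=8, line_bits=10):
    """Raw per-lane bandwidth in MB/s; defaults are PCIe 1.0 (2.5 GT/s, 8b/10b)."""
    return gigatransfers_per_s * 1e9 * payload_bits / line_bits / 8 / 1e6

_HEX = re.compile(r"0x([0-9a-fA-F]+)")
# "uncached-minus" must come before "uncached" so the longer name wins.
_TYPES = ("uncached-minus", "write-combining", "write-through",
          "write-back", "write-protected", "uncached")

def memtype_for(lines, addr):
    """Return the memory type of the first listed range containing addr, else None.

    Assumes each line holds a start and end address in hex plus a type name,
    and that the range is half-open [start, end).
    """
    for line in lines:
        bounds = [int(h, 16) for h in _HEX.findall(line)]
        if len(bounds) < 2:
            continue
        start, end = bounds[0], bounds[1]
        if start <= addr < end:
            for name in _TYPES:
                if name in line:
                    return name
    return None

# Invented sample content; a real list would be read from debugfs.
SAMPLE = [
    "uncached-minus @ 0xfed00000-0xfed01000",
    "write-combining @ 0xd0000000-0xd1000000",
]

observed = throughput_mb_s(3, 20)   # shadow buffer update from the thread
limit = pcie_lane_mb_s()            # one PCIe 1.0 lane
print(f"observed {observed:.0f} MB/s, lane limit {limit:.0f} MB/s, "
      f"ratio {observed / limit:.2f}")
print(memtype_for(SAMPLE, 0xd0000000))
```

On a real machine, the BAR start address would come from the device's resource listing and the lines from the debugfs file (commonly mounted under /sys/kernel/debug, which needs root). If the lookup returns something other than write-combining for the framebuffer BAR, that would support the suspicion that WC got lost somewhere.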