On Mon, Nov 8, 2021 at 12:00 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 04, 2021 at 12:40:32PM -0700, Yi Fan wrote:
> > Reply inline.
> >
> > On Thu, Nov 4, 2021 at 11:56 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 04, 2021 at 11:14:55AM -0700, Yi Fan wrote:
> > > > Resend the email using plain text.
> > > >
> > > > I found some kernel performance regression issues that might be
> > > > related w/ 4.14.y LTS commit.
> > > >
> > > > 4.14.y commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v4.14.253&id=27d185697322f9547bfd381c71252ce0bc1c0ee4
> > > >
> > > > The issue is observed when "console=" is used as a kernel parameter to
> > > > disable the kernel console.
> > >
> > > What exact "performance issue" are you seeing?
> > >
> > [YF] one kernel thread was randomly blocked for more than ~40
> > milliseconds, causing a certain task to fail to process in time.
> > [YF] the issue is highly random on a single device. But it might
> > happen a few times per 24 hours on a certain percentage of devices.
> > The overall percentage of devices that show the issue seems quite
> > stable over a long period of time (somehow the magic number is ~40%.).
> > [YF] local test on a pool of devices does not show any correlation w/
> > any particular devices.
> > [YF] local test after reverting the above single commit passes, no
> > issue is observed.
>
> And what type of device is this?
[YF] it happens on multiple devices on the 4.14.y kernel. (sorry cannot disclose the device type here.)
>
> If you see this thread:
> https://lore.kernel.org/r/f19c18fd-20b3-b694-5448-7d899966a868@xxxxxxxxxxxx
> it looks like chromeos devices have now disabled this change, and there
> was a long discussion about possible issues and solutions.
>
> Can you try the patch set referenced in that thread to see if that
> resolves the issue for you or not?  Given that I have not seen any
> reports of this being an issue since over a year ago, odds are it has
> been resolved already with some change that we probably also need to
> backport to 4.14.y.
>
> So any help in identifying that change would be appreciated.
>
[YF] thanks for the context. I did not find a clear patch that seems to solve this issue yet.
[YF] for the time being, reverting the offending commit seems the safest solution for the 4.14.y.
>
> > > And what kernel version are you seeing it on?
> > >
> > [YF] it was first found on some products w/ kernel version 4.14.210.
> > through bisection, we located the commit on 4.14.200.
> >
> > > >
> > > > I browsed android common kernel logs and the upstream stable kernel
> > > > tree, found some related changes.
> > > >
> > > > printk: handle blank console arguments passed in. (link:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.14.15&id=3cffa06aeef7ece30f6b5ac0ea51f264e8fea4d0)
> > > > Revert "init/console: Use ttynull as a fallback when there is no
> > > > console" (link:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.14.15&id=a91bd6223ecd46addc71ee6fcd432206d39365d2)
> > > >
> > > > It looks like upstream also noticed the regression introduced by the
> > > > commit, and the workaround is to use "ttynull" to handle "console="
> > > > case. But the "ttynull" was reverted due to some other reasons
> > > > mentioned in the commit message.
> > > >
> > > > Any insight or recommendation will be appreciated.
> > >
> > > What problem exactly are you now seeing?  And does it also happen on
> > > 5.15?
> > >
> > [YF] we do not perform any tests on 5.15 yet. so no idea about whether
> > the issue happens on 5.15.
>
> How about any other newer stable kernel version like 5.4.y or 5.10.y?
>
[YF] so far there is no easy way to replicate the issue. We have future products that are on 5.4.y and 5.10.y. I will keep monitoring whether similar issues are found.
>
> thanks,
>
> greg k-h