Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel

On 5/26/23 13:54, Theodore Ts'o wrote:
> On Fri, May 26, 2023 at 10:42:55AM -0700, Dave Hansen wrote:
>>
>>> If Intel feels that it's useful to run their own instance, maybe
>>> there's some way you can work with Google syzkaller team so you don't
>>> have to do that?
>>
>> I actually don't know why or when Intel started doing this.  0day in
>> general runs on a pretty diverse set of systems and I suspect this was
>> an attempt to leverage that.  Philip, do you know the history here?
> 
> Yeah, I think that's at least part of the issue.  Looking at some of
> the reports, the reported architectures were Tiger Lake and Alder
> Lake.  According to Pengfei, part of this was to test features that
> require newer CPU capabilities, such as CET / Shadow Stack.  Now, I
> could be wrong, because Intel's CPU naming scheme is too complex for my
> tiny brain and makes my head spin.  It's really hard to map the names
> used for mobile processors to those used by Xeon server class platforms,
> but I *think*, if Intel's Product Managers haven't confused me
> hopelessly, Google Cloud's C3 VMs, which use Sapphire Rapids, should
> have those hardware features which are in Tiger Lake and Alder Lake,
> while Google Cloud's N2 VMs, which use Ice Lake processors, are
> too old.  Can someone confirm if I got that right?

That's roughly right.  *But*, there are things that got removed from
Tiger->Alder Lake like AVX-512 and things that the Xeons have that the
client CPUs don't, like SGX.

Shadow stacks are definitely one of the things that got added from Ice
Lake => Sapphire Rapids.

But like you mentioned below, I don't see any actual evidence that
"newer" hardware is implicated here at all.

> So this might be an issue of Intel submitting the relevant syzkaller
> commits that add support for testing Shadow Stack, CET, IOMMUFD, etc.,
> where needed to the upstream syzkaller git repo --- and then
> convincing the Google Syzkaller team to turn up some of the test VMs
> on the much more expensive (per CPU/hour) C3 VMs.  The former is
> probably something that is just a matter of standard open source
> upstreaming.  The latter might be more complicated, and might require
> some private negotiations between companies to address the cost
> differential and availability of C3 VMs.

Yeah, absolutely.

If Intel keeps up with its own instance of syzkaller, Intel should
constantly be asking itself why the Google instance isn't hitting the
same bugs and how we can close the gap if there is one.

> The other thing that's probably worth considering here is that
> hopefully many of these reports are ones that aren't *actually*
> architecture dependent, but for some reason, are just results that one
> syzkaller instance has found, but another syzkaller instance has not
> yet found.  So perhaps there can be some kind of syzkaller state
> export/import scheme so that a report can be transferred from one
> syzkaller instance to another.  That way, upstream developers would
> have a single syzkaller dashboard to pay attention to, get regular
> information about how often a particular report is getting triggered,
> and if the information behind the report can get fed into the receiving
> syzkaller instance's fuzzing seed library, it might improve the test
> coverage for other kernels that Intel doesn't have the business case
> to test (e.g., Android kernels, kernels compiled for arm64 and RISC-V,
> etc.)

Absolutely, a unified view of all of the instances would be really nice.

> After all, looking at the report which kicked off this thread ("soft
> lockup in __cleanup_mnt"), I don't think this is something that should
> be hardware specific; and yet, this report appears not to exist in
> Google's syzkaller instance.  If we could import the fuzzing seed for
> this and similar reports into Google's syzkaller instance, it seems to
> me that this would be a Good Thing.

Very true.  I don't see anything obviously Intel-specific here.  One of
the first questions we should be asking ourselves is why _we_ hit this
and Google didn't.


