Hi Chris,

> -----Original Message-----
> From: Chris Blake [mailto:chrisrblake93@xxxxxxxxx]
> Sent: Tuesday, October 31, 2017 12:50 PM
> To: Daniel Jurgens <danielj@xxxxxxxxxxxx>
> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Jason Gunthorpe <jgg@xxxxxxxx>;
> Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx>; Parav Pandit
> <parav@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Hal Rosenstock
> <hal@xxxxxxxxxxxx>; Ira Weiny <ira.weiny@xxxxxxxxx>
> Subject: Re: 4.13 ib_mthca NULL pointer dereference with OpenSM
>
> On Tue, Oct 31, 2017 at 10:20 AM, Daniel Jurgens <danielj@xxxxxxxxxxxx> wrote:
> > On 10/31/2017 10:15 AM, Leon Romanovsky wrote:
> >> On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote:
> >>> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:
> >>>
> >>>>>> Adding the new return sure makes alot of sense as well..
> >>>>>>
> >>>>>> Hal, Ira, would you check this routine too? kernel oops's are bad..
> >>>>> Patch looks needed for just the point that Parav made above (that
> >>>>> if security check fails, then ib_free_recv_mad will cause the
> >>>>> mad_recv_wc->rmpp_list to be accessed so it needs to be
> >>>>> initialized before security is enforced).
> >>>> Agree the patch is needed regardless.
> >>> Someone please send it..
> >> Parav/Daniel,
> >>
> >> Please send it directly to the mailing list.
> >>
> >>>>> I don't have mthca to try this. Maybe Chris can try this patch
> >>>>> (with CONFIG_SECURITY_INFINIBAND=y).
> >>>> Chris, are you running with SELinux enabled? If this addresses your
> >>>> issue it means permission is denied, so once the crash is resolved
> >>>> additional policy will be required in order for it to work as expected.
> >>> If Chris has selinux turned on in his distro would you expect this
> >>> test to just fail? Doesn't that mean we have missed installing
> >>> security labels for things like opensm?
> >> Chris has SELinux enabled, see his gist:
> >> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgist.github.com%2Friptidewave93%2Fb3b83c13e93ab3be4254c855885f5b3a&data=02%7C01%7Cparav%40mellanox.com%7Cf968df072f7143ca1aa708d52087c6b8%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636450689891416237&sdata=W74Qn4lOs2MBgV3uHPSngmvtLCxp%2F8kDkyZ1cIIhGQ0%3D&reserved=0
> >
> > That doesn't indicate if he has SELinux enabled or not, just that
> > CONFIG_SECURITY_INFINIBAND is enabled. Also, even if SELinux enabled in
> > the kernel config it must be turned on via /etc/selinux/config, and also
> > set into enforcing mode, if it were to cause this problem. There's no
> > enough info there to determine any of that.
> >
> >> Thanks
> >>
> >>> Jason
> >
> >
>
> Hello All,
>
> I have installed the kernel with the mentioned patch, as well as
> CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where my
> compute node is reporting:
>
> kernel: infiniband mthca0: ib_post_send_mad error

There were two issues in your report:
1. Post send failure.
2. Kernel crash on receiving a MAD.

I provided a fix for the 2nd issue.
Do you still see the 2nd issue (the crash) after applying the fix?

For the 1st problem, Dan has a few questions/suggestions regarding the configuration of the SELinux policy.
Can you please go through them?
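
To help answer Dan's SELinux questions, here is a minimal sketch of how the two things he mentions could be checked: the mode configured in /etc/selinux/config and the mode actually in effect at runtime. This is only an illustration and not something anyone in the thread posted. The function names are hypothetical, and it assumes selinuxfs is mounted at /sys/fs/selinux, where the `enforce` file reads `1` for enforcing and `0` for permissive.

```python
from pathlib import Path


def parse_selinux_config(text):
    """Return the SELINUX= value from /etc/selinux/config-style text, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        # Match the SELINUX key only, not SELINUXTYPE.
        if sep and key.strip() == "SELINUX":
            return value.strip().lower()
    return None


def runtime_mode(enforce_path="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled' based on selinuxfs."""
    try:
        flag = Path(enforce_path).read_text().strip()
    except OSError:
        # Assumption: an unreadable or missing file means selinuxfs is not
        # mounted, which is treated here as SELinux not being active.
        return "disabled"
    return "enforcing" if flag == "1" else "permissive"


if __name__ == "__main__":
    cfg = Path("/etc/selinux/config")
    configured = parse_selinux_config(cfg.read_text()) if cfg.is_file() else None
    print("configured:", configured, "| runtime:", runtime_mode())
```

If the runtime mode is not enforcing, a permission denial from the InfiniBand security hooks is unlikely to explain the post send failure, which is the point Dan raises above.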