On Tue, Mar 19, 2019 at 8:04 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > > On Tue, Mar 19, 2019 at 9:08 PM Todd Kjos <tkjos@xxxxxxxxxx> wrote: > > Paul, > > > > Looking at a snippet of the test output: > > > > Service Provider read_consumed: 8 > > Service Provider command: BR_NOOP > > Service Provider command: BR_FAILED_REPLY <-------------- the txn > > failed as expected. > > Manager read_consumed: 8 > > Manager command: BR_NOOP > > Manager command: BR_TRANSACTION_COMPLETE > > not ok 3 > > <--------- but for some reason didn't exit(103) > > # Failed test at ./test line 56. > > > > It looks like the test script expects test_binder to fail with > > exit(103) after processing the Server Provider commands. It's not > > clear why it didn't, since the return of a BR_FAILED_REPLY for that > > transaction should have executed this code (line 392 of > > test_binder.c): > > > > if (cmd == BR_FAILED_REPLY || > > cmd == BR_DEAD_REPLY || > > cmd == BR_DEAD_BINDER) { > > fprintf(stderr, > > "Failed response from Service Provider using Managers FD\n"); > > exit(103); > > } > > > > Could this be an issue with the test? At least it doesn't look like a > > transaction is succeeding when it shouldn't. > > Hi Todd, > > Thanks for looking into this further. Look a bit more at the test, it > appears that the code above (line 392) only comes into play if the > service provider is handling a BR_REPLY, but looking at the test > output it doesn't appear that this is the case. > > # runcon -t test_binder_provider_no_im_t ./test_binder -v -r 2 provider > Service Provider PID: 2095 Process context: > unconfined_u:unconfined_r:test_binder_provider_no_im_t:s0-s0:c0.c1023 > Service Provider sending transaction to Manager - ADD_TEST_SERVICE > Service Provider read_consumed: 48 > Service Provider command: BR_NOOP > Service Provider command: BR_INCREFS > Service Provider command: BR_ACQUIRE > Service Provider command: BR_TRANSACTION_COMPLETE > > Service Provider read_consumed: 8 > Service Provider command: BR_NOOP > Service Provider command: BR_FAILED_REPLY So, then it sounds like the test is not running properly and not really testing what it intends to test (which I guess is consistent with the fact that it triggered that BUG_ON -- the transaction is malformed and failing early). It doesn't look like there is a successful transaction that should have failed due to selinux policy -- it looks like there is an invalid transaction that probably fails earlier and doesn't return 103 (it probably returns 8 -- it would be useful if the test script displayed the exit value that was detected as a failure). I don't think there is much I can do on this now given the apparent flakiness, but I'm happy to help when there is a stable issue. I'll look a little more at the test code to see if I can spot what is wrong with the transaction. Can I add a "Tested-by: Paul Moore <paul@xxxxxxxxxxxxxx>" on my patch submission to fix the BUG_ON (the exact patch you tested) ? -Todd > > However, things get weird. In the course of writing this email I > booted into my 5.0.0-1.1.secnext kernel (which passed the binder test > earlier) and now that kernel is failing in the same way (the test > hasn't changed). This test system is a Fedora Rawhide system which is > updated before I make a test run and I'm wondering if there is some > other userspace component which may be affecting this ... ? I've now > tried this on two different, current Rawhide VMs, hosted on two > different systems and I'm seeing the same thing, so I don't think it's > a *bad* system/VM? > > > On Tue, Mar 19, 2019 at 5:15 PM Todd Kjos <tkjos@xxxxxxxxxx> wrote: > > > > > > [...] > > > > > > > > Is there a public dashboard where I can take a look at those binder failures? > > > > > > > > Not really. I send test results to a not-yet-publicized mailing list, > > > > but there is more detail in the GitHub issue below (my last comment > > > > has the verbose test output): > > > > > > > > * https://github.com/SELinuxProject/selinux-kernel/issues/46 > > > > > > > > > > Ok, so it looks like something was introduced that causes binder to be > > > too permissive (test 3 transaction succeeded when failure is > > > expected). I don't know of any recent binder changes that could have > > > caused that. > > > > > > It will take me a while to set up this test environment. Is this easy > > > for you to run? Any chance of bisecting or at least trying a few > > > versions to narrow it down? Here's a list of the recent patchset -- it > > > would be useful to know which caused it (or if none of them did): > > > > > > 9e98c678c2d6a Linux 5.1-rc1 > > > ... > > > 26528be6720bb binder: fix handling of misaligned binder object > > > bde4a19fc04f5 binder: use userspace pointer as base of buffer space > > > c41358a5f5217 binder: remove user_buffer_offset > > > db6b0b810bf94 binder: avoid kernel vm_area for buffer fixups > > > 7a67a39320dfb binder: add function to copy binder object from buffer > > > 8ced0c6231ead binder: add functions to copy to/from binder buffers > > > 1a7c3d9bb7a92 binder: create userspace-to-binder-buffer copy function > > > ... > > > 1c163f4c7b3f6 (tag: v5.0) Linux 5.0 > > > > > > Thanks, > > > > > > -Todd > > > > -- > paul moore > www.paul-moore.com