On Tue, Mar 19, 2019 at 9:08 PM Todd Kjos <tkjos@xxxxxxxxxx> wrote: > Paul, > > Looking at a snippet of the test output: > > Service Provider read_consumed: 8 > Service Provider command: BR_NOOP > Service Provider command: BR_FAILED_REPLY <-------------- the txn > failed as expected. > Manager read_consumed: 8 > Manager command: BR_NOOP > Manager command: BR_TRANSACTION_COMPLETE > not ok 3 > <--------- but for some reason didn't exit(103) > # Failed test at ./test line 56. > > It looks like the test script expects test_binder to fail with > exit(103) after processing the Server Provider commands. It's not > clear why it didn't, since the return of a BR_FAILED_REPLY for that > transaction should have executed this code (line 392 of > test_binder.c): > > if (cmd == BR_FAILED_REPLY || > cmd == BR_DEAD_REPLY || > cmd == BR_DEAD_BINDER) { > fprintf(stderr, > "Failed response from Service Provider using Managers FD\n"); > exit(103); > } > > Could this be an issue with the test? At least it doesn't look like a > transaction is succeeding when it shouldn't. Hi Todd, Thanks for looking into this further. Look a bit more at the test, it appears that the code above (line 392) only comes into play if the service provider is handling a BR_REPLY, but looking at the test output it doesn't appear that this is the case. # runcon -t test_binder_provider_no_im_t ./test_binder -v -r 2 provider Service Provider PID: 2095 Process context: unconfined_u:unconfined_r:test_binder_provider_no_im_t:s0-s0:c0.c1023 Service Provider sending transaction to Manager - ADD_TEST_SERVICE Service Provider read_consumed: 48 Service Provider command: BR_NOOP Service Provider command: BR_INCREFS Service Provider command: BR_ACQUIRE Service Provider command: BR_TRANSACTION_COMPLETE Service Provider read_consumed: 8 Service Provider command: BR_NOOP Service Provider command: BR_FAILED_REPLY However, things get weird. In the course of writing this email I booted into my 5.0.0-1.1.secnext kernel (which passed the binder test earlier) and now that kernel is failing in the same way (the test hasn't changed). This test system is a Fedora Rawhide system which is updated before I make a test run and I'm wondering if there is some other userspace component which may be affecting this ... ? I've now tried this on two different, current Rawhide VMs, hosted on two different systems and I'm seeing the same thing, so I don't think it's a *bad* system/VM? > On Tue, Mar 19, 2019 at 5:15 PM Todd Kjos <tkjos@xxxxxxxxxx> wrote: > > > > [...] > > > > > > Is there a public dashboard where I can take a look at those binder failures? > > > > > > Not really. I send test results to a not-yet-publicized mailing list, > > > but there is more detail in the GitHub issue below (my last comment > > > has the verbose test output): > > > > > > * https://github.com/SELinuxProject/selinux-kernel/issues/46 > > > > > > > Ok, so it looks like something was introduced that causes binder to be > > too permissive (test 3 transaction succeeded when failure is > > expected). I don't know of any recent binder changes that could have > > caused that. > > > > It will take me a while to set up this test environment. Is this easy > > for you to run? Any chance of bisecting or at least trying a few > > versions to narrow it down? Here's a list of the recent patchset -- it > > would be useful to know which caused it (or if none of them did): > > > > 9e98c678c2d6a Linux 5.1-rc1 > > ... > > 26528be6720bb binder: fix handling of misaligned binder object > > bde4a19fc04f5 binder: use userspace pointer as base of buffer space > > c41358a5f5217 binder: remove user_buffer_offset > > db6b0b810bf94 binder: avoid kernel vm_area for buffer fixups > > 7a67a39320dfb binder: add function to copy binder object from buffer > > 8ced0c6231ead binder: add functions to copy to/from binder buffers > > 1a7c3d9bb7a92 binder: create userspace-to-binder-buffer copy function > > ... > > 1c163f4c7b3f6 (tag: v5.0) Linux 5.0 > > > > Thanks, > > > > -Todd -- paul moore www.paul-moore.com