Re: Operation not permitted / pthread_setschedparam

On Mon, 6 Oct 2014, Armin Steinhoff wrote:

> Sorry ... I got now time to read your test code and results in detail.
> Your test shows only the absence of failures in your "test" scenario.

Correct. So that proves that the kernel behaviour is correct, right?

Did you try to verify that test case on your system? Maybe you have,
but you did not report back, right?

Did you draw any conclusions from that experiment aside from "absence of
failures in my test scenario"? Maybe you have, but you did not report
back.

> My bug report provided in Bugzilla shows all informations about the facts.

Lemme look at the facts of https://bugzilla.kernel.org/show_bug.cgi?id=85501

Fact #1: 

  "A multithreaded user space application starts some threads with
   different RT priorities. After the first start of one of these RT
   threads, a bunch a other running processes jumping to the highest
   prio of 99!"

This is simply wrong. These processes are reniced to nice 19, which
'ps -elf' shows as PRI 99, the lowest priority in the system. As I told
you before: "man ps"

Fact #4 part 1: https://bugzilla.kernel.org/show_bug.cgi?id=85501#c3

  The output of the command "ps -elf" is incorrect or simply nonsense.

Let's do an experiment:

# ps -elf | grep -m 1 acpid
1 S root      1438     1  0  80   0 -  1058 poll_s 18:44 ?        00:00:00 /usr/sbin/acpid

Now we renice all processes which belong to root to nice 19, i.e. we
tell the scheduler that these are the least favoured processes in the
system.

# renice -n 19 -u root
# ps -elf | grep -m 1 acpid
1 S root      1438     1  0  99  19 -  1058 poll_s 18:44 ?        00:00:00 /usr/sbin/acpid

And 'ps' tells us correctly that the priority changed from 80 to 99 and
the nice value changed from 0 to 19. 80 + 19 = 99, right?

Let's switch it back to normal:

# renice -n 0 -u root
# ps -elf | grep -m 1 acpid
1 S root      1438     1  0  80   0 -  1058 poll_s 18:44 ?        00:00:00 /usr/sbin/acpid

Surely 'ps' output is nonsense, right?

Fact #4 part 2: https://bugzilla.kernel.org/show_bug.cgi?id=85501#c3

  "ps -e -L -o class,rtprio,pri,nice,cmd" does'nt show priority
  changes of other unrelated processes ... but the values of pri and
  nice are swapped!

# ps -eo tid,comm,cls,pri,nice | grep -m 1 acpid
1438 acpid            TS  19   0

So the scheduling class is TS, prio is 19 and nice is 0.

Now we renice all processes which belong to root to nice 19, i.e. we
tell the scheduler that these are the least favoured processes in the
system.

# renice -n 19 -u root
# ps -eo tid,comm,cls,pri,nice | grep -m 1 acpid
1438 acpid            TS   0  19

And that's what we expected. The scheduling class is still TS, prio is
0 and nice is 19.

Nonsensical as well according to your 'bug report'.

So while I agree that this is not intuitive, it is definitely not a
bug of ps. It's well documented behaviour and for the different ps
modes it is consistent behaviour.
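
The two experiments above suggest a simple relationship between the
nice value and what each 'ps' mode prints. A minimal sketch, with the
constants 80 and 19 inferred only from the acpid output shown here. The
function names are made up for illustration, and the mapping is assumed
to hold only for tasks in the TS scheduling class:

```c
/*
 * Mappings inferred from the acpid experiments above.
 * Assumption: only valid for tasks in the TS (SCHED_OTHER) class.
 */

/* PRI column as printed by "ps -elf": a higher number is less favourable. */
int ps_elf_pri(int nice_value)
{
    return 80 + nice_value;
}

/* "pri" column as printed by "ps -eo pri": a higher number is more favourable. */
int ps_o_pri(int nice_value)
{
    return 19 - nice_value;
}
```

Both columns encode the same nice value, just with opposite
orientation, which is why they look "swapped" at first sight.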

If you feel confused by it, it's not a problem of the RT kernel at
all. If you think this is wrong, feel free to report a bug against
'ps', but don't continue to pester people who are not responsible for
this.

> I could upload my test app to the bug report ... so you get a chance to

Don't bother. I got it from source and you already spammed all list
members with a 400k+ tarball of it.

Now compiling the thing from source and running it does not show any
of this behaviour.

So to put an end to this I could be bothered to run your version of
it. And funnily enough it shows the problem.

The obvious conclusion is that if the problem only happens with
the particular executable you compiled, it's neither a problem of the
kernel nor a problem of ps, right?

Surprising, isn't it?

Now if you look at your ps outputs you will notice that only
processes which belong to user root show up with PRI 99. If you can be
bothered to redo the above experiments you probably will notice the
same thing.

So the next obvious conclusion is that this executable which you
provided issues a syscall which does the same thing as 'renice -n 19
-u root', right?

So let's first look at what renice does.

# strace renice -n 19 -u root

And looking at the output of syscalls there is the relevant one:

setpriority(PRIO_USER, 0, 19)           = 0

So if my not so reality distorted theory holds, we should see
something similar with your executable, right?

# strace ./demo_mn_console

And searching for setpriority in the strace output shows:

setpriority(PRIO_USER, 0, 20)           = 0

Not exactly the same, but the nice value is clamped to 19, so the
kernel treats it the same way.
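
The clamping can be sketched as follows. The bounds -20 and 19 are the
usual Linux nice range; clamp_nice() is a hypothetical helper for
illustration, not a kernel function:

```c
/* Usual Linux nice range (assumed here, not taken from the strace output). */
#define NICE_MIN (-20)
#define NICE_MAX 19

/* Force a requested nice value into the valid range. */
int clamp_nice(int requested)
{
    if (requested < NICE_MIN)
        return NICE_MIN;
    if (requested > NICE_MAX)
        return NICE_MAX;
    return requested;
}
```

So a request for 20 ends up as 19, which matches what 'renice -n 19'
asks for directly.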

Now what issues that call? Obviously not ps.

As you did not provide any pointer to the source, I have no idea what
kind of bug is there. 

Just for completeness' sake, I looked at the publicly available source
and put a breakpoint on initSystem(). At the point where the
breakpoint was reached, no problem. Single stepping over the next
function showed the problem with your executable, but not with the one
compiled from the public source tree.

The publicly available source calls

    nice(-20)

at this point, which is definitely not going to call 

   setpriority(PRIO_USER, 0, 20)           = 0

simply because nice(-20) is issuing the syscall with

   setpriority(PRIO_PROCESS, 0, -20)       = 0

according to the strace output of the version I compiled.
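
To illustrate the difference in scope, here is a rough sketch of what
nice(inc) boils down to according to the strace output above: a
setpriority() call with PRIO_PROCESS and who = 0, which touches only
the caller. nice_via_setpriority() is a made-up name, and the real libc
implementation may differ in detail:

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical stand-in for nice(inc): adjust the nice value of the
 * calling process only and report the resulting value in *new_nice.
 * Returns 0 on success, -1 on failure (errno is set).
 */
int nice_via_setpriority(int inc, int *new_nice)
{
    int current;

    /* getpriority() may legitimately return -1, so check errno. */
    errno = 0;
    current = getpriority(PRIO_PROCESS, 0);
    if (current == -1 && errno != 0)
        return -1;

    /* who = 0 with PRIO_PROCESS means "the calling process". */
    if (setpriority(PRIO_PROCESS, 0, current + inc) != 0)
        return -1;

    errno = 0;
    *new_nice = getpriority(PRIO_PROCESS, 0);
    if (*new_nice == -1 && errno != 0)
        return -1;
    return 0;
}
```

Nothing in this path can affect other processes of the same user. That
requires PRIO_USER as the first argument.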

Now looking at the disassembly of your executable tells me:

000000000040b954 <initSystem>:
  40b954:       55                      push   %rbp
  40b955:       48 89 e5                mov    %rsp,%rbp
  40b958:       53                      push   %rbx
  40b959:       48 81 ec a8 02 00 00    sub    $0x2a8,%rsp
  40b960:       ba 14 00 00 00          mov    $0x14,%edx
  40b965:       be 00 00 00 00          mov    $0x0,%esi
  40b96a:       bf 02 00 00 00          mov    $0x2,%edi
  40b96f:       e8 2c e2 ff ff          callq  409ba0 <setpriority@plt>

that it is calling setpriority() with the arguments

     which = 2     (PRIO_USER)
     who   = 0     (root)
     prio  = 0x14  (20)

And that's exactly what is causing your problem.
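
For readers who want to decode such a call site themselves: on x86-64
Linux the first three integer arguments are passed in %edi, %esi and
%edx. A small sketch that turns the three register values into the
strace-style call, using the PRIO_* constants from <sys/resource.h>.
format_setpriority() and which_name() are hypothetical helpers. Note
that per setpriority(2), who = 0 with PRIO_USER means the real user ID
of the caller, which is root here.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Map the numeric 'which' argument back to its symbolic name. */
static const char *which_name(int which)
{
    switch (which) {
    case PRIO_PROCESS: return "PRIO_PROCESS";
    case PRIO_PGRP:    return "PRIO_PGRP";
    case PRIO_USER:    return "PRIO_USER";
    default:           return "?";
    }
}

/*
 * Render the register values loaded before the call as a
 * setpriority() invocation: edi = which, esi = who, edx = prio.
 * Returns the snprintf() result.
 */
int format_setpriority(char *buf, size_t len,
                       unsigned edi, unsigned esi, unsigned edx)
{
    return snprintf(buf, len, "setpriority(%s, %u, %d)",
                    which_name((int)edi), esi, (int)edx);
}
```

Feeding it the immediates from the disassembly (0x2, 0x0, 0x14) gives
the same call that strace reported.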

While the executable I compiled from the publicly available source gives
me the following disassembly:

0000000000409f70 <initSystem>:
  409f70:       48 81 ec 38 01 00 00    sub    $0x138,%rsp
  409f77:       bf ec ff ff ff          mov    $0xffffffec,%edi
  409f7c:       e8 8f ea ff ff          callq  408a10 <nice@plt>

So that calls nice() with the argument

     prio = 0xffffffec  (-20)

Which is what you expect.
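
The immediate 0xffffffec is simply -20 in 32-bit two's complement. A
sketch of that reinterpretation (as_signed32() is an illustrative name,
and a two's complement machine is assumed, as on x86-64):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the raw 32-bit immediate as a signed integer. */
int32_t as_signed32(uint32_t raw)
{
    int32_t value;

    /* Copy the bit pattern instead of relying on a narrowing cast. */
    memcpy(&value, &raw, sizeof value);
    return value;
}
```

The same helper leaves the 0x14 from your executable at plain 20, so
the two binaries really do pass different values to different
functions.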

So the "facts" of your bug report are correct in the observation that:

   Starting your application is causing unexpected behaviour and the
   output of 'ps -elf' proves that issue.

And that's where the facts end.

> replicate these problems and get out of your "realty distortion".

Sure, my reality distortion is:

 - That I knew as everybody else on this thread that the problem is
   inside your application.

   It was obvious from your own observation:

   >> If the app "demo_mn_console" has started its first RT thread,  a lot
   >> of other processes/threads are jumping to the highest RT priority
   >> 99!!

 - That I did not follow your reasoning that:

   >> we have a problem with the RT kernel ? Or is simply the ps
   >> utility broken ?

 - That I asked you politely to follow the bug reporting procedures,
   which you refused.

   If you look at the above disassembly sections, you should be aware
   why I can prove that.

 - That Carsten gave you a detailed argument chain, which pointed
   obviously to the application itself. But you completely ignored it:

   "> 7. Your first suspicion is that we may have a problem with the RT kernel.
    > 8. And your second suspicion is that the ps utility is broken.
    > Doesn't come another suspicion to mind?

    Why should I have one? The same code works correctly under then
    non-RT version of the kernel ... at least from the Linux point of
    view."

 - That I told you how to interpret the PRI field of the ps -elf
   output and you simply refused to even think about it. 

   Instead of that you file bug reports against 'ps' on the kernel
   bugzilla. Can you see why the kernel bugzilla is the wrong place to
   file bug reports against 'ps' ?

   And then you told me clearly:

   "Sorry for not bothering about the confusing handling of priorities
    within LINUX ..."

   So despite the fact that I asked you to make yourself familiar
   with 'ps', you insist on 'ps' being broken and file a bug report
   against the RT kernel component?

 - That I know how 'ps' works

   I told you to study 'man ps'. That should have been enough of a hint
   to decode the issue.

 - That I knew that neither the v3.4.0-rc7-rt7 nor the v3.4.121-rt97
   kernel version was responsible for your problems?

   Sure. That must be a heavy reality distortion field which caused me
   to believe that I can figure that out w/o looking at your claims.

 - That I figured out without having access to the source code of your
   executable, that you are either running a modified version of the
   open powerlink code or your build setup has a massive failure.

   I don't care either way, but you might understand that neither of
   these problems is relevant for this particular mailing list.

> Asking question for a confusing failure situation has nothing to with
> "abusing the community!

Asking about it definitely not, but refusing to follow any advice and
refusing to follow the hints people give you definitely counts as it.

> Yes, I got 3 good hints ... no statement about the arrogant "hint" from you.
> It's quite normal to "have theories about a failures" as long as I
> verify these in order to find the real problem.

Sure, you can have your own view of "having theories about failures",
but a community mailing list does not necessarily have to share that
view. We care about facts and not about random theories caused by
whatever failure modes.

> Are you trying to kidding me with this statement: "Why should we be
> bothered to solve your business problems,"
> 
> Do you really believe I have a business problem ?  There is simpy
> absolute no business with PREEMPT_RT!  I do only some strategic work
> for a potential usage of PREEMP_RT ... hope this solves your
> problem.

I have no problem with that. Just your website offerings tell a
different story.

 http://bit.ly/1pIsqca

The topmost link says:

    Steinhoff Automations & Feldbus-Systeme
    www.steinhoff.de/
    
    FIELDBUS PAGE for OPEN CONTROL system DACHS from STEINHOFF,
    ... all DACHS products are offered for QNX, and Standard &
    PREEMPT_RT Linux !

If I'm not completely mistaken, that's your website.

So you are offering commercial solutions based on PREEMPT_RT. Whether
those offerings generate business for you or not is completely
irrelevant. Whether you are doing only strategic work for your already
existing offerings or not is equally irrelevant. That's merely your
problem.

I really do not care about your business at all. What I care about is
your abusive behaviour that a community mailing list has to bear
with.

> I'm working since 25 years wit QNX and other classical real-time
> operation systems. I did also operating system development for many
> years ... but before the first LINUX version was released. So be
> careful to tell me something about "incompetence" ...

I'm really not in a position to judge your competence. I merely care
about facts.

- Fact is that you ignored any advice and any hints given to you and
  insist that the community has to bear your 'quite normal theories'.

- Fact is that you seek community advice, but then you state:

  "Sorry for not bothering about the confusing handling of priorities
   within LINUX ..."

   So we are supposed to help you while you refuse to understand how it
   works?

- Fact is that you said:

  >> Hint #2: linux/REPORTING-BUGS
  > OK  ... still TODO 

  AFAICT this is still on your todo list, right?

- Fact is that you said in response to Carsten:

  Carsten: "5. At a given time, you start your application that apparently
            modifies task priorities *in some way*."

  You:     "Not in some way ... the developers of the openPOWERLINK stack
            are using plain POSIX calls."

  Nothing wrong with that, I just cannot figure out what the usage of
  plain POSIX calls has to do with the wrong usage of POSIX calls. And
  as I showed above, this is a case of plainly wrong usage of a POSIX
  call.

- Fact is that the executable you provided issues a syscall which
  causes a situation which you claimed to be either a problem of the
  kernel or a problem of the ps utility.

- Fact is that I decoded the issue without having access to the source
  code within a few minutes.

- Fact is that you failed to decode the issue despite having access
  to the source code of the application.

- Fact is that the community provided source code does not have this
  issue.

- Fact is that it was obvious that "problem starts after doing X" has
  to be related to X.

  You have been told so, but you deliberately decided to ignore that
  advice.

- Fact is that you claim publicly that I'm suffering from reality
  distortion.

  See above and then make further claims at your leisure.

Thanks,

	tglx