Re: [linux-2.6.26.8-rt14] RT Page Fault.

Hello,

Thank you for this extremely fast response. :-)

I haven't tried running my application from a RAM disk yet (I suppose you are suggesting building an initrd or initramfs). It's inconvenient for development, since the size of such an init RAM disk is limited. I will try it anyway.

Now some explanation about uClibc. I know that this C library lacks many features (for instance NPTL support for ARM). However, for the ARM architecture there is a nice utility, buildroot, for building toolchains and root filesystems. I'm not using x86 as the target architecture, so a cross-compiling toolchain that builds without errors is a success in itself :-).

I've tried the ptxdist tool from pengutronix.de, but I could not build a proper toolchain without errors for my Atmel ARM9 device (ARMv5TEJ arch to be precise). Or, to put it another way, I gave up after spending some time and moved back to the working uClibc. I've built some toolchains, but I encountered errors during the build and, even though it produced an arm-linux-gcc that worked, I wasn't sure whether glibc was broken in some way.

In my first post I didn't mention that I was checking the rusage struct for minor and major page faults. In my program I check it twice: first before calling mlockall(), at the beginning of the program, and again after munlockall(), when the program ends, either because a delay occurred or through normal termination. In both cases I got the same number of minor and major page faults, so it looks as if no page fault occurs.
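The check described above can be sketched roughly as follows. This is only an illustration, not the application's actual code: the names fault_counts, read_fault_counts and report_fault_delta are made up for this example, and getrusage() with RUSAGE_SELF is assumed to be what fills the rusage struct.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Snapshot of the page-fault counters of the calling process. */
struct fault_counts {
    long minor;  /* ru_minflt: faults served without I/O */
    long major;  /* ru_majflt: faults that required I/O  */
};

/* Read the current counters; returns 0 on success, -1 on error. */
static int read_fault_counts(struct fault_counts *out)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    out->minor = usage.ru_minflt;
    out->major = usage.ru_majflt;
    return 0;
}

/* Print how many faults happened between two snapshots. */
static void report_fault_delta(const char *label,
                               const struct fault_counts *before,
                               const struct fault_counts *after)
{
    printf("%s: minor=%ld major=%ld\n", label,
           after->minor - before->minor,
           after->major - before->major);
}
```

Taking one snapshot right after mlockall() and another at the end of each cycle, then calling report_fault_delta(), would show whether a delayed cycle coincides with new faults, instead of only comparing program start and exit.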


I've checked, and it turns out that uClibc 0.9.30 does not support PI (priority inheritance) and futexes. This may cause my bug, since I'm heavily using read/write on /dev/ files (reads and writes to 4 different files). There's a lot of room for priority inversion in my code :-/. I wrote this code in the blind faith that priority inversion is avoided "in some way". I assumed that access to a device (a resource in this case) has some mechanism for inheriting priority.

It looks like the only solution for me is to build a toolchain with glibc 2.6 or later and gcc 4.x, with full support for priority inheritance, futexes and the "clock_" set of functions. As I mentioned, that's a bit tricky for ARM (especially with the proper ABI and software floating-point support (-msoft-float)). But it looks like the only feasible solution.

Thanks for the advice.

Regards,
Lukasz



Remy Bohmer wrote:
Hello Lukasz,

I'm also using NFS to mount root file system from my host x86 ubuntu PC.

Have you tried already running from a ram-disk?

When I start my application it runs for some time and ends as expected. It
seems that everything is OK. The static schedule is not violated. Unfortunately,
after running this application a couple of times (6 to 10) I can see that the
static schedule is violated (delayed in execution) by about 2-4 seconds.
The application runs for 1-2 seconds as expected and then crashes (I mean it
exits with a static schedule delay of 2-4 seconds). It looks like a page fault,

You can trace the number of page-faults during run, by means of
getrusage(), see rt-wiki.
2-4 seconds sounds quite long for page fault handling to me (unless
you are using page/swap files)

but in my main() I've added mlockall() as written in the examples from rt.wiki.
Moreover, I've prefaulted the stack as written in the "square_wave example". Before my
application exits I call munlockall(). When I log in via ssh to my
embedded system and start top, I cannot see any memory leaks
or zombie processes during the run of my RT application.

May it be possible that by some chance some global variable is not locked in
the memory? What is the "scope" of mlockall? Is it only valid in one .o

mlockall() is somewhat tricky. It locks all allocated data pages (and
future pages, if specified) into RAM, but IIRC code segments are not
forced to be loaded into RAM, but only code segments that are loaded
once, will be locked. So, in theory, there could be pages still on the
NFS share that are not loaded when the problem arises. So, this could
be the problem you see, but it would not be the first suspect I would
look for.
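As an illustration of the calls being discussed, here is a minimal sketch of locking memory and touching the stack before the time-critical part starts. The helper names and the PREFAULT_STACK_BYTES value are assumptions made for this example; they are not taken from the rt-wiki code or from the application in question.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PREFAULT_STACK_BYTES (8 * 1024)  /* assumed size, tune per application */

/* Lock all current and future pages of the process into RAM.
 * Returns 0 on success, -1 on error (errno set by mlockall). */
static int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Touch one byte per page of a stack buffer so those pages are
 * faulted in before the real-time loop needs them. */
static void prefault_stack(void)
{
    volatile unsigned char dummy[PREFAULT_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    size_t i;

    if (page <= 0)
        page = 4096;  /* fallback if the page size cannot be queried */

    for (i = 0; i < sizeof dummy; i += (size_t)page)
        dummy[i] = 0;
    dummy[sizeof dummy - 1] = 0;
}
```

A typical order would be lock_all_memory() first, then prefault_stack(), then the periodic loop, with munlockall() on exit. mlockall() can fail without sufficient privileges or a large enough RLIMIT_MEMLOCK, so its return value is worth checking.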

I'd appreciate any hints/comments what can cause this bug.

I read you use uClibc, the last time I looked at it (quite some time
ago), it lacked support for priority inheritance mutexes... Aren't you
running into a mutex priority inversion?

Or priority inversion related to other interrupt threads? You run at
prio 71, if you leave the network, or block device
softirqs/irq-threads on 50, you could have a priority inversion on
this level as well. This would be my prime suspect...
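To make the priority levels concrete, this is a minimal sketch of how a process puts itself on SCHED_FIFO at a chosen priority, such as the 71 mentioned above. set_fifo_priority is a hypothetical helper name; the priorities of the IRQ threads are configured separately (for example with chrt) and are not handled here.

```c
#include <errno.h>
#include <sched.h>

/* Switch the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success, -1 on error with errno set. Usually needs
 * root or CAP_SYS_NICE. */
static int set_fifo_priority(int prio)
{
    struct sched_param sp;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Reject values outside the range the kernel accepts. */
    if (prio < lo || prio > hi) {
        errno = EINVAL;
        return -1;
    }

    sp.sched_priority = prio;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

Whether 71 is the right value depends on what the application waits for: if it blocks on something served by an IRQ thread running at 50, any task between 50 and 71 can delay it.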

I was trying to
use strace and gdb to fix this problem, but these tools are too slow and they
cause violations of my cyclic static schedule.

No ETM trace available? Really nice to have in such cases...


Kind Regards,

Remy

