On 11/17/07, James Chapman <jchapman@xxxxxxxxxxx> wrote:
> Fajun Chen wrote:
> > On 11/16/07, Tejun Heo <htejun@xxxxxxxxx> wrote:
> >> Fajun Chen wrote:
> >>> I use sg/libata and ATA pass-through for reads/writes. Linux
> >>> 2.6.18-rc2 and libata version 2.00 are loaded on an ARM XScale
> >>> board. Under heavy CPU load (e.g. when blocks per transfer/sector
> >>> count is set to 1), I've observed that the test application can
> >>> suck the CPU away for a long time (more than 20 seconds) and other
> >>> processes, including a high-priority shell, cannot get a time
> >>> slice to run. What's interesting is that if the application is
> >>> under heavy IO load (e.g. when blocks per transfer/sector count is
> >>> set to 256), the problem goes away. I also tested with the open
> >>> source sg_utils and got the same result, so this is not a problem
> >>> specific to my user-space application.
> >>>
> >>> Since user preemption is checked when the kernel is about to
> >>> return to user space from a system call, the process scheduler
> >>> should be invoked after each system call. Something seems to be
> >>> broken here. I found a similar issue below:
> >>> http://marc.info/?l=linux-arm-kernel&m=103121214521819&w=2
> >>> But that turned out to be an issue with the MTD/JFFS2 drivers,
> >>> which are not used in my system.
> >>>
> >>> Has anyone experienced similar issues with sg/libata? Any
> >>> information would be greatly appreciated.
> >>
> >> That's one weird story. Does the kernel say anything during those
> >> 20 seconds?
> >>
> > No. Nothing in the kernel log.
> >
> > Fajun
>
> Have you considered using oprofile to find out what the CPU is doing
> during the 20 seconds?
>

Haven't tried oprofile yet; I'm not sure it would get a time slice to
run, though. During these 20 seconds, I've verified that my application
is still busy with R/W ops.

> Does the problem occur when you put it under load using another
> method? What are the ATA and network drivers here? I've seen some
> awful out-of-tree device drivers hog the CPU with busy-waits and
> other crap. Oprofile results should show the culprit.

If blocks per transfer/sector count is set to 256, which means the CPU
has less load (any other implications?), this problem no longer occurs.
Our target system uses the libata sil24/pata680 drivers and has a
customized FIFO driver, but no network driver. The relevant variable
here is blocks per transfer/sector count, which seems to matter only to
sg/libata.

Thanks,
Fajun
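
For context, below is a minimal sketch of the kind of SG_IO ATA
pass-through read being discussed; it is an illustration, not code from
this thread. It assumes a 16-byte ATA PASS-THROUGH CDB wrapping READ
SECTOR(S); the device path /dev/sg0, the LBA, and the timeout are
placeholders, and the actual test application and sg_utils invocation
may issue different commands. The relevant point is that "blocks per
transfer/sector count" is the ATA sector count field: a value of 1
means one 512-byte sector per SG_IO ioctl (many system calls per
megabyte, hence the CPU-heavy case), while 256 sectors per ioctl keeps
the process waiting on I/O most of the time.

/*
 * Sketch only: an SG_IO ATA pass-through READ SECTOR(S), with the
 * sector count as a parameter.  Device path, LBA and timeout are
 * placeholders, not values from the thread.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

static int ata_read(int fd, unsigned int lba, unsigned char nsect,
                    unsigned char *buf)
{
    unsigned char cdb[16];
    unsigned char sense[32];
    struct sg_io_hdr io;

    memset(cdb, 0, sizeof(cdb));
    cdb[0]  = 0x85;                /* ATA PASS-THROUGH (16) */
    cdb[1]  = 4 << 1;              /* protocol: PIO data-in */
    cdb[2]  = 0x0e;                /* t_dir=in, byt_blok=1, t_length=sector count */
    cdb[6]  = nsect;               /* sector count (0 means 256) */
    cdb[8]  = lba & 0xff;          /* LBA 7:0 */
    cdb[10] = (lba >> 8) & 0xff;   /* LBA 15:8 */
    cdb[12] = (lba >> 16) & 0xff;  /* LBA 23:16 */
    cdb[13] = 0x40 | ((lba >> 24) & 0x0f); /* device: LBA mode + LBA 27:24 */
    cdb[14] = 0x20;                /* ATA command: READ SECTOR(S) */

    memset(&io, 0, sizeof(io));
    io.interface_id    = 'S';
    io.cmd_len         = sizeof(cdb);
    io.cmdp            = cdb;
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.dxfer_len       = (nsect ? nsect : 256) * 512;
    io.dxferp          = buf;
    io.mx_sb_len       = sizeof(sense);
    io.sbp             = sense;
    io.timeout         = 20000;    /* ms, placeholder */

    return ioctl(fd, SG_IO, &io);
}

int main(void)
{
    static unsigned char buf[256 * 512];
    int fd = open("/dev/sg0", O_RDWR);    /* placeholder device */

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* nsect=1 corresponds to the "heavy CPU load" case from the
     * thread; nsect=0 (i.e. 256 sectors) to the case where the
     * problem goes away. */
    if (ata_read(fd, 0, 1, buf) < 0)
        perror("SG_IO");
    close(fd);
    return 0;
}

Calling ata_read() in a tight loop with nsect=1 approximates the
CPU-bound workload described above, whereas nsect=0 (256 sectors per
call) approximates the I/O-bound case where the hang was not observed.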