Re: Vanilla 3.0.78

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 29 Jul 2013 20:01:34 +1000

On Mon, Jul 29, 2013 at 10:31:52AM +0200, Stefan Priebe - Profihost AG wrote:
> On 29.07.2013 10:22, Dave Chinner wrote:
> > On Mon, Jul 29, 2013 at 09:39:37AM +0200, Stefan Priebe - Profihost AG wrote:
> >> Hi,
> >>
> >> while running 3.0.78 and doing heavy rsync tasks on a RAID 50 I'm getting
> >> these call traces:
> > 
> > Judging by the timestamps, the problem clears and the system keeps
> > running?
> 
> Yes.
> 
> > If so, the problem is likely to be a combination of contention on a
> > specific AG for allocation and slow IO. Given it is RAID 50, it's
> > probably really slow IO, and probably lots of threads wanting the
> > lock and queuing up on it.
> > 
> > What's 'iostat -m -x -d 5' look like when these messages are dumped
> > out?
> 
> I don't have that, but I do have some Nagios stats: there were 1000 IOPS and 8 MB/s.

Yup, that sounds like it was doing lots of small random IOs and
hence was IO bound...
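
As a rough back-of-the-envelope check of that reading, the average request
size follows from throughput divided by IO rate. This is only a sketch: it
assumes the two Nagios figures are averages over the same interval and that
"MB" means MiB. The function name is made up for illustration.

```python
def average_io_size_kib(throughput_mib_s, iops):
    """Average request size in KiB, given throughput (MiB/s) and IO rate (IO/s)."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    # 1 MiB = 1024 KiB; dividing by the IO rate gives KiB per request.
    return throughput_mib_s * 1024 / iops


# Figures quoted above: 8 MB/s at 1000 IO/s, i.e. roughly 8 KiB per IO.
print(f"{average_io_size_kib(8, 1000):.1f} KiB per IO")
```

Requests of around 8 KiB are small enough that seek time, not bandwidth, is
likely to dominate on spinning disks.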

> But I can reduce the number of tasks run in parallel if this is the problem.

Try to find out what the average IO times were when the messages
were being emitted. If they are up in the seconds, then there's a good
chance you are simply throwing too many small IOs at your storage.
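
If iostat output isn't being captured, one way to get a comparable number is
to sample /proc/diskstats twice and divide the change in time spent on IO by
the change in completed IOs, which is roughly what iostat reports as await.
The following is a minimal sketch under that assumption, not a replacement
for iostat; the helper names and the device name "sda" are placeholders.

```python
import time


def parse_line(line, device):
    """Return (ios_completed, ms_spent) for `device`, or None if the line is for another device."""
    fields = line.split()
    if len(fields) < 11 or fields[2] != device:
        return None
    # /proc/diskstats columns: major, minor, name, reads completed, reads merged,
    # sectors read, ms reading, writes completed, writes merged, sectors written,
    # ms writing, ...
    reads, ms_reading = int(fields[3]), int(fields[6])
    writes, ms_writing = int(fields[7]), int(fields[10])
    return reads + writes, ms_reading + ms_writing


def read_diskstats(device, path="/proc/diskstats"):
    """Read the current counters for one device."""
    with open(path) as f:
        for line in f:
            counters = parse_line(line, device)
            if counters is not None:
                return counters
    raise ValueError(f"device {device!r} not found in {path}")


def average_io_ms(before, after):
    """Average milliseconds per completed IO between two samples."""
    ios = after[0] - before[0]
    ms = after[1] - before[1]
    return ms / ios if ios > 0 else 0.0


def measure(device, interval_s=5):
    """Sample the device twice, interval_s seconds apart, and return the average IO time in ms."""
    before = read_diskstats(device)
    time.sleep(interval_s)
    after = read_diskstats(device)
    return average_io_ms(before, after)
```

Calling measure("sda") in a loop while the rsync jobs run, and logging the
result with a timestamp, would let the values be lined up against the call
traces afterwards.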

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs