Re: NFS mount lockups since about a month ago

Roger Heflin <rogerheflin@xxxxxxxxx> · Fri, 1 Oct 2021 15:42:26 -0500

You need to replace mddevice with the name of your md device,
probably md0.

3-5 years is about when they start to go.  I have 2-3TB WD Reds
sitting on the floor because their correctable/offline uncorrectable
errors kept happening and blipping my storage (a pause of a few
seconds).  I even removed the disks from the raid and tried to force a
rewrite of the blocks, and once a long self-test would run without
errors I would put the disk back.  But within a week or 2 of being back
in the array it would start all over again on the exact same
disk/blocks/issues.  The disk firmware does not seem to want to replace
them with spare blocks, so it keeps using blocks that will never work,
and the only option was to buy new disks and put the old ones on the
pile of useless old disks.

You can do a smartctl -t long <devicename>, which will take several
hours.  If it is not successful you have unreadable spots on the
disk.  I have all my disks doing those tests at least 1x per week,
and I keep in a <serialnumber> directory a dated file of each day's
smartctl report for that disk so I can compare them when something
starts acting up.  I have reports on the various disks going back to
2014.
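
A minimal sketch of how the weekly long test could be kicked off for
several disks.  The function name and the SMARTCTL override are my own
placeholders (the override only exists so the loop can be dry-run with
echo), and the device names are examples:

```shell
#!/bin/bash
# Start an extended (long) self-test on each listed disk.
# SMARTCTL is a hypothetical override so the loop can be dry-run with "echo".
SMARTCTL="${SMARTCTL:-smartctl}"

start_long_tests() {
  local dev
  for dev in "$@"; do
    # -t long queues the test inside the drive and returns right away;
    # the result shows up later in the self-test log.
    $SMARTCTL -t long "$dev" || echo "could not start test on $dev" >&2
  done
}

# Example (as root, e.g. from a weekly cron job):
#   start_long_tests /dev/sda /dev/sdb
```

Since the test runs inside the drive, the result has to be read from
the self-test log afterwards rather than from this command's output.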

Success looks like this in the output of smartctl --all <devicename>:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4391         -
# 2  Extended offline    Completed without error       00%      4366         -
# 3  Extended offline    Completed without error       00%      4342         -
# 4  Extended offline    Completed without error       00%      4320         -
# 5  Extended offline    Completed without error       00%      4291         -
# 6  Extended offline    Completed without error       00%      4178         -
# 7  Extended offline    Completed without error       00%      4013         -
# 8  Extended offline    Completed without error       00%      3845         -

If there is an LBA where the - is, the test failed.
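
That check can be automated against a saved report.  This is only a
sketch: selftest_failures is a made-up helper name, and it assumes the
log rows look like the ones above (a row starting with "#" and the
LBA_of_first_error in the last column):

```shell
#!/bin/bash
# Read a "smartctl --all" report on stdin and print every self-test row
# whose last column (LBA_of_first_error) is not "-".
# Exit status is 1 if any such row was found, 0 otherwise.
selftest_failures() {
  awk '
    /^# *[0-9]+ / {
      # e.g. "# 1  Extended offline  Completed without error  00%  4391  -"
      if ($NF != "-") { print; bad = 1 }
    }
    END { exit bad + 0 }
  '
}

# Example:
#   smartctl --all /dev/sdb | selftest_failures || echo "sdb has a failed self-test"
```

It only looks at the LBA column, so a test that was aborted or is still
running is not reported by this helper.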

The script looks like this and runs 1x per day.
#!/bin/bash
# Save a dated "smartctl --all" report for every disk, filed by serial number.
logdir=/var/log/smartctl
stamp=$(date +%Y%m%d-%H)
for disk in a b c d e f g h i j k l m n o p ; do
  smartctl --all /dev/sd${disk} > ${logdir}/tmp.out
  rc=$?
  # Exit status bit 1 (value 2) means the device could not be opened,
  # so there is no report worth keeping for this letter.
  if [ $(( rc & 2 )) -ne 0 ] ; then
    rm -f ${logdir}/tmp.out
    continue
  fi
  serial=$(awk '/Serial/ {print $3}' ${logdir}/tmp.out)
  mkdir -p ${logdir}/${serial}
  out=${logdir}/${serial}/${serial}.${stamp}.sd${disk}.out
  mv ${logdir}/tmp.out ${out}
  # Only for sda: append the SSD statistics log as well.
  if [ "${disk}" == "a" ] ; then
    smartctl -l ssd /dev/sd${disk} >> ${out}
  fi
done
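
To compare two of the dated reports for one disk, something like the
following could be used.  It is a rough sketch: smart_counters and
compare_reports are hypothetical names, and it assumes the usual
attribute table layout where the name is the second column and the raw
value is the last one:

```shell
#!/bin/bash
# Print "name raw_value" for the three counters worth watching,
# taken from one saved smartctl report.
smart_counters() {
  awk '/Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable/ { print $2, $NF }' "$1"
}

# Compare an older and a newer report of the same disk.
# Prints "no change" or the before/after values; returns 1 on a change.
compare_reports() {
  local old new
  old=$(smart_counters "$1")
  new=$(smart_counters "$2")
  if [ "$old" = "$new" ]; then
    echo "no change"
  else
    printf 'before:\n%s\nafter:\n%s\n' "$old" "$new"
    return 1
  fi
}

# Example: compare_reports <older report> <newer report>
```

Running it on the reports from before and after a pause shows whether
any of those counters moved.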

On Fri, Oct 1, 2021 at 2:20 PM Terry Barnaby <terry1@xxxxxxxxxxx> wrote:
>
> On 01/10/2021 19:05, Roger Heflin wrote:
> > it will show latency.  await is the average I/O time in ms, and %util is
> > calculated from await and iops.  So long as you turn sar down to
> > 1 minute samples it should tell you which of the 2 disks had higher
> > await/%util.    With a 10 minute sample the 40sec pause may get spread
> > out across enough iops that you cannot see it.
> >
> > If one disk pauses, that disk's utilization will be significantly higher
> > than the other disk's, and if utilization is much higher for the same or
> > fewer IOPS that is generally a bad sign.   2 similar disks with similar
> > iops will generally have similar util.    The math is close to
> > (iops * await / 10), which gives a percentage.
> >
> > Are you using MDraid or hardware raid?   Doing a "grep mddevice
> > /var/log/messages" will show if md forced a rewrite and/or had a slow.
> >
> > you can do this on those disks:
> >   smartctl -l scterc,20,20 /dev/<device>
> >
> > I believe 20 (2.0 seconds) is as low as a WD Red lets you go according
> > to my tests.  If the disk hangs it will hang for 2 seconds versus the
> > current default (it should be 7 seconds, and the total really depends on
> > how many bad blocks there are together that it tries to read).   Setting
> > it to 2 seconds will make the overall timeout 3.5x smaller, so if that
> > reduces the hang time by about that much, that is a confirmation that it
> > is a disk issue.
> >
> > and do this on the disks:
> >   smartctl --all /dev/sdb | grep -E '(Reallocated|Current_Pen|Offline_Uncor)'
> >
> >
> > if any of those 3 is nonzero in the last column, that may be the
> > issue.   The smart firmware will fail disks that are perfectly fine,
> > and it will fail to fail horribly bad disks.    The PASS/FAIL
> > absolutely cannot be trusted no matter what it says.  FAIL is more
> > often right, but PASS is often unreliable.
> >
> > So if nonzero note the number, and next pause look again and see if
> > the numbers changed.
>
> Thanks for the info, I am using MDraid. There are no "mddevice" messages
> in /var/log/messages and smartctl -a lists no errors on any of the
> disks. The disks are about 3 years old, I change them in servers between
> 3 and 4 years old.
>
> I will create a program to monitor the sar output and detect any
> discrepancies, as this problem only occurs now and then, and also
> measure I/O latency on NFS accesses on the clients to see if I can track
> down whether it is a server disk issue or an NFS issue. Thanks again for
> the info.
> _______________________________________________
> users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure