regarding crazy head unloads

Hello, ATA/SMART fellows.

This message is about crazy head unloads on certain laptops. In a desperate attempt to increase battery time, some vendors configure ATA APM (Advanced Power Management) so aggressively that the setup becomes fragile (the problem can be triggered even on Windows): the drive unloads its heads like crazy and kills itself quickly (within months). For more information, please take a look at the following links.

https://bugzilla.novell.com/show_bug.cgi?id=386555
https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695
http://www.thinkwiki.org/wiki/Problem_with_hard_drive_clicking

This is primarily the hardware vendors' fault, and updating their firmware is probably the best way to fix it. However, the problem can actually kill the hard drive, which usually causes the user a lot of anxiety and stress, so I think we need to take some measures.

Attached are the storage-fixup script, which is to be called during boot and resume, and its configuration file, which goes under /etc. The script matches DMI and HAL properties and executes commands on the matching devices. The config file currently contains only three rules.
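
To make the rule format concrete, here is a minimal sketch of a helper that prints a stanza in the storage-fixup.conf format from values supplied by the caller. The make_rule function is hypothetical and not part of the attached script; it also assumes that disabling APM with hdparm -B 255 is the desired action, as in the three existing rules.

```shell
#!/bin/bash
# Hypothetical helper (not part of storage-fixup): print one rule
# stanza in the storage-fixup.conf format.
#
# usage: make_rule RULENAME MANUFACTURER PRODUCT DISK_MODEL
make_rule() {
    printf 'rule %s\n' "$1"
    printf 'dmi system-manufacturer\t\t%s\n' "$2"
    printf 'dmi system-product-name\t\t%s\n' "$3"
    printf 'hal storage.model\t\t%s\n' "$4"
    # $DEV stays literal here; storage-fixup substitutes it at run time.
    printf 'act hdparm -B 255 $DEV\n'
}

# Possible invocation on an affected machine (needs root for dmidecode;
# the disk model must be filled in by hand or taken from HAL):
#   make_rule my-laptop \
#       "$(dmidecode --string system-manufacturer)" \
#       "$(dmidecode --string system-product-name)" \
#       "DISK MODEL STRING"
```

The output can be appended to /etc/storage-fixup.conf after checking that the DMI strings and the disk model are exact matches.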

Here are two ideas to better handle this problem:

1. Describe the problem on linux-ata.org and ask people to report dmidecode and hdparm -I output on affected machines. Share storage-fixup (or any other alternative) and storage-fixup.conf on the page.

2. This is from Roland. Make smartd aware of the problem and have it warn the user if the load/unload count per power-on hour goes too high. Maybe the warning can direct the user to the linux-ata.org page?
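
The check in the second idea could look roughly like the sketch below. It is not smartd code. It assumes the usual "smartctl -A" attribute table, with Power_On_Hours as attribute 9, Load_Cycle_Count as attribute 193 and the raw value in the tenth column; some drives report these differently. The function names and the default limit of 60 cycles per hour are placeholders chosen for illustration, not recommended values.

```shell
#!/bin/bash
# Sketch only: reads "smartctl -A" style text on stdin and prints the
# number of head load cycles per power-on hour (integer, truncated).
# Fails with status 1 if no usable Power_On_Hours value is found.
load_cycles_per_hour() {
    awk '
        $1 == 9   { hours  = $10 + 0 }   # Power_On_Hours raw value
        $1 == 193 { cycles = $10 + 0 }   # Load_Cycle_Count raw value
        END {
            if (hours <= 0) exit 1
            printf "%d\n", cycles / hours
        }'
}

# Returns 0 if the rate is within LIMIT, 1 (with a warning on stderr)
# if it exceeds LIMIT, 2 if the rate could not be computed.
# The default LIMIT of 60 is an arbitrary placeholder.
warn_if_excessive() {
    local limit=${1:-60} rate
    rate=$(load_cycles_per_hour) || return 2
    if [ "$rate" -gt "$limit" ]; then
        echo "warning: $rate head load cycles per power-on hour" \
             "(limit $limit)" >&2
        return 1
    fi
    return 0
}

# Possible invocation (needs root):
#   smartctl -A /dev/sda | warn_if_excessive 60
```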

Thanks.

--
tejun
#! /bin/bash
#
# storage-fixup			- Tejun Heo <teheo@xxxxxxx>
#
# Script to issue fix up commands for weird disks.  This is primarily
# to adjust ATA APM setting.  Some laptop BIOSen set this value too
# aggressively causing frequent head unloads which can kill the drive
# quickly.  This script should be called during boot and resume.  It
# examines rules from /etc/storage-fixup.conf and executes matching
# commands.
#
# In storage-fixup.conf, empty lines and lines starting w/ # are
# ignored.  Each line starts with rule, dmi, hal or act.
#
# rule RULENAME
#	Starts a rule.  $RULENAME can't contain whitespaces.
#
# dmi KEY VALUE
#	Checks whether DMI value for KEY matches VALUE.  If not, the
#	rule is skipped.
#
# hal KEY VALUE
#	Checks whether there are devices whose KEY value matches
#	VALUE.  storage-fixup applies actions to devices which
#	match all hal matches, so every rule should have at least
#	one hal match.
#
# act ACTION
#	Executes ACTION on matched devices.  ACTION can contain $DEV
#	which will be substituted with device file of matching device.
#
# For example, the following (useless) rule disables APM on the first
# harddrive of my machine.
#
# rule p5w64
# dmi baseboard-product-name	P5W64 WS Pro
# dmi baseboard-manufacturer	ASUSTeK Computer INC.
# hal storage.model		WDC WD5000YS-01M
# hal storage.serial		SATA_WDC_WD5000YS-01_WD-WMANU1217262
# act hdparm -B 255 $DEV
#

conf_file=${CONF_FILE:-/etc/storage-fixup.conf}
hal_find_by_property=${HAL_FIND_BY_PROPERTY:-hal-find-by-property}
hal_get_property=${HAL_GET_PROPERTY:-hal-get-property}
dmidecode=${DMIDECODE:-dmidecode}

verbose=0
lineno=0
skip=0
rule_name=""
declare -a dev_ids
newline=$'\n'

log() {
    echo "storage-fixup: $*"
}

warn() {
   log "$@" 1>&2
}

debug() {
    if [ $verbose -ne 0 ]; then
	warn "$@"
    fi
}

#
# Match functions - do_dmi() and do_hal() - execute DMI and HAL
# matches respectively.  Return value 0 indicates match, 1 invalid
# match (triggers warning) and 2 mismatch.
#
do_dmi() {
    local val

    if [ -z "$1" -o -z "$2" ]; then
	return 1
    fi

    val=$($dmidecode --string "$1")
    if [ "$?" -ne 0 ]; then
	return 1
    fi

    if [ "$val" = "$2" ]; then
	debug "Y $lineno $rule_name dmi $1=$2"
	return 0
    fi

    debug "N $lineno $rule_name dmi $1=$2"
    return 2
}

do_hal() {
    local i out ifs_store append=0

    if [ -z "$1" -o -z "$2" ]; then
	return 1
    fi

    if [ ${#dev_ids[@]} -eq 0 ]; then
	append=1
    fi

    #
    # bash really isn't a good programming language for this kind of
    # stuff and makes it look much more complex than it needs to be.
    # The following loop executes hal-find-by-property and ands the
    # result with the previous result.
    #
    ifs_store="$IFS"
    IFS="$newline"
    dev_ids=(
	$($hal_find_by_property --key "$1" --string "$2" \
	    | while read found; do
		if [ $append -ne 0 ]; then
		    echo "$found"
		else
		    for id in "${dev_ids[@]}"; do
			if [ "$id" = "$found" ]; then
			    echo "$found"
			    break
			fi
		    done
		fi
	    done))
    IFS="$ifs_store"

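    # $? would only reflect the IFS assignment above (always 0), so
    # test for an empty result set to detect a mismatch.
    if [ ${#dev_ids[@]} -eq 0 ]; then
	debug "N $lineno $rule_name hal $1=$2"
	return 2
    fi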

    debug "Y $lineno $rule_name hal nr_devs=${#dev_ids[@]} $1=$2"

    return 0
}

do_act() {
    local id DEV

    for id in "${dev_ids[@]}"; do
	if ! DEV=$($hal_get_property --udi "$id" --key block.device); then
	    warn "can't find device node for $id"
	    continue
	fi

	eval log "$rule_name: executing \"$1\""
	eval "$1"
    done

    return 0
}

while read f0 f1 f2; do
    true $((lineno++))
    if [ -z "${f0###*}" ]; then
	continue
    fi

    if [ "$f0" = rule ]; then
	rule_name=$f1
	skip=0
	dev_ids=()
	continue
    fi

    if [ $skip -ne 0 ]; then
	continue
    fi

    case "$f0" in
    dmi)
	    do_dmi "$f1" "$f2"
	    ;;
    hal)
	    do_hal "$f1" "$f2"
	    ;;
    act)
	    do_act "$f1 $f2"
	    ;;
    *)
	    false
	    ;;
    esac

    ret=$?
    if [ $ret -ne 0 ]; then
	if [ $ret -eq 1 ]; then
	    warn "malformed line $lineno \"$f0 $f1 $f2\","\
	         "skipping rule $rule_name"
	fi
	skip=1
    fi
done < "$conf_file"
rule tp-t60
dmi system-manufacturer		LENOVO
dmi system-product-name		1952W5R
dmi system-version		ThinkPad T60
hal storage.model		Hitachi HTS722020K9SA00
act hdparm -B 255 $DEV

rule hp-dv6500
dmi system-manufacturer		Hewlett-Packard
dmi system-product-name		HP Pavilion dv6500 Notebook PC
dmi system-version		Rev 1
hal storage.model		SAMSUNG HM250JI
act hdparm -B 255 $DEV

rule dell-e1505
dmi system-manufacturer		Dell Inc.
dmi system-product-name		MM061
hal storage.model		ST9100824AS
act hdparm -B 255 $DEV
