On 12/20/2009 10:34 PM, Tejun Heo wrote:
> (cc'ing Kay and Lennart. Hello.)
>
> This thread was discussing drives which unload heads too frequently.
> These problems happen mostly on laptops. Either mobile HDDs default
> to too aggressive power saving or laptop firmware configures them
> that way. Anyways, some drives end up unloading and reloading the
> head quite a few times per minute.
>
> Mobile drives tend to have higher load cycle limits than desktop ones
> and this information can be found from drive specs published on
> vendor websites. Most modern mobile ones seem to be rated for 600,000
> cycles. Unfortunately, with 5 unloads per minute, the drive will
> reach its rated limit after only 83 days of uptime. IOW, if you use
> the machine 8hrs per day, it will expire before one year has passed.
>
> Very short unload timeout is inherently dangerous as idle IO patterns
> can differ depending on a lot of things and these rapid load/unload
> cycles can happen under various different configurations (it happens
> under Windows too).
>
> When this problem first appeared, I thought vendors would realize the
> danger and it would go away sooner or later. Expecting it to be a
> temporary problem, I wrote up a simple script named storage-fixup
> which matches the system and harddrive model and issues safe
> powersave configuration.
>
> This is a crude and sub-optimal solution which doesn't scale too
> well. Many of those configurations wouldn't require such APM
> adjustments and a lot of configurations where APM re-configuration is
> required are out there killing their drives.
>
> A proper solution would be....
>
> * Build database of load cycle limits and useable APM values on drive
>   models. The former shouldn't be difficult. Each vendor carries
>   only a few product lines at any given time and publishes datasheets
>   on its webpage. Plus, all the mobile drives I've seen are rated
>   for 600,000 cycles. The latter may be a bit more tricky.
>   Depending on drive model, certain APM values simply don't work
>   (e.g. 255 means max power by spec but some firmwares wrap the value
>   and recognize it as min power), some values overheat the device
>   and so on. In most cases the value 254 seems safe though.
>   storage-fixup.conf should be useable as the source for useable
>   values, I think.
>
> * Monitor load cycle count by smart commands and if it continues to
>   increase at an excessive rate (e.g. such that it reduces uptime to
>   under a year), warn the user and configure higher APM value.
>
> As this problem mostly happens on laptops, I think it's probably best
> to handle this from the new desktop disk management thing so that the
> user can be warned. Do you think it's feasible to handle this from
> devkit?

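
The arithmetic behind the 83-day figure, and the "excessive rate" check
proposed above, can be sketched roughly as follows. This is only an
illustration, not code from storage-fixup or devkit. The function names
are made up here, the two counter readings are assumed to come from
SMART (smartctl typically reports the value as attribute 193,
Load_Cycle_Count), and the one-year threshold is interpreted as total
uptime at the observed rate, which is an assumption.

```python
def days_until_limit(rated_cycles, unloads_per_minute, hours_per_day=24.0):
    """Days until the rated load cycle limit is reached at a constant rate.

    hours_per_day=24 gives days of pure uptime; a smaller value gives
    calendar days for a machine that only runs part of each day.
    """
    if unloads_per_minute <= 0:
        return float("inf")
    minutes_of_uptime = rated_cycles / unloads_per_minute
    return minutes_of_uptime / (hours_per_day * 60)


def is_excessive(count_start, count_end, elapsed_hours,
                 rated_cycles=600_000, min_uptime_days=365):
    """True if the observed rate would use up rated_cycles in less than
    min_uptime_days of uptime (hypothetical policy, see lead-in)."""
    cycles = count_end - count_start
    if cycles <= 0 or elapsed_hours <= 0:
        return False
    cycles_per_hour = cycles / elapsed_hours
    projected_uptime_days = rated_cycles / cycles_per_hour / 24
    return projected_uptime_days < min_uptime_days


if __name__ == "__main__":
    # Figures from the mail: 600,000 rated cycles, 5 unloads per minute.
    print(round(days_until_limit(600_000, 5), 1))       # days of uptime
    print(days_until_limit(600_000, 5, hours_per_day=8))  # calendar days at 8hrs/day
    # Two hypothetical counter readings taken one hour apart.
    print(is_excessive(1000, 1300, elapsed_hours=1.0))
```

With the numbers from the mail this gives about 83.3 days of uptime, or
250 calendar days at 8hrs per day, and a drive gaining 300 cycles per
hour is flagged. A real monitor would sample over a longer window so
that a short burst of activity does not trigger a warning.
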
I think that would be a good approach if we can do it. The situation definitely isn't ideal though. Has anyone approached any of the laptop or drive manufacturers regarding this problem? I suspect there's probably a lack of awareness about it. (Though it could just be that Windows usually accesses the drive so often that it never really reaches the unload timeouts.)