On Thu, July 13, 2006 9:50 am, William L. Maltby wrote:
> On Wed, 2006-07-12 at 19:33 -0400, Paul wrote:
>> OK, I'm still trying to solve this. The server has been up rock
>> steady, but the errors concern me. I built this on a test box months
>> ago, and now that I think about it, I may have built it originally on
>> a drive from a different manufacturer, although about the same size
>> (20G). This may have something to do with it. What is the easiest way
>> to get these errors taken care of? I've tried e2fsck, and also ran
>> fsck on Vol00. Looks like I made a fine mess of things. Is there a
>> way to fix it without reloading
>
> AFAIK, there is no "easiest way". From my *limited* knowledge, you have
> a couple of different problems (maybe) and they are not identified.
> I'll offer some guesses and suggestions, but without my own hard-headed
> stubbornness in play, results are even more iffy.
>
>> CentOS? Here are some outputs:
>>
>> Snapshot from /var/log/messages:
>>
>> Jul 12 04:03:21 hostname kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
>> Jul 12 04:03:21 hostname kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
>> Jul 12 04:03:21 hostname kernel: ide: failed opcode was: unknown
>
> I've experienced these regularly on a certain brand of older drive
> (*really* older, probably not your situation). Maxtor, IIRC. Anyway,
> the problem occurred mostly on cold boot or when re-spinning the drive
> after it slept. It apparently had a really *slow* spin-up speed and a
> timeout would occur (not handled in the protocol, I guess), IIRC.

This is definitely a symptom. I wonder if LVM has anything to do with
it? I'm running an "IBM-DTLA-307020" (20 gig). I was previously running
an "IBM-DTLA-307015" on FC1 on ext3 partitions and never had a problem.
When I find the time, I am just going to reload CentOS 4.3 onto ext3
partitions, restore the data, and see how it goes.

> Your post doesn't mention if this might be related. If all your log
> occurrences tend to indicate it happens only after long periods of
> inactivity, or upon cold boot, it might not be an issue. But even
> there, hdparm might have some help. Also, if it does seem to be only
> on cold boot or after long periods of "sleeping", is it possible that
> a bunch of things starting at the same time are taxing the power
> supply? Is the PS "weak"? Remember that PSs must not only have a
> maximum wattage sufficient to support the maximum draw of all devices
> at the same time (plus a margin for safety), but also that the various
> 5/12 volt lines are limited. Different PSs have different limits on
> those lines, and often they are not published on the PS label. Lots of
> 12 or 5 volt draws at the same time (as happens in a non-sequenced
> start-up) might be producing an unacceptable voltage or amperage drop.
>
> Is your PCI bus 33/66/100 MHz? Do you get messages on boot saying
> "assume 33MHz.... use idebus=66"? I hear it's OK to have an idebus
> param that is too fast, but it's a problem if your bus is faster than
> what the kernel thinks it is.
>
> Re-check and make sure all cables are well-seated and that power is
> well connected. Speaking of cables, is it new or "old"? Maybe the
> cable has a small intermittent break? Try replacing the cable. Try
> using an 80-conductor (UDMA?) cable, if not using one already. If the
> problem is only on cold boot, can you get a DC volt-meter on the power
> connector? If so, look for the voltages to "sag". That might tell you
> that you are taxing your PS. Or use the labels, do the math, and
> calculate whether you are close to the max wattage in a worst-case
> scenario.
>
> I suggest using hdparm (*very* carefully) to see if the problem can be
> replicated on demand. Take the drive into various reduced-power modes
> and restart it, and see if the problem is fairly consistent.
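Just so I'm reading you right, is this roughly the kind of test you
mean? Only a sketch of what I'd try; I pulled the options from the
hdparm man page and haven't actually run this on the box yet, so
correct me if I have it wrong:

    # check the drive's current power state
    hdparm -C /dev/hda

    # spin the drive down into low-power standby
    hdparm -y /dev/hda

    # wait a bit, then force a spin-up by reading from the disk
    sleep 30
    dd if=/dev/hda of=/dev/null bs=1M count=64

    # see whether the forced spin-up produced any new CRC errors
    grep BadCRC /var/log/messages | tail

If the BadCRC lines show up right after the forced spin-up, I'd take
that as pointing at the slow-spin-up/power theory rather than at LVM.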
>> sfdisk -l:
>>
>> Disk /dev/hda: 39870 cylinders, 16 heads, 63 sectors/track
>> Warning: The partition table looks like it was made
>> for C/H/S=*/255/63 (instead of 39870/16/63).
>> For this listing I'll assume that geometry.
>> Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
>>
>>    Device Boot Start     End   #cyls    #blocks   Id  System
>> /dev/hda1   *      0+     12      13-    104391   83  Linux
>> /dev/hda2         13    2500    2488   19984860   8e  Linux LVM
>> /dev/hda3          0       -       0          0    0  Empty
>> /dev/hda4          0       -       0          0    0  Empty
>> Warning: start=63 - this looks like a partition rather than
>> the entire disk. Using fdisk on it is probably meaningless.
>> [Use the --force option if you really want this]
>
> What does your BIOS show for this drive? It's likely that the drive
> was labeled (or copied from a drive that was labeled) in another
> machine. The "key" for me is the "255" vs. the "16". The only fix here
> (not important to do it, though) is to get the drive properly labeled
> for this machine. Back up the data, make sure the BIOS is set
> correctly, and fdisk (or sfdisk) it to get the partitions correct.
>
> WARNING! Although this can be done "live", use sfdisk -l -uS to get
> the starting sector numbers and make the partitions match. When you
> re-label at "255", some of the calculated translations internal to the
> drivers(?) might change (Do things *still* translate to CHS on modern
> drives? I'll need to look into that some day. I bet not.). Also, the
> *desired* starting and ending sectors of the partitions are likely to
> change. What I'm saying is that the final partitioning will likely be
> "non-standard" in layout and lying in wait to bite your butt.
>
> I would back up the data, change the BIOS, and sfdisk it (or fdisk or
> cfdisk, or any other partitioner, your choice). If the system is hot,
> sfdisk -R will re-read the params and get them into the kernel. Then
> reload the data (if needed). If it's "hot": single user, or run level
> 1, mounted "ro", of course. Careful reading of sfdisk can allow you to
> script and test (on another drive) parts of this.

I really want to try some of this, but not until I have a hot, ready
standby HD to throw in if it gets hosed. I'm hosting some stuff and
like to be known for reliable 24x7 service.

> Easy enough so far? >:-)

Yea, piece of cake. Thanks for sharing your knowledge! I do need to
play around with LVM more and get comfortable with it. LVM seems to be
somewhere between Solaris metadb's and ZFS.

>> sfdisk -lf
>
> The "f" does you no good here, as you can see. It is really useful
> only when trying to change the disk label. What would be useful
> (maybe) to you is "-uS".
>
>> <snip>
>
> HTH
> --
> Bill
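P.S. For my own notes, here is roughly the sequence I'm picturing once
a standby drive is in place -- just a sketch pieced together from
Bill's description, not something I've run yet, and only after a full
backup and dropping to single user:

    # record the current layout in sectors, so the new table can match it
    sfdisk -l -uS /dev/hda

    # dump the partition table to a file as an extra safety net
    sfdisk -d /dev/hda > /root/hda-table.dump

    # (fix the drive geometry in the BIOS, then repartition so the
    # start/size sector values match the ones recorded above)

    # ask the kernel to re-read the partition table without a reboot
    sfdisk -R /dev/hda

Does that look about right?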