On Thu, Jan 08, 2009 at 08:57:18PM -0800, Robert Banz wrote:
>
> On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:
>
>> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>>> (Summary of filesystem discussion)
>>>
>>> You left out ZFS.
>>>
>>> Sometimes Linux admins remind me of Windows admins.
>>>
>>> I have adminned a half-dozen UNIX variants professionally but
>>> keep running into admins who only do ONE and for whom every
>>> problem is solved with "how can I do this with one OS only?"

There's a significant upfront cost to learning a whole new system
for one killer feature, especially if it comes along with significant
regressions in lots of other features (like a non-sucky userland
out of the box).

Applying patches on Solaris seems to be a choice between incredibly
low-level command-line tools and booting up a whole graphical
environment on a machine in a datacentre on the other side of the world.

>> We run one zfs machine.  I've seen it report issues on a scrub
>> only to not have them on the second scrub.  While it looks shiny
>> and great, it's also relatively new.
>
> You'd be surprised how unreliable disks and the transport between the
> disk and host can be.  This isn't a ZFS problem, but a statistical
> certainty as we're pushing a large amount of bits down the wire.
>
> You can, with a large enough corpus, have on-disk data corruption, or
> data corruption that appeared en-flight to the disk, or in the
> controller, that your standard disk CRCs can't correct for.  As we keep
> pushing the limits, data integrity checking at the filesystem layer --
> before the information is presented for your application to consume --
> has basically become a requirement.
>
> BTW, the reason that the first scrub saw the error, and the second scrub
> didn't, is that the first scrub fixed it -- that's the job of a ZFS

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption.
        Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 0.69% done, 1h40m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk

-------

if that's an "error that the scrub fixed" then it's a really badly
written error message.  Same error didn't exist next scrub, which
was what confused me.

Bron.
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html