Re: Design challenges in chunkd self-checking

On 12/22/2009 04:41 PM, Pete Zaitcev wrote:
> I'm looking into adding self-checking to chunkd. This is basically a
> process that re-reads everything stored in the chunkserver and verifies
> that it is still OK. Nothing could be simpler, right?
>
> So, the current problems for which I'd like input are:
>
>   - Scheduling and deconflicting with normal operation.
>
>     Run "genisoimage" on your Fedora desktop and your Firefox is DEAD.
>     It is also the reason everyone runs rpm -e mlocate first thing
>     after installation. The effect of massive data access blowing away
>     the caches is very drastic on a regular Linux system.
>     So, I need a good way to keep the self-check from interfering
>     with the normal service of a chunkserver.
>     Also, I need to save power instead of burning it on re-reading data.

The problem seems to revolve around two variables:

* Last-checked time: you wouldn't want to check any single object more than once every N hours/days/weeks.

* Maximum bytes-per-second: you wouldn't want the scan to exceed a sensible throughput bound.

Perhaps the latter could be derived by observing disk throughput over time; combined with the number of objects and their sizes, that yields an estimate of the total time required to check the entire dataset.
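To make that concrete, a scan loop driven by those two variables might look roughly like the sketch below. Everything here -- next_object(), read_and_verify(), the struct object fields -- is a hypothetical stand-in for illustration, not actual chunkd code:

#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define CHECK_INTERVAL_SECS (7 * 24 * 3600)	/* re-check each object weekly */

struct object {
	uint64_t size;
	time_t last_checked;
};

/* hypothetical helpers, not chunkd APIs */
struct object *next_object(void);
void read_and_verify(struct object *obj);

static void self_check_pass(uint64_t max_bps)
{
	uint64_t bytes_this_sec = 0;
	time_t window = time(NULL);
	struct object *obj;

	while ((obj = next_object()) != NULL) {
		/* skip anything checked within the last interval */
		if (time(NULL) - obj->last_checked < CHECK_INTERVAL_SECS)
			continue;

		/* crude one-second throughput window */
		if (time(NULL) != window) {
			window = time(NULL);
			bytes_this_sec = 0;
		}
		if (bytes_this_sec + obj->size > max_bps) {
			sleep(1);		/* yield to foreground I/O */
			window = time(NULL);
			bytes_this_sec = 0;
		}

		read_and_verify(obj);	/* re-read data, recompute checksum */
		obj->last_checked = time(NULL);
		bytes_this_sec += obj->size;
	}
}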

And if we start keeping per-object data like this, we might want to move the metadata from the beginning of each object into a Tokyo Cabinet (TC) database. That might speed up fs_list_objs and a couple of other operations, too.
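For what it's worth, the TC side of that could be as simple as the following sketch. The obj_meta record layout and the key scheme are assumptions for illustration, not chunkd's on-disk format:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <tchdb.h>

/* illustrative record layout, not chunkd's actual metadata */
struct obj_meta {
	char	 checksum[41];	/* hex SHA-1 + NUL */
	uint64_t size;
	time_t	 last_checked;
};

static bool meta_store(TCHDB *hdb, const char *key, const struct obj_meta *m)
{
	return tchdbput(hdb, key, strlen(key), m, sizeof(*m));
}

int main(void)
{
	TCHDB *hdb = tchdbnew();
	struct obj_meta m = { .size = 4096, .last_checked = time(NULL) };

	if (!tchdbopen(hdb, "objmeta.tch", HDBOWRITER | HDBOCREAT))
		return 1;

	strcpy(m.checksum, "da39a3ee5e6b4b0d3255bfef95601890afd80709");
	meta_store(hdb, "bucket/key1", &m);

	tchdbclose(hdb);
	tchdbdel(hdb);
	return 0;
}

(Storing the struct raw like this is endian- and padding-naive; a real record format would want explicit serialization.)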


>   - Consistency.
>
>     Returning wrong checksums for an object that is being updated may
>     lead us to decide to drop a perfectly good object, which is
>     unacceptable (especially when redundancy is already impaired).
>     So, I need some kind of locking, or logging, or invalidation...

It is normal and reasonable to maintain global state about all in-progress operations. Caching systems do this, for example, to ensure that multiple cache requests for object A do not trigger multiple simultaneous back-end requests for object A.

For the purposes of verification, I would simply skip objects that are actively being written to. Those are, by definition, too new to need verification anyway.
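A crude sketch of such an in-progress table, which the write path updates and the verifier consults before touching an object. The fixed-size array, linear scan, and function names are purely illustrative; a real implementation would use a hash table:

#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MAX_INFLIGHT 256

static pthread_mutex_t inflight_lock = PTHREAD_MUTEX_INITIALIZER;
static char inflight[MAX_INFLIGHT][256];

/* called by the write path when an update begins */
void inflight_add(const char *key)
{
	pthread_mutex_lock(&inflight_lock);
	for (int i = 0; i < MAX_INFLIGHT; i++)
		if (!inflight[i][0]) {
			strncpy(inflight[i], key, sizeof(inflight[i]) - 1);
			break;
		}
	pthread_mutex_unlock(&inflight_lock);
}

/* called by the write path when the update completes */
void inflight_del(const char *key)
{
	pthread_mutex_lock(&inflight_lock);
	for (int i = 0; i < MAX_INFLIGHT; i++)
		if (!strcmp(inflight[i], key)) {
			inflight[i][0] = '\0';
			break;
		}
	pthread_mutex_unlock(&inflight_lock);
}

/* called by the verifier: skip the object if a write is in flight */
bool inflight_check(const char *key)
{
	bool busy = false;

	pthread_mutex_lock(&inflight_lock);
	for (int i = 0; i < MAX_INFLIGHT; i++)
		if (!strcmp(inflight[i], key)) {
			busy = true;
			break;
		}
	pthread_mutex_unlock(&inflight_lock);
	return busy;
}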

BTW, in case this is helpful, chunkd's backend writes a zeroed metadata header to the beginning of each object. The metadata header is only updated with "real" values after the final data byte is written.
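That gives the verifier a cheap incomplete-object test for free: if the header is still all zeroes, the write never finished, so the object should be skipped (or quarantined) rather than flagged as corrupt. A minimal sketch, assuming a fixed-size header (the 128-byte size is made up, not chunkd's actual header size):

#include <stdbool.h>
#include <string.h>

#define HDR_SIZE 128	/* assumed fixed-size metadata header */

/* true if the on-disk header was never filled in with "real" values */
static bool header_is_zeroed(const unsigned char *hdr)
{
	static const unsigned char zeroes[HDR_SIZE];

	return memcmp(hdr, zeroes, HDR_SIZE) == 0;
}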

	Jeff


