Re: [Patch 4/4] chunkd: add self-checking

Jeff Garzik <jeff@xxxxxxxxxx> · Tue, 12 Jan 2010 09:21:24 -0500

On 12/27/2009 06:59 PM, Pete Zaitcev wrote:
This patch adds a background process that periodically verifies the
integrity of all known objects. Objects that are found found faulty
are made invisible to applications. This way, only known good objects
remain visible and applications may implement integrity checking that
is run asynchronously with the self-checking. The intent is to let
the performance of checking to scale with the number of objects and
amount of data stored in the cell.

Signed-off-by: Pete Zaitcev<zaitcev@xxxxxxxxxx>

---
  server/Makefile.am    |    2
  server/be-fs.c        |  126 +++++++++++++++-
  server/chunkd.h       |    8 +
  server/config.c       |   28 +++
  server/selfcheck.c    |  208 ++++++++++++++++++++++++++
  server/server.c       |    2
  test/.gitignore       |    1
  test/Makefile.am      |    4
  test/selfcheck-unit.c |  313 ++++++++++++++++++++++++++++++++++++++++
  test/server-test.cfg  |    2
  test/test.h           |    2
  11 files changed, 693 insertions(+), 3 deletions(-)

OK.  The objcache (patches 3 & 5) look good.  I was thinking that the 
"table of open files" was better suited for server/be-fs.c, thus keeping 
the backend / frontend separation a bit more distinct.  However, this is 
a matter of taste, and I can also see advantages in how you implemented 
it.  So, no changes necessary, we'll how that works out.

The reason why I think about backend / frontend separation, when we only 
have one storage backend, "the filesystem", is that I see a lot of merit 
in a second type of backend:  single-file (aka raw block device mode). 
I don't want to stray too far from being able to implement a second type 
of storage backend.  On the flip side, of course, I don't want imaginary 
future ideas to constraint us in the present.

On to this patch, self-checking.  Regarding this patch's implementation, 
the code appears to be correct.  However, I wanted to discuss the design 
a little bit.

I am definitely concerned that the current implementation does not 
really scale at all with the datastore.  With this change, the admin 
would have to guess at the self-check period, with not much feedback 
from the system about how long a scan takes, when a scan begins, or when 
a scan ends.  I also think a

	sleep(n)
	self_check()

algorithm seems less useful to the average admin than a slightly more 
complex one that solves the problem defined as "guarantee an object is 
checked at least every N days."  Because, as the dataset grows beyond a 
test database measures in megabytes, the dataset scan time will dwarf 
the self-check sleep period.  The behavior becomes one of constant 
scanning, with a tiny period of rest in between.

It seems like either (a) tracking the total dataset size and total 
dataset scan time, or (b) tracking and modifying per-object 
last-self-check times, would be needed.

Also of relevance to admins is scan start time.  This patch would have a 
scan begin at essentially random times throughout the day or night, in 
particular occurring during the most heavily-trafficked portion of the 
day.  An algorithm that occurs on hour N every day (or hour N, day M of 
each week) is much more predictable, regardless of dataset size.  That 
would be pretty easy:

	while (1)
		calc time until next desired interval
			(example: day 0, hour 5 (every sunday at 5am))
		timer_init()
		timer_add(scan_timer)

	scan_timer()
		if (scan thread exists)
			log ("still scanning")
			return
		start scan thread

	scan_thread()
		dbscan()
		end thread (do not loop)

Another option is to add an administrative command "START SCAN", and 
permit an external utility, scheduled by cron or executed by "make 
check", to be the entity that starts the scan thread in the background. 
 That would permit maximum flexibility for both the admin and our 
testsuite.

Comments on all this?  I will happily apply the objcache stuff once all 
the self-checking details are settled.

	Jeff

--
To unsubscribe from this list: send the line "unsubscribe hail-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html