Re: GlusterFS projects page for GSOC and the likes

On Wed, Apr 24, 2013 at 11:25 AM, Shishir Gowda <sgowda@xxxxxxxxxx> wrote:
> Hi Xavi,
>
> I would be interested in knowing what the gfsck tool would try to accomplish.
>
> I have certain scenarios from distribute xlator, which would be ideal candidates to be handled in fsck.
>
> Please let me know the scope of gfsck, so that I could share the ideas with you.
>
> With regards,
> Shishir
>
> ----- Original Message -----
> From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx>
> To: "Krishnan Parthasarathi" <kparthas@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxx
> Sent: Monday, April 22, 2013 7:20:36 PM
> Subject: Re: GlusterFS projects page for GSOC and the likes
>
> I've just added 'gfsck', a tool to check file system integrity and
> repair any detected error.
>
> I'm already working on it.
>
> Xavi
>
> On 22/04/13 15:03, Krishnan Parthasarathi wrote:
>> Hi All,
>>
>> I am trying to collect all GlusterFS project ideas into a single page
>> in the wiki, here:
>> http://www.gluster.org/community/documentation/index.php/Projects
>>
>> I have added the first entry. It is about building a diagnostic tool
>> like nfsiostat for GlusterFS mounts. I volunteer to mentor anyone
>> interested in this.
>>
>> I hope to see more entries and volunteers :-)
>>
>> cheers,
>> krish
>>

I have added some of my thoughts here.

Why did we not implement 'gfsck' in the first place?
The traditional 'fsck' approach does not scale. A full check may take
anywhere from days to months to complete. It requires the filesystem
to be offline (unmounted). It is normally triggered on every n'th
system boot (mount), but GlusterFS is mounted and running all the
time, so errors can quickly accumulate in that window. Healing and
reliability cannot be an afterthought.

The GlusterFS self-healing mechanism solves these problems by
integrating fsck tightly into the file system core. Errors are treated
as a normal part of file operations. They are noticed and caught then
and there, while the filesystem has the full context of the problem
to fix. The healing code is also modular: each translator implements
how to handle broken data with respect to its own context.

Then why do we need the 'gfsck' project now?
* Self-healing is inefficient when it comes to a full verify (ls -lR).
* Self-healing focuses on active data only. It assumes that the rest
of the data is immutable and durable. In reality, that is not the
case. There are circumstances where the backend brick content can
change without notice. For example, if your disk filesystem ABI
changes after a kernel upgrade, your data may get corrupted and go
unnoticed. Your fsck.ext4 may do only a partial recovery after a
power failure. Such corruption can confuse self-heal and propagate to
other nodes. Admins sometimes fiddle with the backend directly.
* There can be bugs in the self-heal code itself.
* gfsck is not a replacement for self-heal; it provides a secondary,
additional verification. Users can be fairly confident in the
integrity of their data if both self-heal and gfsck report it healthy.
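
To make the "secondary verification" idea concrete, here is a rough
sketch in Python of one check such a tool could run: hash the same
file on every replica brick and flag any disagreement. This is only an
illustration, not how gfsck or self-heal actually works. The names
file_digest, compare_replicas and is_consistent are made up for this
example, and it assumes a plain replicated layout where each brick
stores the file at the same relative path. It also ignores active I/O,
which a real online check would have to cooperate with.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_replicas(brick_roots, relpath):
    """Group brick roots by the digest of relpath on each brick.

    Hypothetical helper: a brick where the file is missing is grouped
    under the key None.
    """
    groups = {}
    for root in brick_roots:
        path = os.path.join(root, relpath)
        digest = file_digest(path) if os.path.isfile(path) else None
        groups.setdefault(digest, []).append(root)
    return groups


def is_consistent(groups):
    """True when every brick holds the file and all digests agree."""
    return len(groups) == 1 and None not in groups
```

More than one group means the bricks disagree about the content. The
sketch cannot tell which copy is the good one; deciding that needs the
context that the translators already have.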

Here are some points gfsck should address:
* catch errors left unnoticed by self-heal
* must perform online fsck
* speed is very important. A faster gfsck can be run more frequently
* quick and full scan options
* verify-only and verify+fix modes
* interactive and noninteractive (--yes) modes
* quiet and verbose
* preferably % completion progress report
* ability to resume partial checks from previous runs
* ability to scan only a subdirectory (and a recursive option)
* cooperate with built-in self-heal and active i/o
* ability for non-root user to perform gfsck on his/her content alone
* daemon mode (ability to run in a loop under low priority).
* concurrent gfsck - from different clients on different folders
* one unified UI for both self-heal and gfsck's own mechanism.
* incorporate some heuristic checks to speed up.
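
As an illustration of the "resume partial checks" and "scan only a
subdirectory" points, here is a minimal Python sketch of a resumable
walk. It is a sketch under assumptions, not a proposed design: the
names walk_sorted and scan, the check callback and the limit parameter
are all hypothetical. The idea is to visit files in a stable sorted
order, so that the last path checked can serve as a checkpoint for the
next run.

```python
import os


def walk_sorted(root, rel=()):
    """Yield files below root as tuples of path components.

    Entries are visited in name order within each directory, so the
    yielded tuples come out in ascending tuple order.
    """
    with os.scandir(os.path.join(root, *rel)) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        parts = rel + (entry.name,)
        if entry.is_dir(follow_symlinks=False):
            yield from walk_sorted(root, parts)
        else:
            yield parts


def scan(root, check, checkpoint=None, limit=None):
    """Run check(path) on files that sort after checkpoint.

    Returns (last_done, problems). last_done can be passed back as
    checkpoint to continue a partial run; limit caps the number of
    files checked in this run.
    """
    problems = []
    last_done = checkpoint
    done = 0
    for parts in walk_sorted(root):
        if checkpoint is not None and parts <= checkpoint:
            continue  # already covered by a previous run
        if limit is not None and done >= limit:
            break
        path = os.path.join(root, *parts)
        if not check(path):
            problems.append(path)
        last_done = parts
        done += 1
    return last_done, problems
```

Passing a subdirectory as root restricts the scan to that subtree. A
real tool would also have to persist the checkpoint and cope with
files that appear or vanish between runs.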

Implementing all of these is beyond the scope of your GSoC project.
Pick some of them and get your project accepted into the official
gluster branch. You can do the rest in phases. You will have our full
support.
-ab

Imagination is more important than knowledge --Albert Einstein


