Re: Bitrot stub forget()

Venky Shankar <vshankar@xxxxxxxxxx> · Thu, 18 Feb 2016 09:44:27 +0530

On Tue, Feb 16, 2016 at 11:38:24PM -0500, FNU Raghavendra Manjunath wrote:
> Venky,
> 
> Yes. You are right. We should not remove the quarantine entry in forget.
> 
> We have to remove it upon getting -ve lookups in bit-rot-stub and upon
> getting an unlink.
> 
> I have attached a patch for it.
> 
> Unfortunately rfc.sh is failing for me with the below error.

Thanks for the patch. One of us will take care of putting it up for review.

> 
> 
> ssh: connect to host git.gluster.com port 22: Connection timed out
> fatal: Could not read from remote repository.
> 
> Please make sure you have the correct access rights
> and the repository exists."
> 
> 
> Regards,
> Raghavendra
> 
> 
> On Tue, Feb 16, 2016 at 10:53 AM, Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> 
> > Hey Raghu,
> >
> > Bitrot stub inode forget implementation (br_stub_forget()) deletes the bad
> > object
> > marker (under quarantine directory) if present. This looks incorrect as
> > ->forget()
> > can be trigerred when inode table LRU size exceeeds configured limit -
> > check bug
> > #1308961 which tracks this issue. I recall that protocol/server calls
> > inode_forget()
> > on negative lookup (that might not invoke ->forget() though) and that's
> > the reason
> > why br_stub_forget() has this code.
> >
> > So, would it make sense to purge bad object marker just in lookup()? There
> > might be
> > a need to do the same in unlink() in case the object was removed by the
> > client.
> >
> > Thoughts?
> >
> > Thanks,
> >
> >                 Venky
> >

> From a0cc49172df24e263e0db25c53b57f58c19d2cab Mon Sep 17 00:00:00 2001
> From: Raghavendra Bhat <raghavendra@xxxxxxxxxx>
> Date: Tue, 16 Feb 2016 20:22:36 -0500
> Subject: [PATCH] features/bitrot: do not remove the quarantine handle in
>  forget
> 
> If an object is marked as bad, then an entry is corresponding to the
> bad object is created in the .glusterfs/quarantine directory to help
> scrub status. The entry name is the gfid of the corrupted object.
> The quarantine handle is removed in below 2 cases.
> 
> 1) When protocol/server revceives the -ve lookup on an entry whose inode
>    is there in the inode table (it can happen when the corrupted object
>    is deleted directly from the backend for recovery purpose) it sends a
>    forget on the inode and bit-rot-stub removes the quarantine handle in
>    upon getting the forget.
>    refer to the below commit
>    f853ed9c61bf65cb39f859470a8ffe8973818868:
>    http://review.gluster.org/12743)
> 
> 2) When bit-rot-stub itself realizes that lookup on a corrupted object
>    has failed with ENOENT.
> 
> But with step1, there is a problem when the bit-rot-stub receives forget
> due to lru limit exceeding in the inode table. In such cases, though the
> corrupted object is not deleted (either from the mount point or from the
> backend), the handle in the quarantine directory is removed and that object
> is not shown in the bad objects list in the scrub status command.
> 
> So it is better to follow only 2nd step (i.e. bit-rot-stub removing the handle
> from the quarantine directory in -ve lookups). Also the handle has to be removed
> when a corrupted object is unlinked from the mount point itself.
> 
> Change-Id: Ibc3bbaf4bc8a5f8986085e87b729ab912cbf8cf9
> Signed-off-by: Raghavendra Bhat <raghavendra@xxxxxxxxxx>
> ---
>  xlators/features/bit-rot/src/stub/bit-rot-stub.c | 103 +++++++++++++++++++++--
>  1 file changed, 95 insertions(+), 8 deletions(-)
> 
> diff --git a/xlators/features/bit-rot/src/stub/bit-rot-stub.c b/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> index 9c0f622..920aa7c 100644
> --- a/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> +++ b/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> @@ -2841,19 +2841,46 @@ unwind:
>  
>  /** {{{ */
>  
> -/* forget() */
> +/* unlink() */
>  
>  int
> -br_stub_forget (xlator_t *this, inode_t *inode)
> +br_stub_unlink_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
> +                   int32_t op_ret, int32_t op_errno, struct iatt *preparent,
> +                   struct iatt *postparent, dict_t *xdata)
>  {
> -        uint64_t ctx_addr = 0;
> -        br_stub_inode_ctx_t *ctx = NULL;
> +        br_stub_local_t     *local    = NULL;
> +        inode_t             *inode    = NULL;
> +        uint64_t             ctx_addr = 0;
> +        br_stub_inode_ctx_t *ctx      = NULL;
> +        int32_t              ret      = -1;
>  
> -        inode_ctx_del (inode, this, &ctx_addr);
> -        if (!ctx_addr)
> -                return 0;
> +        if (op_ret < 0)
> +                goto unwind;
>  
> -        ctx = (br_stub_inode_ctx_t *) (long) ctx_addr;
> +        local = frame->local;
> +        frame->local = NULL;
> +
> +        inode = local->u.context.inode;
> +
> +        ret = br_stub_get_inode_ctx (this, inode, &ctx_addr);
> +        if (ret) {
> +                /**
> +                 * If the inode is bad AND context is not there, then there
> +                 * is a possibility of the gfid of the object being listed
> +                 * in the quarantine directory and will be shown in the
> +                 * bad objects list. So continuing with the fop with a
> +                 * warning log. The entry from the quarantine directory
> +                 * has to be removed manually. Its not a good idea to fail
> +                 * the fop, as the object has already been deleted.
> +                 */
> +                gf_msg (this->name, GF_LOG_WARNING, 0,
> +                        BRS_MSG_GET_INODE_CONTEXT_FAILED,
> +                        "failed to get the context for the inode %s",
> +                        uuid_utoa (inode->gfid));
> +                goto unwind;
> +        }
> +
> +        ctx = (br_stub_inode_ctx_t *)(long)ctx_addr;
>  
>          LOCK (&inode->lock);
>          {
> @@ -2868,6 +2895,65 @@ br_stub_forget (xlator_t *this, inode_t *inode)
>          }
>          UNLOCK (&inode->lock);
>  
> +unwind:
> +        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno, preparent, postparent, xdata);
> +        br_stub_cleanup_local (local);
> +        br_stub_dealloc_local (local);
> +        return 0;
> +}
> +
> +int
> +br_stub_unlink (call_frame_t *frame, xlator_t *this, loc_t *loc, int flag,
> +                dict_t *xdata)
> +{
> +        br_stub_local_t *local = NULL;
> +        int32_t          op_ret = -1;
> +        int32_t          op_errno = 0;
> +
> +        local = br_stub_alloc_local (this);
> +        if (!local) {
> +                op_ret = -1;
> +                op_errno = ENOMEM;
> +                gf_msg (this->name, GF_LOG_ERROR, ENOMEM, BRS_MSG_NO_MEMORY,
> +                        "failed to allocate memory for local (path: %s, gfid: %s)",
> +                        loc->path, uuid_utoa (loc->inode->gfid));
> +                goto unwind;
> +        }
> +
> +        br_stub_fill_local (local, NULL, NULL, loc->inode,
> +                            loc->inode->gfid,
> +                            BR_STUB_NO_VERSIONING, 0);
> +
> +        frame->local = local;
> +
> +        STACK_WIND (frame, br_stub_unlink_cbk, FIRST_CHILD (this),
> +                    FIRST_CHILD (this)->fops->unlink, loc, flag, xdata);
> +        return 0;
> +
> +unwind:
> +        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno, NULL, NULL, NULL);
> +        return 0;
> +}
> +
> +
> +/** }}} */
> +
> +/** {{{ */
> +
> +/* forget() */
> +
> +int
> +br_stub_forget (xlator_t *this, inode_t *inode)
> +{
> +        uint64_t ctx_addr = 0;
> +        br_stub_inode_ctx_t *ctx = NULL;
> +
> +        inode_ctx_del (inode, this, &ctx_addr);
> +        if (!ctx_addr)
> +                return 0;
> +
> +        ctx = (br_stub_inode_ctx_t *) (long) ctx_addr;
> +
>          GF_FREE (ctx);
>  
>          return 0;
> @@ -3133,6 +3219,7 @@ struct xlator_fops fops = {
>          .setxattr  = br_stub_setxattr,
>          .opendir   = br_stub_opendir,
>          .readdir   = br_stub_readdir,
> +        .unlink    = br_stub_unlink,
>  };
>  
>  struct xlator_cbks cbks = {
> -- 
> 2.5.0
> 

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel