Re: DHT NULL pointer usage


On 11/17/2014 10:50 AM, Xavier Hernandez wrote:
Hi Shyam,

On 11/17/2014 03:50 PM, Shyam wrote:
On 11/17/2014 07:20 AM, Emmanuel Dreyfus wrote:
Hello

I have an almost reliable test that fails on NetBSD:

./tests/basic/ec/quota.t                 (Wstat: 0 Tests: 22 Failed: 3)
   Failed tests:  19-21

This one is a real bug: glusterfsd crashed because of a NULL pointer. I
am going to submit a change to avoid touching postbuf if op_ret == -1,
but if someone has a better idea, please let me know.

This should be fine from a DHT perspective. The current tests check whether
the file is under migration by looking at the sticky and SGID bits in the
postbuf attributes (for the file). If the FOP errored out, the code should
check whether this is a dht_inode_missing error and then do a phase-2 check
to see if the file migrated to a new target (based on a good/bad op_ret).
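A minimal sketch of the two tests described above, assuming plain POSIX `struct stat` mode bits as a stand-in for the attribute structure GlusterFS actually uses. The helper names are hypothetical, and treating ENOENT and ESTALE as "inode missing" follows what is said elsewhere in this thread, not the real dht_inode_missing() definition:

```c
#define _XOPEN_SOURCE 700   /* expose S_ISVTX on strict compilers */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: a regular file carrying both the sticky bit and
 * the set-group-ID bit is treated as "under migration" (phase 1).
 * struct stat is only a stand-in for GlusterFS's own attribute type. */
static bool migration_phase1_bits(const struct stat *postbuf)
{
    if (postbuf == NULL)        /* no attributes returned: cannot tell */
        return false;
    return S_ISREG(postbuf->st_mode) &&
           (postbuf->st_mode & S_ISVTX) &&
           (postbuf->st_mode & S_ISGID);
}

/* Hypothetical helper: errors assumed to mean the file may have moved
 * to another subvolume, so a phase-2 check is worth attempting. */
static bool errno_means_inode_missing(int op_errno)
{
    return op_errno == ENOENT || op_errno == ESTALE;
}
```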

Looking at the test, it seems that we hit a quota error on the write, in
which case following the file, even if it migrated, does not make sense.
We should have received EDQUOT as the op_errno (if I am not mistaken)
and exited this function at the check below:

dht_writev_cbk:
        if (op_ret == -1 && !dht_inode_missing(op_errno)) {
                goto out;
        }

It's very strange. The failing test tries to write a file. There should
be enough quota for the file to be written and it's not being migrated,
but the brick seems to be returning ENOENT (at least, that is what
ec_writev_cbk() receives). However, all other tests that write to files
seem to work OK, so I'm not sure what is special about this particular
test.

Hmmm... I need to check this out and will do so when I get some cycles today or tomorrow. It would also help to understand where and why this errno is cropping up, as it may point to a different problem.

In this case ENOENT is very strange: the fd is open, so how did the entry disappear? Or is this some form of open-behind in the xlators below DHT (wild guess)?

Handling this error should trigger the P2 migration check in DHT, so the current fix at http://review.gluster.org/#/c/9139/ is not really valid.

NOTE: P1/P2 refer to the PHASE1 and PHASE2 migration checks in DHT.

I think we need to fix all P2 (and P1) checks in DHT to treat this case appropriately: if there is no postbuf _but_ the error is ENOENT or ESTALE, we need to enter the P2 checks in DHT. P1 depends on there being no error and a valid postbuf to detect migration. This is independent of the errno problem discussed above and should be fixed as described in the original mail.
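A rough sketch of the decision proposed above, written as a standalone C function rather than a patch to dht_writev_cbk. The enum, the function name and the use of `struct stat` in place of GlusterFS's attribute type are all hypothetical; phase-2 detection from a successful reply's postbuf is deliberately left out to keep the example small:

```c
#define _XOPEN_SOURCE 700   /* expose S_ISVTX on strict compilers */
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical outcome of a DHT write callback. */
enum cbk_action {
    CBK_UNWIND,            /* pass the result (or error) straight up */
    CBK_PHASE1_MIGRATING,  /* success and postbuf shows migration bits */
    CBK_PHASE2_CHECK       /* ENOENT/ESTALE: look for the file on the new target */
};

static enum cbk_action
choose_migration_check(int op_ret, int op_errno, const struct stat *postbuf)
{
    if (op_ret == -1) {
        /* Never dereference postbuf here: on failure it may be NULL. */
        if (op_errno == ENOENT || op_errno == ESTALE)
            return CBK_PHASE2_CHECK;
        return CBK_UNWIND;             /* e.g. EDQUOT: report as-is */
    }

    /* Phase 1 needs a successful fop and a valid postbuf. */
    if (postbuf != NULL && S_ISREG(postbuf->st_mode) &&
        (postbuf->st_mode & S_ISVTX) && (postbuf->st_mode & S_ISGID))
        return CBK_PHASE1_MIGRATING;

    return CBK_UNWIND;
}
```

With this split, an EDQUOT failure is reported straight back to the caller, an ENOENT or ESTALE on an open fd still gets a chance to find the file on its new target, and postbuf is only read when the fop succeeded.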

Xavi/Manu, let me know if I/we need to handle this in DHT, or if you would be comfortable posting the patch.



Are we not sending an EDQUOT upwards in this case?

Validating assumptions:
- As this is an fd-based operation, the fd should have been open, so
why exactly did the writev fail? EDQUOT?

The op_errno is 2 (ENOENT)

- Was this an NFS-based test? That could explain the use of anon-fds,
which would return an error if the file is no longer on the backend. Is
this test run from an NFS mount point?

It's a normal FUSE mount.

Thank you.

Shyam
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
