Re: Wrong behavior on fsync of md-cache ?

On 24.11.2014 18:53, Raghavendra Gowdappa wrote:

----- Original Message -----
From: "Xavier Hernandez" <xhernandez@xxxxxxxxxx> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx> Cc: "Emmanuel Dreyfus" <manu@xxxxxxxxxx> Sent: Monday, November 24, 2014 11:05:57 PM Subject: Wrong behavior on fsync of md-cache ? Hi, I have an issue in ec caused by what seems an incorrect behavior in md-cache, at least in NetBSD (on linux this doesn't seem to happen). The problem happens when multiple writes are sent in parallel and one of them fails with an error. After the error, an fsync is issued, before all pending writes are completed. The problem is that this fsync request is not propagated through the xlator stack: md-cache automatically answers it with the same error code returned by the last write, but it does not wait for all pending writes to finish.
Are you sure that fsync is short-circuited in md-cache? Looking at mdc_fsync, I can see that fsync is wound down the xlator stack unconditionally.
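For reference, the wind in question follows the standard fop pattern, something like this (a sketch from memory, not the exact md-cache source):

/* md-cache's fsync fop: the call is always wound to the child
 * xlator; md-cache never answers fsync from its cache. */
int32_t
mdc_fsync (call_frame_t *frame, xlator_t *this, fd_t *fd,
           int32_t datasync, dict_t *xdata)
{
        STACK_WIND (frame, mdc_fsync_cbk, FIRST_CHILD (this),
                    FIRST_CHILD (this)->fops->fsync,
                    fd, datasync, xdata);
        return 0;
}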
 
Well, I didn't look at the code. I assumed that, since it worked after disabling md-cache (performance.stat-prefetch off), the problem was there. Sorry.
 
write-behind flushes all pending writes before fsync is wound down the xlator stack.
 
I think the problem is here: the first thing wb_fsync() checks is whether there's an error stored on the fd (wb_fd_err()). If that's the case, the call is immediately unwound with that error. The error seems to be set in wb_fulfill_cbk(). I don't know the internals of the write-behind xlator, but this seems to be the problem.
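If my reading is right, the short-circuit looks roughly like this (a sketch with the normal path elided; exact signatures may differ):

/* write-behind's fsync fop: if an earlier write stored an error
 * on the fd (e.g. from wb_fulfill_cbk), fsync is unwound with
 * that error immediately, without flushing the pending writes. */
int32_t
wb_fsync (call_frame_t *frame, xlator_t *this, fd_t *fd,
          int32_t datasync, dict_t *xdata)
{
        int32_t op_errno = 0;

        if (wb_fd_err (fd, this, &op_errno)) {
                /* stored error: answer right away */
                STACK_UNWIND_STRICT (fsync, frame, -1, op_errno,
                                     NULL, NULL, NULL);
                return 0;
        }

        /* ... normal path: flush pending writes, then wind fsync ... */
        return 0;
}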
 
I'm not sure why the problem disappeared when I disabled md-cache. Maybe I made a mistake and disabled write-behind instead. I'll check it again tomorrow.
 
 
Are you sure fsync is sent by the kernel to glusterfs? Maybe the kernel never issues fsync because of stale stat information? You can load a debug/trace xlator just above io-stats and check whether you get the fsync call (you can also dump the fuse-to-glusterfs traffic using --dump-fuse-path=, but it's a binary file and you need a parser to parse that binary data).
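For the record, loading debug/trace just above io-stats in the client volfile would look something like this (assuming the io-stats volume in the graph is named testvol; adjust to the actual name):

# trace winds to the io-stats volume, so every fop crossing it is logged
volume trace
    type debug/trace
    subvolumes testvol
end-volume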
 
I've seen these lines in the log file:
 
[2014-11-24 16:18:29.348552] T [fuse-bridge.c:2457:fuse_fsync_resume] 0-glusterfs-fuse: 395: FSYNC 0xbb242268
[2014-11-24 16:18:29.348663] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 395: FSYNC() ERR => -1 (Disc quota exceeded)
 
There's nothing in between. I assume this means that the kernel has sent the FSYNC request and someone returned the EDQUOT error immediately (I log a message if FSYNC reaches ec).
 
Xavi
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
