Hi, I'm encountering a strange situation. I'm writing to an ext3 filesystem mounted like this: /dev/scsi/host2/bus0/target0/lun0/part3 on /tmp/volumes/md0 type ext3 (rw,noatime,data=ordered) with the physical device being a FC-connected RAID (IBM DF4000) no software raid personality is used I'm using 2.4.18-19.7 My test program writes files in a loop, with the essential system calls being: [pid 13335] mkdir("/tmp/volumes/md0/1/13335/00028", 0777) = 0 [pid 13335] fsync(4) = 0 [pid 13335] time(NULL) = 1044515538 [pid 13335] gettimeofday({1044515538, 96401}, {4294967176, 0}) = 0 [pid 13335] getpid() = 13335 [pid 13335] sendto(3, "[Feb 6 9:12:18.096401 13335]: mkdir /tmp/volumes/md0/1/13335/00028\n", 69, 0, {sin_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("127.0.0.1")}}, 16) = 69 [pid 13335] open("/tmp/volumes/md0/1/13335/00028", O_RDONLY|O_SYNC) = 5 [pid 13335] open("/tmp/volumes/md0/1/13335/00028/file", O_RDWR|O_CREAT|O_SYNC, 027777774120) = 6 [pid 13335] fsync(5) = 0 [pid 13335] time(NULL) = 1044515538 [pid 13335] gettimeofday({1044515538, 112558}, {4294967176, 0}) = 0 [pid 13335] getpid() = 13335 [pid 13335] sendto(3, "[Feb 6 9:12:18.112558 13335]: open /tmp/volumes/md0/1/13335/00028/file\n", 73, 0, {sin_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("127.0.0.1")}}, 16) = 73 [pid 13335] getpid() = 13335 [pid 13335] write(6, "\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\0\1\2\3\4\5"..., 8192) = 8192 [pid 13335] fsync(6) = 0 [pid 13335] time(NULL) = 1044515538 [pid 13335] gettimeofday({1044515538, 160822}, {4294967176, 0}) = 0 [pid 13335] getpid() = 13335 [pid 13335] sendto(3, "[Feb 6 9:12:18.160822 13335]: write /tmp/volumes/md0/1/13335/00028/file\n", 74, 0, {sin_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("127.0.0.1")}}, 16) = 74 [pid 13335] close(6) = 0 [pid 13335] close(5) = 0 [pid 13335] time(NULL) = 1044515538 [pid 13335] gettimeofday({1044515538, 161148}, {4294967176, 0}) = 0 [pid 13335] getpid() = 13335 [pid 13335] sendto(3, "[Feb 6 9:12:18.161148 13335]: finished /tmp/volumes/md0/1/13335/00028/file\n", 77, 0, {sin_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("127.0.0.1")}}, 16) = 77 [ In the middle of this loop of writing I turn down the power to my RAID in the first minute the last writes/actions seem to be pending. After that the write actions return successfully (which is very strange) and all following actions return with errors. When I turn on the power again, the fs seems to restore, and the file is available with the correct data. But when I do e2fsck to the device I get: [root@node2-integ /]# e2fsck -y -f /dev/scsi/host2/bus0/target0/lun0/part3 e2fsck 1.27 (8-Mar-2002) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Entry '8075' in /1 (1687558) has an incorrect filetype (was 2, should be 1). Fix? yes Entry '00036' in /1/8022 (1736710) has deleted/unused inode 1179660. Clear? yes Entry '00037' in /1/8024 (1802246) has deleted/unused inode 1146892. Clear? yes Entry '00036' in /1/8027 (1998854) has deleted/unused inode 2654219. Clear? yes Entry '00036' in /1/8035 (2097158) has deleted/unused inode 2588683. Clear? yes Entry '00036' in /1/8033 (2228230) has deleted/unused inode 1212428. Clear? yes Entry '00037' in /1/8038 (2326534) has deleted/unused inode 1114124. Clear? yes Entry '00009' in /2/8179 (2981900) is a link to directory /1/8069 (1785868). Clear? yes Entry '00013' in /2/8176 (2998284) is a link to directory /1/8045/00036 (1261580). Clear? yes Entry '00013' in /2/8178 (3047436) is a link to directory /1/8039/00036 (1196044). Clear? yes Entry '00009' in /2/8184 (3227660) is a link to directory /1/8071 (1949708). Clear? yes Entry '00012' in /2/8182 (3293196) is a link to directory /1/8029/00037 (1277964). Clear? yes Entry '00012' in /2/8193 (3375116) is a link to directory /1/8021/00037 (1228812). Clear? yes Entry '00008' in /2/8188 (3522572) is a link to directory /1/8065 (1523724). Clear? yes Entry '00009' in /2/8186 (3588108) is a link to directory /1/8070 (1884172). Clear? yes Entry '00012' in /2/8198 (3637260) is a link to directory /1/8025/00038 (1294348). Clear? yes Entry '00008' in /2/8195 (3735564) is a link to directory /1/8066 (1556492). Clear? yes Entry '00009' in /2/8204 (3964940) is a link to directory /1/8068 (1753100). Clear? yes Entry '00008' in /2/8205 (3981319) is a link to directory /1/8067 (1589260). Clear? yes Entry '00012' in /2/8205 (3981319) is a link to directory /1/8034/00036 (1245196). Clear? yes Pass 3: Checking directory connectivity '..' in /1/8039/00036 (1196044) is /2/8178 (3047436), should be /1/8039 (2359302). Fix? yes '..' in /1/8021/00037 (1228812) is /2/8193 (3375116), should be /1/8021 (1703942). Fix? yes '..' in /1/8034/00036 (1245196) is /2/8205 (3981319), should be /1/8034 (2179078). Fix? yes '..' in /1/8045/00036 (1261580) is /2/8176 (2998284), should be /1/8045 (2703366). Fix? yes '..' in /1/8029/00037 (1277964) is /2/8182 (3293196), should be /1/8029 (1900550). Fix? yes '..' in /1/8025/00038 (1294348) is /2/8198 (3637260), should be /1/8025 (1867782). Fix? yes '..' in /1/8065 (1523724) is /2/8188 (3522572), should be /1 (1687558). Fix? yes '..' in /1/8066 (1556492) is /2/8195 (3735564), should be /1 (1687558). Fix? yes '..' in /1/8067 (1589260) is /2/8205 (3981319), should be /1 (1687558). Fix? yes '..' in /1/8068 (1753100) is /2/8204 (3964940), should be /1 (1687558). Fix? yes '..' in /1/8069 (1785868) is /2/8179 (2981900), should be /1 (1687558). Fix? yes Unconnected directory inode 1835020 (/1/8069/???) Connect to /lost+found? yes '..' in /1/8070 (1884172) is /2/8186 (3588108), should be /1 (1687558). Fix? yes '..' in /1/8071 (1949708) is /2/8184 (3227660), should be /1 (1687558). Fix? yes Unconnected directory inode 2064396 (/1/8071/???) Connect to /lost+found? yes Pass 4: Checking reference counts Inode 1671181 ref count is 1, should be 2. Fix? yes Inode 1687558 ref count is 73, should be 65. Fix? yes Inode 1703942 ref count is 41, should be 40. Fix? yes Inode 1736710 ref count is 39, should be 38. Fix? yes Inode 1785868 ref count is 1, should be 2. Fix? yes Inode 1802246 ref count is 40, should be 39. Fix? yes Inode 1835020 ref count is 3, should be 2. Fix? yes Inode 1867782 ref count is 42, should be 41. Fix? yes Inode 1900550 ref count is 41, should be 40. Fix? yes Inode 1949708 ref count is 1, should be 2. Fix? yes Inode 1998854 ref count is 39, should be 38. Fix? yes Inode 2064396 ref count is 3, should be 2. Fix? yes Inode 2097158 ref count is 39, should be 38. Fix? yes Inode 2179078 ref count is 40, should be 39. Fix? yes Inode 2228230 ref count is 39, should be 38. Fix? yes Inode 2326534 ref count is 40, should be 39. Fix? yes Inode 2359302 ref count is 40, should be 39. Fix? yes Inode 2703366 ref count is 40, should be 39. Fix? yes Pass 5: Checking group summary information Block bitmap differences: -2228755 -2294289 -2359827 -2392595 -2425363 -2458131 -2490899 -2523667 -2556435 -2589203 -(3441172--3441173) -(5177874--5177876) -(5308946--5308948) -11862554 -12157466 -12223002 -12255770 -12288538 Fix? yes Free blocks count wrong for group #68 (32232, counted=32235). Fix? yes Free blocks count wrong for group #70 (32232, counted=32235). Fix? yes Free blocks count wrong for group #72 (32232, counted=32235). Fix? yes Free blocks count wrong for group #73 (32232, counted=32235). Fix? yes <snipped groups > Free blocks count wrong for group #74 (32232, counted=32235). Fix? yes Free blocks count wrong for group #75 (32232, counted=32235). Fix? yes Free blocks count wrong for group #76 (32229, counted=32232). Fix? yes Free blocks count wrong for group #77 (32232, counted=32235). Fix? yes Free blocks count wrong for group #78 (32232, counted=32235). Fix? yes Free blocks count wrong for group #79 (32232, counted=32235). Fix? yes Free blocks count wrong for group #81 (32233, counted=32235). Fix? yes Free blocks count wrong for group #82 (32233, counted=32235). Fix? yes Free blocks count wrong for group #83 (32233, counted=32235). Fix? yes Free blocks count wrong for group #84 (32233, counted=32235). Fix? yes Free blocks count wrong for group #85 (32233, counted=32235). Fix? yes Free blocks count wrong for group #86 (32233, counted=32235). Fix? yes Free blocks count wrong for group #87 (32233, counted=32235). Fix? yes Free blocks count wrong for group #88 (32233, counted=32235). Fix? yes Free blocks count wrong for group #89 (32233, counted=32234). Fix? yes Free blocks count wrong for group #90 (32233, counted=32234). Fix? yes Free blocks count wrong for group #91 (32233, counted=32234). Fix? yes Free blocks count wrong for group #92 (32233, counted=32234). Fix? yes Free blocks count wrong for group #93 (32233, counted=32234). Fix? yes Free blocks count wrong for group #94 (32233, counted=32234). Fix? yes Free blocks count wrong for group #95 (32233, counted=32234). Fix? yes Free blocks count wrong for group #96 (32233, counted=32234). Fix? yes Free blocks count wrong for group #97 (32233, counted=32234). Fix? yes Free blocks count wrong for group #101 (32233, counted=32235). Fix? yes Free blocks count wrong for group #102 (32233, counted=32235). Fix? yes Free blocks count wrong for group #105 (32231, counted=32234). Fix? yes Free blocks count wrong for group #107 (32233, counted=32234). Fix? yes Free blocks count wrong for group #109 (32233, counted=32234). Fix? yes Free blocks count wrong for group #113 (32233, counted=32234). Fix? yes Free blocks count wrong for group #115 (32233, counted=32234). Fix? yes Free blocks count wrong for group #118 (32233, counted=32234). Fix? yes Free blocks count wrong for group #119 (32233, counted=32234). Fix? yes Free blocks count wrong for group #158 (32233, counted=32236). Fix? yes Free blocks count wrong for group #162 (32233, counted=32236). Fix? yes Free blocks count wrong for group #256 (32234, counted=32231). Fix? yes Free blocks count wrong for group #257 (32234, counted=32231). Fix? yes Free blocks count wrong for group #258 (32234, counted=32231). Fix? yes Free blocks count wrong for group #259 (32234, counted=32231). Fix? yes Free blocks count wrong for group #260 (32234, counted=32231). Fix? yes Free blocks count wrong for group #261 (32234, counted=32231). Fix? yes Free blocks count wrong for group #262 (32234, counted=32231). Fix? yes Free blocks count wrong for group #263 (32234, counted=32231). Fix? yes Free blocks count wrong for group #264 (32234, counted=32231). Fix? yes Free blocks count wrong for group #265 (32234, counted=32231). Fix? yes Free blocks count wrong for group #266 (32234, counted=32231). Fix? yes Free blocks count wrong for group #267 (32234, counted=32231). Fix? yes Free blocks count wrong for group #268 (32234, counted=32231). Fix? yes Free blocks count wrong for group #269 (32234, counted=32231). Fix? yes Free blocks count wrong for group #270 (32234, counted=32231). Fix? yes Free blocks count wrong for group #271 (32234, counted=32231). Fix? yes Free blocks count wrong for group #273 (32234, counted=32231). Fix? yes Free blocks count wrong for group #274 (32234, counted=32231). Fix? yes Free blocks count wrong for group #275 (32234, counted=32231). Fix? yes Free blocks count wrong for group #276 (32234, counted=32231). Fix? yes Free blocks count wrong for group #277 (32234, counted=32231). Fix? yes Free blocks count wrong for group #278 (32234, counted=32231). Fix? yes Free blocks count wrong for group #279 (32234, counted=32231). Fix? yes Free blocks count wrong for group #280 (32234, counted=32231). Fix? yes Free blocks count wrong for group #281 (32234, counted=32231). Fix? yes Free blocks count wrong for group #282 (32234, counted=32231). Fix? yes Free blocks count wrong for group #283 (32234, counted=32231). Fix? yes Free blocks count wrong for group #284 (32234, counted=32231). Fix? yes Free blocks count wrong for group #285 (32234, counted=32231). Fix? yes Free blocks count wrong for group #286 (32234, counted=32231). Fix? yes Free blocks count wrong for group #287 (32234, counted=32231). Fix? yes Free blocks count wrong for group #288 (32234, counted=32231). Fix? yes Free blocks count wrong for group #289 (32234, counted=32231). Fix? yes Free blocks count wrong for group #290 (32234, counted=32231). Fix? yes Free blocks count wrong for group #291 (32234, counted=32231). Fix? yes Free blocks count wrong for group #292 (32234, counted=32231). Fix? yes Free blocks count wrong for group #293 (32234, counted=32231). Fix? yes Free blocks count wrong for group #294 (32234, counted=32231). Fix? yes Free blocks count wrong for group #295 (32234, counted=32231). Fix? yes Free blocks count wrong for group #296 (32234, counted=32231). Fix? yes Free blocks count wrong for group #297 (32234, counted=32231). Fix? yes Free blocks count wrong for group #298 (32234, counted=32231). Fix? yes Free blocks count wrong for group #299 (32234, counted=32231). Fix? yes Free blocks count wrong for group #300 (32234, counted=32231). Fix? yes Free blocks count wrong for group #301 (32234, counted=32231). Fix? yes Free blocks count wrong for group #302 (32234, counted=32231). Fix? yes Free blocks count wrong for group #303 (32234, counted=32231). Fix? yes Free blocks count wrong for group #304 (32234, counted=32231). Fix? yes Free blocks count wrong for group #305 (32234, counted=32231). Fix? yes Free blocks count wrong for group #306 (32234, counted=32231). Fix? yes Free blocks count wrong for group #307 (32234, counted=32231). Fix? yes Free blocks count wrong for group #308 (32234, counted=32231). Fix? yes Free blocks count wrong for group #309 (32234, counted=32231). Fix? yes Free blocks count wrong for group #310 (32234, counted=32231). Fix? yes Free blocks count wrong for group #311 (32234, counted=32231). Fix? yes Free blocks count wrong for group #312 (32234, counted=32231). Fix? yes Free blocks count wrong for group #313 (32234, counted=32231). Fix? yes Free blocks count wrong for group #314 (32234, counted=32231). Fix? yes Free blocks count wrong for group #315 (32234, counted=32231). Fix? yes Free blocks count wrong for group #316 (32234, counted=32231). Fix? yes Free blocks count wrong for group #317 (32234, counted=32231). Fix? yes Free blocks count wrong for group #318 (32234, counted=32231). Fix? yes Free blocks count wrong for group #319 (32234, counted=32231). Fix? yes Free blocks count wrong for group #320 (32234, counted=32231). Fix? yes Free blocks count wrong for group #321 (32234, counted=32231). Fix? yes Free blocks count wrong for group #322 (32234, counted=32231). Fix? yes Free blocks count wrong for group #323 (32234, counted=32231). Fix? yes Free blocks count wrong for group #324 (32234, counted=32231). Fix? yes Free blocks count wrong for group #325 (32234, counted=32231). Fix? yes Free blocks count wrong for group #326 (32234, counted=32231). Fix? yes Free blocks count wrong for group #327 (32234, counted=32231). Fix? yes Free blocks count wrong for group #328 (32234, counted=32231). Fix? yes Free blocks count wrong for group #329 (32234, counted=32231). Fix? yes Free blocks count wrong for group #330 (32234, counted=32231). Fix? yes Free blocks count wrong for group #331 (32234, counted=32231). Fix? yes Free blocks count wrong for group #332 (32234, counted=32231). Fix? yes Free blocks count wrong for group #333 (32234, counted=32231). Fix? yes Free blocks count wrong for group #334 (32234, counted=32231). Fix? yes Free blocks count wrong for group #335 (32234, counted=32231). Fix? yes Free blocks count wrong for group #336 (32234, counted=32231). Fix? yes Free blocks count wrong for group #337 (32234, counted=32231). Fix? yes Free blocks count wrong for group #338 (32234, counted=32231). Fix? yes Free blocks count wrong for group #339 (32234, counted=32231). Fix? yes Free blocks count wrong for group #340 (32234, counted=32231). Fix? yes Free blocks count wrong for group #341 (32234, counted=32231). Fix? yes Free blocks count wrong for group #342 (32234, counted=32231). Fix? yes <snipped again> and then the file is missing. This seems to be incorrect behaviour. How should my application know that there was an error writing the file and provide the correct recovery ? I've yet to test this on other filesystems or journal modes. Does anyone have any explanation for this behaviour, or suggestion for workarounds ? I can provide the test program if anyone is interested in reproducing. Regards, Yuval -- Yuval Yeret Exanet yuval@exanet.com http://www.exanet.com Tel. 972-9-9717782 Fax. 972-9-9717778 <<Yuval Yeret (E-mail).vcf>>
BEGIN:VCARD VERSION:2.1 N:Yeret;Yuval FN:Yuval Yeret (E-mail) ORG:Exanet TITLE:Clustering and Platform Team Leader TEL;WORK;VOICE:972-9-9717782 TEL;CELL;VOICE:972-54-802458 TEL;WORK;FAX:972-9-9717778 ADR;WORK;ENCODING=QUOTED-PRINTABLE:;;9 hamanofim st=0D=0Ap.o. 4064;Herzelia;;46140;Israel LABEL;WORK;ENCODING=QUOTED-PRINTABLE:9 hamanofim st=0D=0Ap.o. 4064=0D=0AHerzelia 46140=0D=0AIsrael EMAIL;PREF;INTERNET:mailto:yuval@exanet.com REV:20020903T070728Z END:VCARD