Den 05.02.2012 22:20, skrev David Coulson: > > > On 2/5/12 4:12 PM, Ove K. Pettersen wrote: >> Hi... >> >> Started playing with gluster. And the heal functions is my "target" >> for testing. >> >> Short description of my test >> ---------------------------- >> * 4 replicas on single machine >> * glusterfs mounted locally >> * Create file on glusterfs-mounted directory: date >data.txt > Usually it is bad to write directly to the brick. I know it is bad to write to a brick directly, but my goal is to try to simulate a failure on a brick. And those can occure without a complete loss. Maybe a file is destroyed/modified, but still exist. > >> * Append to file on one of the bricks: hostname >>data.txt >> * Trigger a self-heal with: stat data.txt >> => Modified file on the brick is still different from the three other >> bricks. > What a surprise :-) >> >> Q1: Is the modified file supposed to get corrected in my case? >> >> Q2: Is my test-case invalid for what gluster is supposed to handle? > Probably, since the file has no gfid attributes, gluster has no clue > what to do with it. If you delete it, it should self-heal. As I stated in my details, I did an append to the file. I checked before and after I did the append to the file. It had the same extended attributes also after me appending. Attribute "gfid" has a 16 byte value for data.txt Attribute "afr.d1-client-1" has a 12 byte value for data.txt Attribute "afr.d1-client-0" has a 12 byte value for data.txt Attribute "afr.d1-client-3" has a 12 byte value for data.txt Attribute "afr.d1-client-2" has a 12 byte value for data.txt So I believe gluster should still consider the file as "a gluster-owned file". >> >> I would be very happy to receive an opinion on this. >> >> All the details are put at the bottom >> >> >> Regards, >> >> Ove >> >> >> >> >> >> >> Details and logs >> ---------------- >> My environment >> * single machine with OpenSuse 12.1 >> * gluster 3.3beta2 >> * xfs is used for the brick-fs (each one to a separate disk) >> >> Volume Name: d1 >> Type: Replicate >> Status: Started >> Number of Bricks: 4 >> Transport-type: tcp >> Bricks: >> Brick1: glusterdev:/b1 >> Brick2: glusterdev:/b2 >> Brick3: glusterdev:/b3 >> Brick4: glusterdev:/b4 >> Options Reconfigured: >> diagnostics.brick-log-level: DEBUG >> diagnostics.client-log-level: DEBUG >> >> * My glustervolume is mountet as glusterfs on /d1 >> glusterdev:/d1 52403200 33024 52370176 1% /d1 >> >> >> >> 1) I put a file on the glusterfs >> date >/d1/data.txt >> >> 2) Checking the storage on the bricks: ls -l /b1 /b2 /b3 /b4 >> /b1: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b2: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b3: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b4: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> >> Everything is OK. >> >> 3) Then append a line on to /b1/data.txt >> hostname >>/b1/data.txt >> >> By just appending, I keep the inode and all the extended attributes >> (checked with attr -l on the /b1/data.txt file). >> >> >> 4) Check again: ls -l /b1 /b2 /b3 /b4 >> /b1: >> total 8 >> -rw-r--r-- 1 root root 40 Feb 5 21:40 data.txt >> >> /b2: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b3: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b4: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> Now we have a mismatch between the bricks. Both in contents and also >> the size of the file. >> >> 4) Trigger a self-heal by using stat >> cd /d1 ; stat data.txt >> >> File: `data.txt' >> Size: 29 Blocks: 16 IO Block: 131072 regular file >> Device: 1fh/31d Inode: 18446744070822815228 Links: 1 >> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) >> Access: 2012-02-05 21:40:32.382157697 +0100 >> Modify: 2012-02-05 21:40:32.388157658 +0100 >> Change: 2012-02-05 21:40:32.389157657 +0100 >> Birth: - >> >> >> 5) stat gives the correct size of 29 which was the size of the file >> stored on the glusterfs. >> 6) DEBUG-log from the client when running this stat-command below. >> >> 7) Check again the bricks: ls -l /b1 /b2 /b3 /b4 >> /b1: >> total 8 >> -rw-r--r-- 1 root root 40 Feb 5 21:40 data.txt >> >> /b2: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b3: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> /b4: >> total 8 >> -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt >> >> >> 8) Brick /b1 still has a bad copy of the file. >> >> >> >> In the logs it looks like the difference in size is detected. >> It also starts some self-healing actions. >> But then for some reason finds out no healing is needed. >> Marked a few lines with <<<<< ???? which looks suspect. >> >> ===============CUT Client Log >> [2012-02-05 21:41:01.206102] D >> [client-handshake.c:179:client_start_ping] 0-d1-client-0: returning >> as transport is already disconnected OR there are no frames (0 || 0) >> [2012-02-05 21:41:01.206201] D >> [client-handshake.c:179:client_start_ping] 0-d1-client-1: returning >> as transport is already disconnected OR there are no frames (0 || 0) >> [2012-02-05 21:41:01.206224] D >> [client-handshake.c:179:client_start_ping] 0-d1-client-2: returning >> as transport is already disconnected OR there are no frames (0 || 0) >> [2012-02-05 21:41:01.206241] D >> [client-handshake.c:179:client_start_ping] 0-d1-client-3: returning >> as transport is already disconnected OR there are no frames (0 || 0) >> [2012-02-05 21:41:03.279124] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.279182] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.279202] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.279218] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.279232] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 0 character innocent >> [2012-02-05 21:41:03.279251] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 1 character innocent >> [2012-02-05 21:41:03.279266] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 2 character innocent >> [2012-02-05 21:41:03.279280] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 3 character innocent >> [2012-02-05 21:41:03.279294] D >> [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: >> Number of sources: 0 >> [2012-02-05 21:41:03.279309] D >> [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] >> 0-d1-replicate-0: returning read_child: 1 >> [2012-02-05 21:41:03.279324] D >> [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: >> Source selected as 1 for / >> [2012-02-05 21:41:03.279341] D >> [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: >> Building lookup response from 1 >> [2012-02-05 21:41:03.279561] D [afr-common.c:1945:afr_lookup] >> 0-d1-replicate-0: failed to get the gfid from dict >> [2012-02-05 21:41:03.280265] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.280299] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.280317] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.280332] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.280346] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 0 character innocent >> [2012-02-05 21:41:03.280377] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 1 character innocent >> [2012-02-05 21:41:03.280392] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 2 character innocent >> [2012-02-05 21:41:03.280405] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 3 character innocent >> [2012-02-05 21:41:03.280419] D >> [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: >> Number of sources: 0 >> [2012-02-05 21:41:03.280433] D >> [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] >> 0-d1-replicate-0: returning read_child: 2 >> [2012-02-05 21:41:03.280507] D >> [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: >> Source selected as 2 for /data.txt >> [2012-02-05 21:41:03.280530] D >> [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: >> Building lookup response from 2 >> [2012-02-05 21:41:03.280548] I >> [afr-common.c:1076:afr_detect_self_heal_by_iatt] 0-d1-replicate-0: >> size differs for /data.txt >> [2012-02-05 21:41:03.280576] I >> [afr-common.c:1225:afr_launch_self_heal] 0-d1-replicate-0: >> background data self-heal triggered. path: /data.txt >> [2012-02-05 21:41:03.280612] D >> [afr-self-heal-metadata.c:78:afr_sh_metadata_done] 0-d1-replicate-0: >> proceeding to data check on /data.txt >> [2012-02-05 21:41:03.281307] D >> [afr-lk-common.c:1397:afr_nonblocking_inodelk] 0-d1-replicate-0: >> attempting data lock range 0 0 by 140191242417704 >> [2012-02-05 21:41:03.281734] D >> [afr-self-heal-data.c:1136:afr_sh_data_post_nonblocking_inodelk_cbk] >> 0-d1-replicate-0: Non Blocking data inodelks done for /data.txt by >> 140191242417704. Proceeding to self-heal >> [2012-02-05 21:41:03.282815] D >> [afr-self-heal-data.c:650:afr_sh_data_fix] 0-d1-replicate-0: Pending >> matrix for: 140191242417704 >> [2012-02-05 21:41:03.282896] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.282940] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.282978] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.283015] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.283050] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 0 character innocent >> [2012-02-05 21:41:03.283095] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 1 character innocent >> [2012-02-05 21:41:03.283145] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 2 character innocent >> [2012-02-05 21:41:03.283191] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 3 character innocent >> [2012-02-05 21:41:03.283229] D >> [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: >> Number of sources: 0 <<<<<< ????? >> [2012-02-05 21:41:03.283264] D >> [afr-self-heal-data.c:657:afr_sh_data_fix] 0-d1-replicate-0: No >> self-heal needed for /data.txt <<<<<< ????? >> [2012-02-05 21:41:03.283306] D >> [afr-self-heal-data.c:320:afr_sh_data_finish] 0-d1-replicate-0: >> finishing data selfheal of /data.txt >> [2012-02-05 21:41:03.283344] D >> [afr-lk-common.c:410:transaction_lk_op] 0-d1-replicate-0: lk op is >> for a self heal >> [2012-02-05 21:41:03.283379] D >> [afr-lk-common.c:597:afr_unlock_inodelk] 0-d1-replicate-0: attempting >> data unlock range 0 0 by 140191242417704 >> [2012-02-05 21:41:03.283879] I >> [afr-self-heal-common.c:2022:afr_self_heal_completion_cbk] >> 0-d1-replicate-0: background data self-heal completed on /data.txt >> [2012-02-05 21:41:03.283960] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.284004] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.284042] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.284098] D >> [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] >> 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] >> [2012-02-05 21:41:03.284136] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 0 character innocent >> [2012-02-05 21:41:03.284173] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 1 character innocent >> [2012-02-05 21:41:03.284216] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 2 character innocent >> [2012-02-05 21:41:03.284253] D >> [afr-self-heal-common.c:604:afr_find_child_character_type] >> 0-d1-replicate-0: child 3 character innocent >> [2012-02-05 21:41:03.284290] D >> [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: >> Number of sources: 0 >> [2012-02-05 21:41:03.284325] D >> [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] >> 0-d1-replicate-0: returning read_child: 2 >> [2012-02-05 21:41:03.284361] D >> [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: >> Source selected as 2 for /data.txt >> [2012-02-05 21:41:03.284398] D >> [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: >> Building lookup response from 2 >> [2012-02-05 21:41:03.285622] D >> [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-2: Number of >> locks cleared=0 >> [2012-02-05 21:41:03.285690] D >> [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-1: Number of >> locks cleared=0 >> [2012-02-05 21:41:03.285779] D >> [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-3: Number of >> locks cleared=0 >> [2012-02-05 21:41:03.285880] D >> [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-0: Number of >> locks cleared=0 >> >> ===============CUT Client Log >> >> >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users >