Re: More ext4 acl/xattr corruption - 4th occurence now

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 14, 2009 at 10:07:32AM -0400, Theodore Tso wrote:
> On Thu, May 14, 2009 at 10:55:06PM +0930, Kevin Shanahan wrote:
> > On Thu, May 14, 2009 at 08:37:00PM +0930, Kevin Shanahan wrote:
> > > Sure - now running with 2.6.29.3 + your patch.
> > > 
> > >   patching file fs/ext4/inode.c
> > >   Hunk #1 succeeded at 1040 with fuzz 1 (offset -80 lines).
> > >   Hunk #2 succeeded at 1113 (offset -81 lines).
> > >   Hunk #3 succeeded at 1184 (offset -93 lines).
> > > 
> > > I'll report any hits for "check_block_validity" in syslog.
> > 
> > That didn't take long:
> > 
> > May 14 22:49:17 hermes kernel: EXT4-fs error (device dm-0): check_block_validity: inode #759 logical block 1741329 mapped to 529 (size 1)
> > May 14 22:49:17 hermes kernel: Aborting journal on device dm-0:8.
> > May 14 22:49:17 hermes kernel: ext4_da_writepages: jbd2_start: 293 pages, ino 759; err -30
> > May 14 22:49:17 hermes kernel: Pid: 374, comm: pdflush Not tainted 2.6.29.3 #1
> > May 14 22:49:17 hermes kernel: Call Trace:
> 
> The reason why I put in the ext4_error() was because I figured it was
> better to stop the filesystem from stomping on the inode table (since
> that way would lie data loss).  It would also freeze the filesystem so
> we could see what was happening.

Sure, makes sense. I guess I have gotten complacent as I have been
getting away with it for several months now. :)

> Could you run debugfs on the filesystem, and then send me the results
> of these commands:
> 
> debugfs: stat <759>

hermes:~# debugfs /dev/dm-0
debugfs 1.41.3 (12-Oct-2008)
debugfs:  stat <759>

Inode: 759   Type: regular    Mode:  0660   Flags: 0x80000
Generation: 3979120103    Version: 0x00000000:00000001
User:     0   Group: 10140   Size: 14615630848
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 28546168
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4a0acdb5:2a88cbec -- Wed May 13 23:10:05 2009
 atime: 0x4a0ac45b:10899618 -- Wed May 13 22:30:11 2009
 mtime: 0x4a0acdb5:2a88cbec -- Wed May 13 23:10:05 2009
crtime: 0x4a0ac45b:10899618 -- Wed May 13 22:30:11 2009
Size of extra inode fields: 28
Extended attributes stored in inode body: 
   = "01 00 00 00 01 00 06 00 02 00 07 00 12 27 00 00 04 00 05 00 08 00 05 00 57 27 00 00 08 00 07 00 98 27 00 00 10 00 06 00 20 00 00 00 " (44)
  DOSATTRIB = "0x20" (4)
BLOCKS:
(IND):1081467, (0-24575):999424-1023999, (24576-26623):1028096-1030143, (26624-43007):1032192-1048575, (43008-45055):1087488-1089535, (45056-51199):1550336-1556479, (51200-55295):1560576-1564671, (55296-63487):1540096-1548287, (63488-65535):2985984-2988031, (65536-92159):2996224-3022847, (92160-100351):3024896-3033087, (100352-112639):3035136-3047423, (112640-155647):3049472-3092479, (155648-157695):3100672-3102719, (157696-165887):3092480-3100671, (165888-206847):3104768-3145727, (206848-208895):3186688-3188735, (208896-681983):3196928-3670015, (681984-1042431):3735552-4095999, (1042432-1048575):4155392-4161535, (1048576-1050623):4182016-4184063, (1050624-1052671):4186112-4188159, (1052672-1056767):4177920-4182015, (1056768-1060863):4190208-4194303, (1060864-1124351):4655104-4718591, (1124352-1126399):5390336-5392383, (1126400-1142783):5423104-5439487, (1142784-1155071):5478400-5490687, (1155072-1159167):5472256-5476351, (1159168-1433599):5492736-5767167, (1433600-1437695):6057984-6062079, (1437696-1634303):6094848-6291455, (1634304-1636351):6432768-6434815, (1636352-1640447):6483968-6488063, (1640448-1646591):6498304-6504447, (1646592-1662975):6510592-6526975, (1662976-1718271):6533120-6588415, (1718272-1943551):6590464-6815743, (1943552-2203647):6914048-7174143, (2203648-2205695):7182336-7184383, (2205696-2207743):7192576-7194623, (2207744-2211839):7188480-7192575, (2211840-2355199):7196672-7340031, (2355200-2357247):7380992-7383039, (2357248-2359295):7387136-7389183, (2359296-2463743):7444480-7548927, (2463744-2531327):7567360-7634943, (2531328-2746367):7649280-7864319, (2746368-2811903):7897088-7962623, (2811904-2813951):7979008-7981055, (2813952-2869247):7995392-8050687, (2869248-2879487):8052736-8062975, (2879488-2881535):8081408-8083455, (2881536-2897919):8065024-8081407, (2897920-2906111):8085504-8093695, (2906112-2908159):8110080-8112127, (2908160-2910207):8114176-8116223, (2910208-2920447):8120320-8130559, (2920448-3026943):8136704-8243199, (3026944-3028991):8249344-8251391, (3028992-3031039):8253440-8255487, (3031040-3035135):8245248-8249343, (3035136-3037183):8292352-8294399, (3037184-3041279):8298496-8302591, (3041280-3043327):8398848-8400895, (3043328-3045375):8402944-8404991, (3045376-3049471):8407040-8411135, (3049472-3055615):8417280-8423423, (3055616-3094527):8501248-8540159, (3094528-3096575):8710144-8712191, (3096576-3110911):8726528-8740863, (3110912-3112959):8779776-8781823, (3112960-3115007):8806400-8808447, (3115008-3119103):8790016-8794111, (3119104-3162111):8818688-8861695, (3162112-3164159):8863744-8865791, (3164160-3166207):8923136-8925183, (3166208-3235839):11603968-11673599, (3235840-3276799):12998656-13039615, (3276800-3278847):13070336-13072383, (3278848-3282943):13058048-13062143, (3282944-3287039):13066240-13070335, (3287040-3303423):13041664-13058047, (3303424-3307519):16058368-16062463, (3307520-3311615):16084992-16089087, (3311616-3334143):16119808-16142335, (3334144-3336191):16150528-16152575, (3336192-3340287):16146432-16150527, (3340288-3342335):16154624-16156671, (3342336-3348479):16160768-16166911, (3348480-3350527):16199680-16201727, (3350528-3352575):16203776-16205823, (3352576-3354623):16226304-16228351, (3354624-3356671):16359424-16361471, (3356672-3360767):16369664-16373759, (3360768-3362815):16377856-16379903, (3362816-3364863):16388096-16

> debugfs: ncheck 759

debugfs:  ncheck 759
Inode   Pathname
759     /local/dumps/exchange/exchange-2000-UCWB-KVM-18.bkfB-KVM-18.bkf

By the way, that filename on disk is just
"exchange-2000-UCWB-KVM-18.bkf". Not sure if the way it appears to
repeat itself at the end there is indicative of a problem.

> Also, could you try run e2fsck on the filesystem and send me the
> output?  The ext4_error() should have stopped the kernel from doing
> any damage (like stomping on block 529, which was part of the inode
> table or block group descriptors).  If send me the dumpe2fs output
> about group 0, we'll know for sure (i.e.):
> 
> Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
>   Checksum 0x3e19, unused inodes 8169
>   Primary superblock at 0, Group descriptors at 1-5
>   Reserved GDT blocks at 6-32
>   Block bitmap at 33 (+33), Inode bitmap at 49 (+49)
>   Inode table at 65-576 (+65)
> 
> (Your numbers will be different; this was from a 80GB hard drive).

hermes:~# e2fsck -pfv /dev/dm-0
/dev/dm-0: recovering journal

  331810 inodes used (0.49%)
   16080 non-contiguous inodes (4.8%)
         # of inodes with ind/dind/tind blocks: 0/0/0
         Extent depth histogram: 331411/329
46723278 blocks used (17.41%)
       0 bad blocks
       3 large files

  305051 regular files
   26686 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
      64 symbolic links (60 fast symbolic links)
       0 sockets
--------
  331801 files

hermes:~# dumpe2fs /dev/dm-0
...
Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
  Checksum 0x0eb9, unused inodes 0
  Primary superblock at 0, Group descriptors at 1-64
  Reserved GDT blocks at 65-256
  Block bitmap at 257 (+257), Inode bitmap at 273 (+273)
  Inode table at 289-800 (+289)
  3443 free blocks, 0 free inodes, 571 directories
  Free blocks: 9067, 9108-9111, 9116-9119, 9276-9279, 9308-9311, 9687, 9720-9727, 9736-9743, 9848-9879, 9888-9895, 9936-9951, 10032-10047, 10072-10079, 10200-10207, 10352-10367, 10584-10591, 10619-10623, 10661-10671, 10740-10751, 11476-11487, 11512-11519, 11637-11647, 11669-11679, 11711, 11772-11775, 11890-11903, 11963-11967, 12187-12191, 12213-12223, 12249-12255, 12272-12280, 12282-12287, 12322-12351, 12381-12383, 12405-12415, 12437-12447, 12478-12479, 12542-12543, 12664-12671, 12727-12735, 12766-12767, 12851-12863, 12890-12895, 12927, 12970, 12980-12991, 13047-13055, 13246-13247, 13290-13295, 13312-13317, 13406-13407, 13433, 13439, 13456-13463, 13490-13503, 13537-13567, 13658-13695, 13770-13823, 13850-13855, 13886-13887, 13908-13919, 13951, 13971-13983, 14010-14015, 14056-14079, 14204-14207, 14249-14271, 14316-14350, 14358-14361, 14375, 14487-14495, 14526-14527, 14550-14584, 14605, 14619, 14630-14635, 14647-14655, 14660, 14674-14687, 14705-14719, 14744-14774, 14785-14815, 14844-14847, 14889-14926, 14933-14944, 14948, 14965-14975, 14992-15000, 15006-15007, 15035-15039, 15069-15087, 15097-15102, 15114-15127, 15141-15156, 15158-15164, 15176-15184, 15192-15193, 15213, 15223-15230, 15238, 15241-15256, 15264-15281, 15289-15297, 15305-15325, 15333-15341, 15350-15359, 15427-15455, 15482-15487, 15599-15615, 15705-15711, 15732-15743, 15798-15807, 15860-15871, 15923-15935, 15956-15967, 15989-15999, 16040-16063, 16090-16095, 16123-16127, 16171-16191, 16219-16223, 16246-16255, 16309-16319, 16343-16351, 16379-16383, 16565-16575, 16624-16639, 16688-16703, 16721-16735, 16763-16767, 16837-16863, 16881-16895, 17036-17055, 17075-17087, 17147-17151, 17220-17247, 17267-17279, 17307-17311, 17339-17343, 17365-17375, 17403-17407, 17453-17471, 17531-17535, 17554-17567, 17599, 17647-17663, 17718-17727, 17756-17759, 17812-17823, 17844-17855, 17904-17919, 18103-18111, 18131-18143, 18162-18175, 18277-18303, 18331-18335, 18363-18367, 18393-18399, 18427-18431, 18598-18623, 18676-18687, 18706-18719, 18738-18751, 18805-18815, 18844-18847, 18876-18879, 18908-18911, 18933-18943, 19135, 19198-19199, 19249-19263, 19286-19295, 19323-19327, 19416-19423, 19455, 19527-19551, 19582-19583, 19704-19711, 19756-19775, 19835-19839, 19887-19903, 19925-19935, 19959-19967, 20148-20159, 20206-20223, 20271-20287, 20336-20351, 20422-20447, 20476-20479, 20671, 20727-20735, 20758-20767, 20795-20799, 20818-20831, 20849-20863, 20891-20895, 20918-20927, 20955-20959, 20980-20991, 21133-21151, 21172-21183, 21221-21247, 21265-21279, 21307-21311, 21345-21375, 21425-21439, 21476-21503, 21531-21553, 21560-21596, 21604, 21611, 21630-21631, 21662-21666, 21672-21673, 21683, 21695-21725, 21732-21755, 21765-21782, 21795-21842, 21848-21875, 21890-21919, 21947-21951, 22005-22030, 22038-22052, 22062-22076, 22091-22092, 22107-22118, 22131-22140, 22154-22183, 22196-22198, 22569-22591, 22619-22623, 22651-22655, 22760-22783, 22803-22815, 22847, 22888-22911, 23002-23007, 23035-23039, 23153-23167, 23189-23199, 23260-23263, 23284-23295, 23336-23359, 23402-23423, 23462-23487, 23512-23519, 23544-23551, 23572-23583, 23644-23647, 23673-23679, 23768-23804, 23806-23807, 23841-23871, 23930-23935, 24011-24031, 24053-24076, 24103-24127, 24186-24191, 24287, 24347-24351, 24371-24383, 24434-24447, 24475-24479, 24501-24511, 24566-24575, 26431, 26472-26495, 26562-26591, 26619-26623, 26903-26905, 26909-26911, 26932-26943, 26984-27007, 27134-27135, 27182-27199, 27238-27263, 27291-27295, 27323-27327, 27376-27391, 27507-27519, 27641-27647, 27979-27999, 28024-28031, 28103-28127, 28255, 28323-28351, 28373-28383, 28414-28415, 28572-28575, 28598-28607, 28668-28671
  Free inodes: 

> > Any clues there? I don't think I'll be able to run this during the day
> > if it's going to trigger and remount the fs read-only as easily as
> > this.
> 
> Yes, this is a *huge*.  Finding this may have been just what we needed
> to determine what has been going on.  Thank you!!
> 
> What I would suggest doing is to mount the filesystem with the mount
> option -o nodelalloc.  This would be useful for two fronts: (1) it
> would determine whether or not the problem shows up if we suppress
> delayed allocation, and (2) if it works, hopefully it will prevent you
> from further filesystem corruption incidents.  (At least as a
> workaround; turning off delayed allocation will result in a
> performance decrease, but better that then data loss, I always say!)

Ok, thanks - I've added the mount option. I'll let you know if the
error still triggers.

> The fact that this triggered so quickly is actually a bit scary, so
> hopefully we'll be able to track down exactly what happened fairly
> quickly from here, and then fix the problem for real.

Yeah, that would be great.

Cheers,
Kevin.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux