Re: hfsplus BUG: Bad page state in process du pfn:07759 (Re: hfsplus corruption, failed fsck, journalling and zero'ing extent record on delete)

Hin-Tak Leung <hintak_leung@xxxxxxxxxxx> · Mon, 24 Sep 2012 18:43:20 +0100 (BST)

Hi Vyacheslav,

1. Up-to-date Fedora, 3.5.4-1.fc17.x86_64, unmodified.
2. The genfs_contexts message you quoted comes from SELinux. It always appears after mounting removable storage media of any fs type (fat, ntfs, ext4 - I have a few USB drives). The hardware is a 160 GB SATA 2.5" hard disc in an external USB enclosure.
3. The directory where I regularly get this sort of message is a folder containing all 6 of ftp://downloads.netgear.com/files/GPL/WNDRMAC*tar.bz2.zip (Netgear's source code for their HFS+-capable network storage appliances): the unzipped archives themselves, as well as the further expanded tar.bz2 contents. So it has 6 expanded kernel trees as well as the source code/git repositories of samba and the other software that makes up the product.

There is possibly one important detail: 103022 to 111015 is about 8000 seconds, or a little over two hours. Between mounting (read-only, the default for journalled HFS+) and running du, I was doing something else. I think it tends to happen when I leave the drive connected but don't do anything on it for some time, and the message seems to be about some of the file system's structures being paged out? The trigger is just running "du" in one terminal while running dmesg in another.

This is the first time I have seen this on a completely fsck-clean fs (after running fsck.hfsplus -f a few times). The fs has a somewhat complicated history. It was created as case-sensitive journalled HFS+ and written to with the experimental journalling code, and it became corrupted after some read/write experiments with quick deletes some weeks ago. I learned enough about HFS+ to find the fragments and sectors of the catalog b-tree, dd'ed them out, edited them with a hex editor and dd'ed them back in, to help fsck a bit. After that, fsck ran a few times and passed before I mounted the fs. I have seen this sort of message quite a few times, but so far I had dismissed them, since bad messages are to be expected when mounting a corrupted fs (read-only), and the fs was corrupted until yesterday. (The drive also has two ntfs partitions which I use fairly regularly, so udev etc. automounts the hfs+ one read-only often, and I have looked around on it often enough.)
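
As an illustration of the "find the sectors of the catalog b-tree" step, here is a minimal sketch. It is not the procedure actually used on this disk; it only assumes the on-disk volume header layout documented in Apple's TN1150 (header at byte 1024 of the volume, big-endian fields, blockSize at offset 40, catalogFile fork data at offset 272). The function names are hypothetical, and only the first eight extents are covered; a more fragmented catalog would need the extents overflow file as well.

```python
import struct

VOLUME_HEADER_OFFSET = 1024      # the volume header starts 1024 bytes into the volume
CATALOG_FORK_OFFSET = 272        # offset of the catalogFile fork data inside the header
# HFSPlusForkData: logicalSize, clumpSize, totalBlocks, then 8 x (startBlock, blockCount)
FORK_FORMAT = ">QII" + "II" * 8


def read_volume_header(device_path):
    """Read the 512-byte volume header from a partition or image (hypothetical helper)."""
    with open(device_path, "rb") as dev:
        dev.seek(VOLUME_HEADER_OFFSET)
        return dev.read(512)


def catalog_extents(header):
    """Return (block_size, [(start_block, block_count), ...]) for the catalog file."""
    signature = header[0:2]
    if signature not in (b"H+", b"HX"):   # HFS+ or case-sensitive HFSX
        raise ValueError("not an HFS+/HFSX volume header: %r" % signature)
    (block_size,) = struct.unpack_from(">I", header, 40)
    fields = struct.unpack_from(FORK_FORMAT, header, CATALOG_FORK_OFFSET)
    # fields[3:] alternates startBlock, blockCount; unused extents have a zero count
    pairs = zip(fields[3::2], fields[4::2])
    return block_size, [(start, count) for start, count in pairs if count]


def dd_arguments(block_size, extents):
    """Format each extent as dd-style bs/skip/count arguments for copying it out."""
    return ["bs=%d skip=%d count=%d" % (block_size, start, count)
            for start, count in extents]
```

Calling catalog_extents(read_volume_header("/dev/sdb5")) would list the allocation-block ranges to inspect, counted from the start of the partition; reading the raw device normally needs root, and the device name here is only the one from the log above.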

I can probably go back through the earlier BUG messages to see how much time passed between mounting and du/BUG(); I think every time I see it the drive has been idle for some time. I have collected a few such messages (but as I said, scary messages are expected for a corrupted fs; this latest one, though, is on a clean fs which has had "fsck -f" run on it a few times recently before mounting).
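
A minimal sketch of how the mount-to-BUG gap could be pulled out of saved dmesg output follows. It assumes the log lines look like the ones quoted further down (a bracketed seconds-since-boot timestamp, the SELinux "initialized (dev ..., type hfsplus)" line as the mount marker, and the "Bad page state" line as the BUG marker); the function name is hypothetical.

```python
import re

TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
MOUNT = re.compile(r"SELinux: initialized \(dev (\S+), type hfsplus\)")
BAD_PAGE = re.compile(r"BUG: Bad page state in process (\S+)")


def mount_to_bug_delays(log_text):
    """Yield (device, process, seconds) for each BUG that follows an hfsplus mount."""
    last_mount = None  # (device, timestamp) of the most recent hfsplus mount seen
    for line in log_text.splitlines():
        stamped = TIMESTAMP.match(line.strip())
        if not stamped:
            continue  # skip lines without a dmesg timestamp
        seconds, message = float(stamped.group(1)), stamped.group(2)
        mounted = MOUNT.search(message)
        if mounted:
            last_mount = (mounted.group(1), seconds)
            continue
        bug = BAD_PAGE.search(message)
        if bug and last_mount:
            device, mounted_at = last_mount
            yield device, bug.group(1), seconds - mounted_at
```

Fed the two relevant lines from the trace below (103022.536649 and 111015.478171), this yields a gap of roughly 7993 seconds for du on sdb5, i.e. about 2 hours 13 minutes.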

Regards,
Hin-Tak

--- On Mon, 24/9/12, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

> Hi Hin-Tak,
> 
> Could you describe how to reproduce the issue in more
> detail?
> 
> I need to know:
> 1. What kernel version do you use?
> 2. Do you have some special configuration of the system
> ([103022.536649]
> SELinux: initialized (dev sdb5, type hfsplus), uses
> genfs_contexts)?
> 3. How did you generate the small files? How small are they
> (I mean size)?
> 
> Sorry, but currently I don't have a clear understanding of how to
> reproduce this issue from your description.
> 
> With the best regards,
> Vyacheslav Dubeyko.
> 
> On Mon, 2012-09-24 at 08:30 +0100, Hin-Tak Leung wrote:
> > Argh, the BUG() seems to be a genuine bug - running du on the recently "fsck.hfsplus -f" clean disk, mounting read-only with unmod'ed distro hfsplus driver: (see, "not Tainted"...)
> > 
> > =================
> > [103022.493765] hfs: write access to a journaled filesystem is not supported, use the force option at your own risk, mounting read-only.
> > [103022.536649] SELinux: initialized (dev sdb5, type hfsplus), uses genfs_contexts
> > [111015.478171] BUG: Bad page state in process du  pfn:07759
> > [111015.478182] page:ffffea00001dd640 count:0 mapcount:0 mapping:          (null) index:0x1935
> > [111015.478185] page flags: 0x20000000000004(referenced)
> > [111015.478189] Modules linked in: usb_storage tcp_lp nls_utf8 hfsplus fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc xt_LOG xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack arc4 rtl8187 eeprom_93cx6 mac80211 cfg80211 snd_hda_codec_realtek joydev vhost_net tun macvtap macvlan kvm_amd kvm edac_core edac_mce_amd k8temp r592 memstick sp5100_tco snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore i2c_piix4 r8169 mii shpchp toshiba_acpi sparse_keymap rfkill wmi ecryptfs sha256_generic encrypted_keys nfsd nfs_acl auth_rpcgss lockd sunrpc uinput binfmt_misc trusted tpm tpm_bios ata_generic pata_acpi firewire_ohci sdhci_pci sdhci firewire_core crc_itu_t mmc_core pata_atiixp video radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
> > [111015.478274] Pid: 23364, comm: du Not tainted 3.5.4-1.fc17.x86_64 #1
> > [111015.478277] Call Trace:
> > [111015.478291]  [<ffffffff81604213>] bad_page+0xe6/0xfb
> > [111015.478299]  [<ffffffff8112dd8e>] get_page_from_freelist+0x77e/0x940
> > [111015.478305]  [<ffffffff8112e0fd>] __alloc_pages_nodemask+0x1ad/0x970
> > [111015.478318]  [<ffffffffa05e5719>] ? hfsplus_bnode_read+0x89/0x100 [hfsplus]
> > [111015.478324]  [<ffffffffa05e5775>] ? hfsplus_bnode_read+0xe5/0x100 [hfsplus]
> > [111015.478329]  [<ffffffff811699e0>] alloc_pages_current+0xb0/0x120
> > [111015.478334]  [<ffffffff811721b8>] new_slab+0x268/0x320
> > [111015.478339]  [<ffffffff8160546e>] __slab_alloc+0x36e/0x4c8
> > [111015.478344]  [<ffffffffa05df11a>] ? hfsplus_alloc_inode+0x1a/0x40 [hfsplus]
> > [111015.478349]  [<ffffffffa05df11a>] ? hfsplus_alloc_inode+0x1a/0x40 [hfsplus]
> > [111015.478354]  [<ffffffff811733f8>] kmem_cache_alloc+0x108/0x160
> > [111015.478359]  [<ffffffffa05e7d40>] ? __hplusfs_brec_find+0xa0/0x180 [hfsplus]
> > [111015.478364]  [<ffffffffa05df11a>] hfsplus_alloc_inode+0x1a/0x40 [hfsplus]
> > [111015.478371]  [<ffffffff811a0606>] alloc_inode+0x26/0xa0
> > [111015.478375]  [<ffffffff811a1c78>] iget_locked+0xb8/0x190
> > [111015.478380]  [<ffffffffa05df715>] hfsplus_iget+0x15/0x230 [hfsplus]
> > [111015.478386]  [<ffffffffa05e7c8f>] ? hfsplus_find_exit+0x2f/0x40 [hfsplus]
> > [111015.478391]  [<ffffffffa05e467f>] hfsplus_lookup+0x20f/0x2d0 [hfsplus]
> > [111015.478397]  [<ffffffff8119ed84>] ? __d_alloc+0x34/0x180
> > [111015.478402]  [<ffffffffa05d701a>] ? char2uni+0x1a/0x50 [nls_utf8]
> > [111015.478406]  [<ffffffff81173321>] ? kmem_cache_alloc+0x31/0x160
> > [111015.478410]  [<ffffffff8119ed84>] ? __d_alloc+0x34/0x180
> > [111015.478413]  [<ffffffff8119ee9c>] ? __d_alloc+0x14c/0x180
> > [111015.478419]  [<ffffffff811928e1>] __lookup_hash+0x61/0x120
> > [111015.478423]  [<ffffffff81194b49>] ? lookup_fast+0x219/0x310
> > [111015.478427]  [<ffffffff81605959>] lookup_slow+0x47/0xab
> > [111015.478431]  [<ffffffff81196ac6>] path_lookupat+0x716/0x750
> > [111015.478436]  [<ffffffff81173321>] ? kmem_cache_alloc+0x31/0x160
> > [111015.478440]  [<ffffffff81196b31>] do_path_lookup+0x31/0xc0
> > [111015.478444]  [<ffffffff81192b33>] ? getname_flags+0x53/0xf0
> > [111015.478448]  [<ffffffff8119787d>] user_path_at_empty+0x5d/0xa0
> > [111015.478454]  [<ffffffff8127973a>] ? inode_has_perm.isra.31.constprop.61+0x2a/0x30
> > [111015.478459]  [<ffffffff8127d835>] ? selinux_inode_getattr+0x45/0x50
> > [111015.478464]  [<ffffffff8118c910>] ? cp_new_stat+0x120/0x140
> > [111015.478468]  [<ffffffff811978d1>] user_path_at+0x11/0x20
> > [111015.478472]  [<ffffffff8118cba5>] vfs_fstatat+0x35/0x70
> > [111015.478475]  [<ffffffff8118ceaa>] sys_newfstatat+0x1a/0x40
> > [111015.478482]  [<ffffffff81614e29>] system_call_fastpath+0x16/0x1b
> > [111015.478485] Disabling lock debugging due to kernel taint
> > ============================
> > 
> > --- On Mon, 24/9/12, Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > 
> > <snipped>
> > > I mentioned briefly some days ago that I managed to corrupt
> > > an HFS+ partition while experimenting with the journalling
> > > code, to the extent that fsck_hfs/fsck.hfsplus (Apple's
> > > diskdev_cmds tool) refuses to fix it. And that partition, with
> > > the unmodified module used read-only, can get the kernel to
> > > BUG() "reliably" by just doing "du" on it (and I was
> > > thinking whether BUG()'ing on a corrupted disk is a bug to
> > > file...).
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html