Re: hfsplus BUG(), kmap and journalling.


Vyacheslav Dubeyko wrote:
On Sat, 2012-10-20 at 07:24 +0100, Hin-Tak Leung wrote:
--- On Fri, 19/10/12, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

Hi Hin-Tak,

On Thu, 2012-10-18 at 17:55 +0100, Hin-Tak Leung wrote:
Hi,

While looking at a few of the older BUG() traces I get consistently when running du on a somewhat large directory with lots of small files and small directories, I noticed that there tend to be two sleeping "? hfs_bnode_read()" entries towards the top. As it is a very small and simple function which just reads a b-tree node record - sometimes only a few bytes between a kmap/kunmap - I suspect the problem might just be the number of simultaneous kmap()s being run. So I put a mutex around it, just to make sure only one copy of hfs_bnode_read() runs at a time.

Yeah, you touch on a very important problem. The hfsplus driver needs reworking away from kmap()/kunmap(), because kmap() is slow, theoretically deadlocky, and deprecated. The alternative is kmap_atomic()/kunmap_atomic(), but that needs a deeper look at every use of kmap() in the hfsplus driver.

The mutex is useless. It simply hides the issue.

Yes, I am aware of that - putting mutexes in just means fewer simultaneous kmap() calls, but the limit on simultaneous kmap()s can still be reached - and reasonably easily: just run 'du' a few more times, as I wrote below.


Usually, hfs_bnode_read() is called after searching for some object in the b-tree. A struct hfsplus_find_data has to be initialized by means of hfs_find_init() before any search and operations inside a b-tree node, and then hfs_find_exit() has to be called. The hfsplus_find_data structure contains a mutex that locks the b-tree during the hfs_find_init() call and unlocks it during hfs_find_exit(). And, usually, hfs_bnode_read() is placed between the hfs_find_init()/hfs_find_exit() calls. So, as I understand it, your mutex inside hfs_bnode_read() is useless. But maybe not all hfs_bnode_read() calls are placed inside hfs_find_init()/hfs_find_exit() pairs - that needs checking.
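
The pattern described above - a rough sketch, assuming a catalog-tree lookup, with error handling abbreviated - looks something like this:

```c
/* Sketch of the usual hfsplus lookup pattern: hfs_bnode_read() runs
 * with the b-tree lock held via hfs_find_init()/hfs_find_exit(). */
struct hfs_find_data fd;
hfsplus_cat_entry entry;
int err;

err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd); /* takes the tree lock */
if (err)
	return err;
err = hfs_brec_find(&fd);                           /* locate the record */
if (!err)
	hfs_bnode_read(fd.bnode, &entry,
		       fd.entryoffset, fd.entrylength);
hfs_find_exit(&fd);                                 /* drops the tree lock */
```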

I can't clearly understand which simultaneous kmap()s you are talking about. As far as I can see, usually (especially in the case of hfs_bnode_read()) the kmap()/kunmap() pairs are localized in small parts of the code, and I expect they execute fast. Do I misunderstand something?

Perhaps it is easier to show some code for the discussion. The attached patch serializes kmap in hfs_bnode_read(). This makes the driver work more reliably, somehow. In real usage, multiple instances of hfs_bnode_read() do get invoked in parallel. I assume that is because both SMP and file system access itself are parallelized at various levels - e.g. a recursive operation like running du probably invokes a few instances of readdir/getdents?

I tried swapping those for kmap_atomic()/kunmap_atomic() (beware: the arguments to the unmap function are different) - but the kernel immediately warned that *_atomic() code is being used where code can sleep.


Could you describe the warning in more detail?

In hfs_bnode_read(), I tried changing kmap()/kunmap() to kmap_atomic()/kunmap_atomic(). (Sorry, I deleted the change since it does not work - but it shouldn't be too difficult to redo, since it is just a matter of changing two lines in hfs_bnode_read().) Then I get many:

BUG: scheduling while atomic: ...
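
A conversion along those lines would look roughly like the sketch below (first copy only). kmap_atomic() disables preemption, so nothing between the map and unmap may sleep; memcpy() itself does not sleep, so if this BUG fires, the scheduling must be happening somewhere else inside the atomic section:

```c
/* Hypothetical kmap_atomic() variant of the first copy in
 * hfs_bnode_read(). Note that kunmap_atomic() takes the mapped
 * address, not the page - unlike kunmap(). Any sleeping call between
 * the two triggers "BUG: scheduling while atomic". */
void *vaddr;

vaddr = kmap_atomic(*pagep);
memcpy(buf, vaddr + off, l);
kunmap_atomic(vaddr);
```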

<a lot of stuff snipped>

Netgear's hfs+ journalling code does try to replay journals, etc. But (1) it does not *create* journal entries correctly for the attributes file, since the implementation of the attributes file itself was incomplete until recently; and (2) I suspect it does not create/update the journal in the *exact same* way as Mac OS X - this means it is unsafe to do cross-OS unclean mounts, even when unclean-mount works perfectly by itself under Linux.

BTW, I think I see another bug - unrelated to journalling - in the current hfs+ code: folder counts are not updated correctly. It seems that hfs+ folders keep recursive folder counts (i.e. a parent dir knows how many sub-dirs it has, without needing to recurse down), and these are currently not updated. One way of demonstrating this is to:

   (make sure fs is fsck-clean ; mount)

   cp -ipr somedir_with_some_files/ some_hfs+_place/someplace_inside/

   (umount; run fsck_hfs again, now it complains about folder counts)

The actual message looks a bit like this:

** Checking catalog hierarchy.
   HasFolderCount flag needs to be set (id = 393055)
   (It should be 0x10 instead of 0)
   Incorrect folder count in a directory (id = 205)
   (It should be 18 instead of 17)
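
If this diagnosis is right, the fix would presumably live in the directory create/remove paths. A very rough sketch of the idea - the field and flag names below follow Apple's TN1150 volume-format note and are assumptions, not necessarily the current hfsplus structure member names:

```c
/* Sketch: on mkdir, the parent's folder record would need its
 * folderCount incremented and the hasFolderCount flag (0x0010) set,
 * which is what fsck_hfs is checking for above. TN1150 names assumed;
 * on-disk fields are big-endian, hence the byte-order helpers. */
folder->folderCount = cpu_to_be32(be32_to_cpu(folder->folderCount) + 1);
folder->flags |= cpu_to_be16(0x0010); /* kHFSHasFolderCountMask */
```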
From 3b69f7f0b2827d00daf20aa75991e8d184be5ad0 Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 11 Oct 2012 01:46:49 +0100
Subject: [PATCH] serialize hfs_bnode_read_kmap


Signed-off-by: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
---
 fs/hfsplus/bnode.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 5c125ce..7c19ad9 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -17,6 +17,20 @@
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"
 
+static DEFINE_MUTEX(hfs_bnode_read_kmap_mutex_mutex);
+
+static inline void
+hfs_bnode_read_kmap_mutex_lock(void)
+{
+	mutex_lock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
+static inline void
+hfs_bnode_read_kmap_mutex_unlock(void)
+{
+	mutex_unlock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
 /* Copy a specified range of bytes from the raw data of a node */
 void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 {
@@ -28,14 +42,18 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 	off &= ~PAGE_CACHE_MASK;
 
 	l = min(len, (int)PAGE_CACHE_SIZE - off);
+	hfs_bnode_read_kmap_mutex_lock();
 	memcpy(buf, kmap(*pagep) + off, l);
 	kunmap(*pagep);
+	hfs_bnode_read_kmap_mutex_unlock();
 
 	while ((len -= l) != 0) {
 		buf += l;
 		l = min(len, (int)PAGE_CACHE_SIZE);
+		hfs_bnode_read_kmap_mutex_lock();
 		memcpy(buf, kmap(*++pagep), l);
 		kunmap(*pagep);
+		hfs_bnode_read_kmap_mutex_unlock();
 	}
 }
 
-- 
1.7.11.7

