Hi Hin-Tak,

On Thu, 2012-10-18 at 17:55 +0100, Hin-Tak Leung wrote:
> Hi,
>
> While looking at a few of the older BUG() traces I consistently get
> from running du on a somewhat large directory with lots of small files
> and small directories, I noticed that they tend to have two sleeping
> "? hfs_bnode_read()" entries towards the top. It is a very small and
> simple function which just reads a b-tree node record (sometimes only a
> few bytes between a kmap/kunmap), so I suspect the problem might simply
> be the number of simultaneous kmap() calls being run. So I put a mutex
> around it just to make sure only one copy of hfs_bnode_read() runs at a
> time.

Yes, you have touched on a very important problem. The hfsplus driver
needs to be reworked to stop using kmap()/kunmap(), because kmap() is
slow, theoretically prone to deadlock, and deprecated. The alternative is
kmap_atomic()/kunmap_atomic(), but that requires looking more deeply at
every place where the hfsplus driver uses kmap().

The mutex is useless. It simply hides the issue.

> This seems to make it much harder to get a BUG(): I needed to run du
> over and over a few times to get it again. Of course it might just be
> the mutex slowing the driver down and making it less likely to get
> confused, but as I read that the number of simultaneous kmap() calls in
> the kernel is limited, I think I might be on to something.
> Also, this shifts the problem onto multiple copies of
> "? hfsplus_bmap()" (which also kmap()s/kunmap()s, but is much more
> complicated).

Namely, the mutex hides the issue.

> I thought of doing hfsplus_kmap()/etc. (which seems to have existed a
> long time ago but was removed!), but this might cause deadlocks, since
> some of the hfsplus code is kmapping/kunmapping all the time, and
> recursively. So a better way might be just to make sure only one
> instance of some of the routines runs at a time, i.e. multiple mutexes.
> This is both ugly and sounds like voodoo, though.
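Rather than adding more mutexes, the direction is to keep every mapping
short-lived and never nested, which is the pattern kmap_atomic()/
kunmap_atomic() require (nothing may sleep while the mapping is held).
hfs_bnode_read() fits that pattern well: it copies a byte range that may
cross page boundaries, and each page only needs to be mapped for one
memcpy(). Below is a rough userspace model of that loop, not kernel code:
model_map_page()/model_unmap_page() are hypothetical stand-ins for
kmap_atomic()/kunmap_atomic(), MODEL_PAGE_SIZE stands in for PAGE_SIZE,
and the counters exist only to check that at most one mapping is live at
any time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: MODEL_PAGE_SIZE plays the role of PAGE_SIZE and
 * struct model_page the role of struct page. */
#define MODEL_PAGE_SIZE 16

struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Bookkeeping used to verify that mappings are never nested. */
static int live_maps;
static int peak_maps;

/* Stand-in for kmap_atomic(): here it only records the mapping. */
static void *model_map_page(struct model_page *page)
{
	live_maps++;
	if (live_maps > peak_maps)
		peak_maps = live_maps;
	return page->data;
}

/* Stand-in for kunmap_atomic(), which is passed the mapped address. */
static void model_unmap_page(void *addr)
{
	(void)addr;
	assert(live_maps > 0);
	live_maps--;
}

/* Copy len bytes starting at byte offset off of the page array into buf.
 * Each page is mapped only around a single memcpy(), so at most one
 * mapping is live at a time and nothing blocks while it is held. */
static void model_bnode_read(struct model_page *pages, void *buf,
			     int off, int len)
{
	unsigned char *dst = buf;
	struct model_page *page;
	int in_page;

	assert(off >= 0 && len >= 0);
	page = pages + off / MODEL_PAGE_SIZE;
	in_page = off % MODEL_PAGE_SIZE;

	while (len > 0) {
		int chunk = MODEL_PAGE_SIZE - in_page;
		unsigned char *src;

		if (chunk > len)
			chunk = len;
		src = model_map_page(page);
		memcpy(dst, src + in_page, chunk);
		model_unmap_page(src);

		dst += chunk;
		len -= chunk;
		in_page = 0;	/* later pages are read from their start */
		page++;
	}
}
```

The real conversion still has to check every caller: code that sleeps,
or that calls into something else that maps a page, while a mapping is
held cannot simply be switched to the atomic variant.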
> Also, I am not sure why the existing mutexes, which protect some of the
> internal structures, don't protect against too many kmap()s (maybe they
> protect "writes", but not against too many simultaneous reads).
> So does anybody have an idea how many kmaps are allowed, and how to tell
> that I am close to my machine's limit?

As I understand it, hfsplus_kmap() doesn't do anything useful. What is
really needed is to rework the kmap()/kunmap() usage instead of adding
mutexes. Could you try to fix this issue? :-)

> Also, a side note on the Netgear journalling code: I see that it
> journals the volume header and some of the special files (the catalog,
> allocation bitmap, etc.), but (1) it has some code to journal the
> attributes file, but that code was actually non-functional, since
> without Vyacheslav's recent patches the Linux kernel doesn't even
> read/write that file correctly, let alone do *journalled* reads/writes
> correctly; (2) there is a part which tries to do data-page journalling,
> but it seems to be wrong, or at least not quite working (this I found
> while looking at some curious warning messages and how they come about).
> Luckily, that code just bails out when it gets confused, i.e. it does
> non-journalled writes rather than writing a wrong journal to disk. So
> it doesn't harm data under routine normal use (i.e. mount/unmount
> cleanly).
> But that got me worrying a bit about interoperability: it is probably
> unsafe to use Linux to replay a journal written by Mac OS X, and vice
> versa. I.e. if you have a dual-boot machine, or a portable disk that
> you use between two OSes, and it disconnects/unplugs/crashes under one
> OS, it is better to plug it right back in and let the same OS replay
> the journal, then unmount cleanly before using it under the other OS.

The journal should be replayed during every mount if it contains valid
transactions. An HFS+ volume shouldn't be mounted without replaying the
journal.
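As a decision rule at mount time, this amounts to roughly the following.
It is a simplified sketch, not code from any driver: the struct keeps
only the start/end offsets of the HFS+ journal header (per Apple's
TN1150, the journal is empty when start equals end), on-disk byte order
is ignored, and all names here (model_journal_header,
model_choose_mount(), the MOUNT_* actions) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the HFS+ journal header: only the
 * fields needed to decide whether transactions are pending. */
struct model_journal_header {
	uint64_t start;	/* offset of the oldest transaction not yet replayed */
	uint64_t end;	/* offset just past the newest transaction */
};

enum model_mount_action {
	MOUNT_RW,		/* journal empty: nothing to replay */
	MOUNT_RW_AFTER_REPLAY,	/* replay pending transactions, then mount */
	MOUNT_REFUSE,		/* pending transactions but no replay support */
};

/* Transactions are pending exactly when start != end. */
static bool model_journal_dirty(const struct model_journal_header *jh)
{
	return jh->start != jh->end;
}

static enum model_mount_action
model_choose_mount(const struct model_journal_header *jh, bool can_replay)
{
	assert(jh);
	if (!model_journal_dirty(jh))
		return MOUNT_RW;
	/* Never write on top of a journal that still holds transactions. */
	return can_replay ? MOUNT_RW_AFTER_REPLAY : MOUNT_REFUSE;
}
```

The case that matters is the last one: a volume whose journal still
holds transactions must not be mounted and modified until those
transactions have been replayed.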
Otherwise, the partition can end up corrupted. Just imagine: you mount an
HFS+ partition with a non-empty journal and then add some data to the
volume, which means you modify metadata. If you then mount that HFS+
volume under Mac OS X, the stale journal will be replayed over your
changes and the metadata will be corrupted.

With the best regards,
Vyacheslav Dubeyko.

> I'd be interested in hearing any tips on finding out kmap's limit at
> run time, if anybody has any idea...
>
> Hin-Tak