--- On Sun, 7/4/13, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

> Hi Hin-Tak,
>
> On Apr 5, 2013, at 12:57 AM, Hin-Tak Leung wrote:
>
> > Hi Michael,
> >
> > Argh, that looks suspiciously like the recurring problem I have been
> > trying to pin down for much of the last year. My current thinking is
> > that one of the patches posted a couple of weeks ago might help.
>
> As I remember, you can easily reproduce the issue that you are
> investigating. Is the issue reproducible with debug output enabled? Can
> you reproduce it with debug output fully enabled (I mean with all debug
> flags on)? If you can reproduce the issue with debug output enabled,
> could you share that debug output with me?

That's correct - I can trigger the error condition with debug enabled
fairly reliably. I remember having done that once, I think with catalog
and extent debugging on. The problem was that it generated too much
information: to trigger the condition I needed to run "du" on a large
directory (~a million files), the catalog debugging emits a few lines
per file, and "du" stats every one of those ~million files, so we are
talking about dumping a few hundred MBs into /var/log/messages :-(.
Hence another reason for switching to dynamic debugging - so that one
can switch individual debug lines on and off. Even that is not ideal.

> Thanks,
> Vyacheslav Dubeyko.
>
> > That patch addresses out-of-memory conditions in caching of metadata,
> > in a nutshell. I think if (1) the system is under memory stress, (2)
> > one is doing something which traverses the file system very quickly,
> > (3) on a multi-CPU/core system, it is possible to run some mutexed
> > non-re-entrant code in the hfsplus driver simultaneously without the
> > mutex lock held, and therefore get it a bit confused.
> > This idea at least explains why (1) adding an inner mutex lock can
> > delay the problem, although supposedly the outer mutex should have
> > prevented more than one copy of the non-re-entrant code from running
> > and the inner mutex lock should have no effect at all, and (2) the
> > on-disk data always fsck's okay - it is just the driver itself
> > getting confused.
> >
> > So I have a few questions for you:
> >
> > 1. You are on a quad-core system, correct? This is according to your
> > /proc/cpuinfo below.
> >
> > 2. You are certainly doing fast file system traversal (updatedb), but
> > are you actually doing it *on top of the hfsplus* file system? I am
> > asking this because updatedb is usually configured not to index
> > removable media under /mnt or /media. But you mentioned you have the
> > hfsplus file system mounted under /home - please confirm that and
> > include some more details if you can.
> >
> > 3. How full and populous is that hfs+ file system? I.e. the output of
> > both "df" and "df -i" while it is mounted. Is this your Mac OS X
> > system (root /) disk?
> >
> > 4. Is your system under memory stress at the moment the problem
> > happens - e.g. do you have a web browser with a few hundred tabs
> > open?
> >
> > Hin-Tak
> >
> > --- On Thu, 4/4/13, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
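[Editor's note: the per-line switching that dynamic debugging would allow
looks roughly like the sketch below, on a kernel built with
CONFIG_DYNAMIC_DEBUG=y and with the driver's messages routed through
pr_debug() (the stock hfsplus dprint macro would first need converting,
which is presumably the "switching to dynamic debugging" mentioned above).
The file names match the mainline hfsplus sources; the line number is only
a placeholder.]

```shell
# Make sure debugfs is available (usually already mounted).
mount -t debugfs none /sys/kernel/debug 2>/dev/null || true

# Enable every pr_debug() call site in the catalog code ...
echo 'file fs/hfsplus/catalog.c +p' > /sys/kernel/debug/dynamic_debug/control

# ... but switch one noisy statement back off by line number
# (line 100 is a placeholder, not a real hfsplus call site):
echo 'file fs/hfsplus/catalog.c line 100 -p' > /sys/kernel/debug/dynamic_debug/control

# Disable extent debugging entirely:
echo 'file fs/hfsplus/extents.c -p' > /sys/kernel/debug/dynamic_debug/control

# Inspect which hfsplus call sites are currently enabled:
grep hfsplus /sys/kernel/debug/dynamic_debug/control
```

This keeps the "du over ~a million files" run from flooding
/var/log/messages, since only the call sites under suspicion are live.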
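[Editor's note: the failure mode hypothesized above - non-re-entrant code
running on several CPUs at once because the expected mutex is not held -
can be illustrated with a small userspace pthread sketch. This is not the
hfsplus code; the shared counter merely stands in for shared driver state.
With the lock taken the read-modify-write is serialized and the result is
exact; remove the lock/unlock pair and the interleaving "gets it a bit
confused", i.e. the final count comes up short.]

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITER    100000

static long counter;                    /* stands in for shared driver state */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&lock);      /* remove this pair to see corruption */
        counter++;                      /* non-re-entrant read-modify-write  */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    /* Exactly NTHREADS * NITER when properly serialized. */
    printf("%ld\n", counter);
    return 0;
}
```

An "inner" lock added around a smaller region inside worker() would only
shrink the unprotected window, which matches the observation that it
delays the problem rather than fixing it.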