On Apr 7, 2013, at 5:41 PM, Hin-Tak Leung wrote:

> --- On Sun, 7/4/13, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
>
>> Hi Hin-Tak,
>>
>> On Apr 5, 2013, at 12:57 AM, Hin-Tak Leung wrote:
>>
>>> Hi Michael,
>>>
>>> Argh, that looks suspiciously like the recurring problem I have been
>>> trying to pin down for much of the last year. My current thinking is
>>> that one of the patches posted a couple of weeks ago might help.
>>
>> As I remember, you can easily reproduce the issue that you are
>> investigating. Is the issue reproducible with debug output enabled? Can
>> you reproduce it with debug output fully enabled (I mean with all the
>> debug flags on)? If you can, could you share that debug output with me?
>
> That's correct - I can trigger the error condition with debug enabled
> reasonably "reliably". I remember having done that once, I think with
> catalog and extent debugging on. The problem was that it generated too
> much information: since I needed to run "du" on a large directory
> (~a million files) to trigger the condition, and the catalog debugging
> emits a few lines per file while "du" touches every one of those
> ~million files, we are talking about dumping a few hundred MBs into
> /var/log/messages :-(.

Yes, I understand that the debug output can be huge. But we only need to
take into consideration the output near the point where the issue occurs,
so I think it is possible to reduce the debug output to a smaller size. I
understand that such filtering may not be an easy task, but we still need
to localize the cause of the issue by analyzing the debug output.

With the best regards,
Vyacheslav Dubeyko.

> Hence another reason for switching to dynamic debugging as well - so
> that one can switch individual debugging lines on and off. Even that is
> not ideal.

>> Thanks,
>> Vyacheslav Dubeyko.

>>> That patch addresses out-of-memory conditions in caching of metadata,
>>> in a nutshell.
>>> I think if (1) the system is under memory stress, (2) one is doing
>>> something which traverses the file system very quickly, (3) on a
>>> multi-CPU/core system, it is possible to run some mutexed,
>>> non-re-entrant code in hfsplus simultaneously without the mutex lock
>>> held, and therefore get it a bit confused. This idea at least explains
>>> why (1) adding an inner mutex lock can delay the problem, even though
>>> supposedly the outer mutex should have prevented more than one copy of
>>> the non-re-entrant code from running and so the inner mutex lock
>>> should have no effect at all, and (2) the on-disk data always fscks
>>> okay - it is just the driver itself getting confused.
>>>
>>> So I have a few questions for you:
>>>
>>> 1. You are on a quad-core system, correct? This is according to your
>>> /proc/cpuinfo below.
>>>
>>> 2. You are certainly doing fast file system traversal (updatedb), but
>>> are you actually doing it *on top of the hfsplus* file system? I am
>>> asking because updatedb is usually configured not to index removable
>>> media under /mnt or /media. But you mentioned you have the hfsplus
>>> file system mounted under /home - please confirm that and include
>>> some more details if you can.
>>>
>>> 3. How full and populous is that hfs+ file system? i.e. the output of
>>> both "df" and "df -i" while it is mounted. Is this your Mac OS X
>>> system (root /) disk?
>>>
>>> 4. Is your system under memory stress at the moment the problem
>>> happens - e.g. do you have a web browser with a few hundred tabs open?
>>>
>>> Hin-Tak
>>>
>>> --- On Thu, 4/4/13, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
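For reference, the per-line switching Hin-Tak alludes to is what the kernel's dynamic debug facility (CONFIG_DYNAMIC_DEBUG) provides, once a driver's debug statements are converted to pr_debug()-style call sites - which the thread is proposing for hfsplus, not something the driver did at the time. A minimal sketch of how it would be driven from userspace; the control-file path is the standard debugfs location, and the module/file names assume such a conversion:

```shell
# Dynamic debug is controlled through a debugfs file; make sure
# debugfs is mounted (it usually is, on modern distributions).
mount -t debugfs none /sys/kernel/debug 2>/dev/null || true

# Enable every dynamic-debug call site in the hfsplus module ("+p"
# means: print the message).
echo 'module hfsplus +p' > /sys/kernel/debug/dynamic_debug/control

# Or enable only the sites in one source file, e.g. the catalog code,
# to keep the volume down:
echo 'file fs/hfsplus/catalog.c +p' > /sys/kernel/debug/dynamic_debug/control

# Turn everything back off once the error condition has been triggered:
echo 'module hfsplus -p' > /sys/kernel/debug/dynamic_debug/control

# Instead of letting syslog collect hundreds of MBs, capture the kernel
# ring buffer directly and filter for the lines of interest:
dmesg | grep -i hfs > /tmp/hfsplus-debug.log
```

This would address both complaints above: the output is opt-in per call site rather than all-or-nothing compile-time flags, and it can be enabled only for the window around the failure.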