Re: [WISH] Store also tag dereferences in packed-refs

"Marco Costalba" <mcostalba@xxxxxxxxx> · Sun, 19 Nov 2006 20:07:07 +0100

On 11/19/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:

So it all looks _almost_ fine..

Except for this one:

   10:19:04.449236 stat64(".git/objects/3a/41a48d139d1425c1d27e3fbe4f511fb7e09e94", {st_mode=S_IFREG|0444, st_size=278, ...}) = 0 <0.817989>

That's a _single_ "stat64()" system call that takes almost a second to
execute. All the rest are in the millisecond range, and sometimes a
hundreth of a second or two. Ie doing

        grep -v ' <0.0[012]' tracefile_git_tree.txt

on your tracefile, there's really not a lot of system calls that take a
long time, and that one stat _really_ stands out (the others are 3 or four
hundredths of a second, and then suddenly you have one that is 20 times
longer than even the slowest other ones.

Basically, you seem to have a _single_ object access that takes up half
the time of the whole program.

But why my numbers are bad both in git, in Linux and also qgit (not
posted) local repo? If it is a single case other repos should load
fast.

It's the object for 'refs/tags/v1.4.4-rc1' in case you care, btw.

> If you want I can repack and prune, but for now I just wait to avoid
> to corrupt this test case.

What you could try to do is to re-run it a few times (cold-cache) and see
if those numbers really are stable, and if it's always the same object
that takes that long.

Right now I'm running smartctl -t long  /dev/hda, I was a little bit scared ;-)

When finished I will do additional cold chache (reboot) tests.

In fact, you could even do a simple

        time ls -l .git/objects/3a/41a48d139d1425c1d27e3fbe4f511fb7e09e94

for the cold-cache case, and see if just even _that_ takes almost a
second.

If it _is_ stable, there's two possibilities:

 - you have a large and slow disk, and that one object really is way out
   there on the other side of the disk, and seeking really takes almost a
   second.

Its a Thinkpad 2.5 inches HD, 2 years old (IBM/Hitachi Travelstar 40GNX family)

   Quite frankly, I expected that the time when a single stat() took
   almost a second was a decade or more in the past, back in the days of
   floppy-disks. But what do I know?

 - your disk is failing, and ends up doing error recovery etc.

No recent errors are reported:

Stripped from smartctl -a /dev/hda

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail
Always       -       0
 2 Throughput_Performance  0x0005   100   100   040    Pre-fail
Offline      -       0
 3 Spin_Up_Time            0x0007   165   165   033    Pre-fail
Always       -       1
 4 Start_Stop_Count        0x0012   097   097   000    Old_age
Always       -       4928
 5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail
Always       -       0
 7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail
Always       -       0
 8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail
Offline      -       0
 9 Power_On_Hours          0x0012   073   073   000    Old_age
Always       -       11832
10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail
Always       -       0
12 Power_Cycle_Count       0x0032   099   099   000    Old_age
Always       -       2563
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age
Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age
Always       -       117
193 Load_Cycle_Count        0x0012   045   045   000    Old_age
Always       -       558210
194 Temperature_Celsius     0x0002   130   130   000    Old_age
Always       -       42 (Lifetime Min/Max 5/59)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age
Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age
Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age
Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age
Always       -       0
210 Unknown_Attribute       0x0023   100   100   001    Pre-fail
Always       -       0

SMART Error Log Version: 1
ATA Error Count: 2
       CR = Command Register [HEX]
       FR = Features Register [HEX]
       SC = Sector Count Register [HEX]
       SN = Sector Number Register [HEX]
       CL = Cylinder Low Register [HEX]
       CH = Cylinder High Register [HEX]
       DH = Device/Head Register [HEX]
       DC = Device Command Register [HEX]
       ER = Error register [HEX]
       ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
 When the command that caused the error occurred, the device was
active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 01 0f 8e a8 e4  Error: IDNF at LBA = 0x04a88e0f = 78155279

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 20 ff 01 0f 8e a8 e4 00      00:05:26.850  READ SECTOR(S)
 c8 ff 01 0f 8e a8 e4 00      00:05:26.775  READ DMA
 c8 ff 01 00 00 00 e0 00      00:05:26.625  READ DMA
 ca 04 01 ff 52 a8 e4 00      00:03:40.575  WRITE DMA
 c8 04 01 ff 52 a8 e4 00      00:03:40.275  READ DMA

Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
 When the command that caused the error occurred, the device was
active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 01 0f 8e a8 e4  Error: IDNF 1 sectors at LBA = 0x04a88e0f = 78155279

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 c8 ff 01 0f 8e a8 e4 00      00:05:26.775  READ DMA
 c8 ff 01 00 00 00 e0 00      00:05:26.625  READ DMA
 ca 04 01 ff 52 a8 e4 00      00:03:40.575  WRITE DMA
 c8 04 01 ff 52 a8 e4 00      00:03:40.275  READ DMA
 ca 04 7d 73 49 02 e0 00      00:03:40.275  WRITE DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

(NOTE NOTE NOTE! There could be other reasons for that second delay. If
the machine was under heavy load, or was running low on memory, maybe the
long delay was just due to havign to swap things out or run other things
instead. That's why it might be interesting to see if the number is
"stable" in that it's always that same object..)

No load, no low memory: I quitted everything before to test. I will
test again as soon smartctl finishes.

 Marco
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html