Proposal for UTF8 vs performance tip.

Avoid UTF-8 processing if you don't need it and want extra speed

Many commonly used utilities are much slower when processing UTF-8.
If you want extra speed and do not need UTF-8 processing, disable it using
export LANG=C
export LC_ALL=C (needed only if LC_ALL is already set, since it overrides LANG)

Compare
  time grep -i -c some_string some_large_files
with
LANG=en_US.UTF-8
and the same with
LANG=C

On a modern CPU (Celeron 3GHz), grepping a 100MB file like this takes
about 2 seconds with UTF-8 and is about a hundred times faster (0.02s)
with LANG=C.
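
The grep comparison above can be scripted roughly as follows. This is only a sketch: the generated test file and the variable names (testfile, count_utf8, count_c) are made up for illustration, and it assumes the en_US.UTF-8 locale is installed (check with locale -a). Timings will vary with the grep version and hardware, so no expected numbers are given.

```shell
#!/bin/bash
# Build a throwaway ASCII test file (hypothetical data; use your own large file instead).
testfile=$(mktemp)
seq 1 50000 | sed 's/$/: Some_String mixed with filler text/' > "$testfile"

# LC_ALL overrides LANG and the other LC_* variables, so setting it for a
# single command is enough and leaves the rest of the shell session alone.
echo "UTF-8 locale:"
time env LC_ALL=en_US.UTF-8 grep -i -c some_string "$testfile"

echo "C locale:"
time env LC_ALL=C grep -i -c some_string "$testfile"

# For plain ASCII input both locales should report the same match count.
count_utf8=$(LC_ALL=en_US.UTF-8 grep -i -c some_string "$testfile")
count_c=$(LC_ALL=C grep -i -c some_string "$testfile")
echo "matches: UTF-8=$count_utf8 C=$count_c"

rm -f "$testfile"
```

Setting the variable per command, rather than with export, keeps the speedup local to the one invocation. Note that in the C locale grep -i folds case for ASCII letters only, so this is safe only when the data and the pattern do not rely on non-ASCII characters.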

The speedup is even more spectacular for
sort -f
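
A similar sketch for sort -f, under the same assumptions (generated test data, hypothetical variable names, en_US.UTF-8 installed). It times both locales and then sorts a few sample words in the C locale to show the case-folded order.

```shell
#!/bin/bash
# Build a throwaway mixed-case test file in reverse order (hypothetical data).
sortfile=$(mktemp)
awk 'BEGIN { for (i = 50000; i >= 1; i--) { print "Key" i; print "key" i } }' > "$sortfile"

echo "UTF-8 locale:"
time env LC_ALL=en_US.UTF-8 sort -f "$sortfile" > /dev/null

echo "C locale:"
time env LC_ALL=C sort -f "$sortfile" > /dev/null

# Small sample: with -f, lowercase letters are folded to uppercase for the
# comparison, so "Apple" and "apple" sort next to each other.
sorted_c=$(printf '%s\n' banana Apple cherry apple Banana | LC_ALL=C sort -f)
printf '%s\n' "$sorted_c"

rm -f "$sortfile"
```

Collation rules differ between locales, so the output order under LC_ALL=C may not match what a UTF-8 locale produces. Scripts that depend on a particular order should pin the locale explicitly.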
