Janne Honkonen - Official pages of Janne Honkonen, visual artist

Linux, PuTTY and UTF-8 character encoding problems

If you wish to skip my regular commentary track, go straight to the Solution section below.

Prologue

Scandinavia is a great place to live. We have nature, we have technology, we have somewhat good beer and somewhat good internet. What we do not have is a good character set.

I can't even begin to count how many times since the 1990s I have had problems with character sets on computer systems: starting from the keyboard character sets of 90s DOS (Disk Operating System) (good old Keyb SU), through more modern operating systems such as Windows XP, 7, 8 and (grr..) 10, to other systems like good old Linux. After DOS, and long before I started learning Linux, I was relieved that you could handle keyboard character set problems with a few simple desktop clicks.

Then along came Linux. And the Linux terminals.

Describing the problem

I love Linux: it's like the Tenacious D of operating systems .. "God damnit, I'm going to read it anyway, because I wrote it, and it's the truth- I f*cking love this band, they are the best band ever – period." (from the movie Pick of Destiny). Linux is simply the most tweakable, configurable, lovable and mucha-fucka best operating system. It is also capable of giving us more problems than any other operating system, Windows 8 excluded.

One of the most common problems people encounter with Linux, especially on terminal-based systems, is a character problem, unless you live in candy-land US/GB where there are no special characters at all. Special characters like Ä, Ö and Å typed on your keyboard do not appear as such in the Linux terminal.

In other words, your Linux terminal does not translate or display the characters you type correctly. Usually this is because your Linux system does not have the correct character set configured, the terminal does not support it, or the characters are not translated properly, especially when using a remote terminal like PuTTY, which, by the way, is the best terminal software ever.

Today I had one of the most annoying character encoding problems I have had with Linux so far, and it took about an hour to find the correct way to fix it with UTF-8. But finally I went through all the phases, and here it is: the solution for Linux, PuTTY and character set encoding with UTF-8.

Solution

First things first: the only good character encoding is UTF-8. It has ALL the characters and symbols you will ever need. You should NEVER use a limited, region-specific character set or encoding on your system, because foreign characters will not display correctly when you encounter them.
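To see why a region-specific character set causes trouble, compare the bytes of the same letter in UTF-8 and in ISO-8859-15 (the set behind fi_FI@euro). This is a small sketch, assuming the script file is saved as UTF-8 and that the standard od and iconv tools are installed; show_bytes and show_bytes_latin9 are hypothetical helper names made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string as hex, without spaces.
# Assumes the string itself is UTF-8 (i.e. this script is saved as UTF-8).
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: the same string converted to ISO-8859-15 first.
show_bytes_latin9() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-15 | od -An -tx1 | tr -d ' \n'
}

show_bytes 'Ä';        echo   # two bytes in UTF-8
show_bytes_latin9 'Ä'; echo   # one byte in ISO-8859-15
```

So Ä travels as two bytes (c3 84) in UTF-8 but as a single byte (c4) in ISO-8859-15. When one end of the connection sends UTF-8 and the other expects ISO-8859-15, or the other way round, Ä, Ö and Å come out as garbage.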

So, let's fix your Linux character encoding problem.

Phase 1 – Check your terminal software

Are you using terminal software, such as PuTTY, to connect to your Linux machine? Check that your software uses a font that contains the characters you need (Ä, Ö, Å and so on). If you are not sure whether your font does, download the open source DejaVu font family and use it in your terminal. In PuTTY, also check that the character set under Window > Translation is set to UTF-8. If this fixes everything, good. It usually does not, but trying does not hurt.

Phase 2 – Configure UTF-8 locales

Your Linux shell needs the correct character set to work. Check that your Linux system has the "locales" package installed:

sudo dpkg -l locales

If the last line of the output, which should look something like "ii locales 2.11.3-4 Embedded GNU C Library: National Language (locale) data [support]", begins with "ii", you have locales installed. If not, you need to install the locales package. On Debian Linux, you would do it with the command apt-get install locales or aptitude install locales (as root or with sudo). On other Linux distributions, use the appropriate package installation method; note that dpkg and the package name locales are what Debian-based distributions use, and other distributions may name things differently.
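The "ii" check can also be scripted. This is a minimal sketch, assuming a Debian-style system with dpkg; is_installed_line is a hypothetical helper, and the dpkg call is simply skipped when dpkg is not present. In the two-letter status, the first letter is the desired state (i = install) and the second the actual state (i = installed).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a `dpkg -l` package line starts with "ii",
# i.e. the package is wanted (i) and actually installed (i).
is_installed_line() {
    case "$1" in
        ii\ *) return 0 ;;
        *)     return 1 ;;
    esac
}

# Only query dpkg where it exists (Debian, Ubuntu and derivatives).
if command -v dpkg >/dev/null 2>&1; then
    line=$(dpkg -l locales 2>/dev/null | tail -n 1)
    if is_installed_line "$line"; then
        echo "locales is installed"
    else
        echo "locales is missing: install it with apt-get install locales"
    fi
fi
```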

Now that you have locales installed, you need to check that your system has a UTF-8 locale generated. You can do this with the command:

sudo dpkg-reconfigure locales

You will get a list of available locales, which you can navigate with the arrow keys; select items with the spacebar. Select the locales you want your Linux system to use. For example, I'm Finnish, so I selected two locales: fi_FI.UTF-8 UTF-8 and fi_FI@euro ISO-8859-15, the latter just in case, because the first should be sufficient for everything. Choose your UTF-8 locale using a good rule of thumb: your language, your country, UTF-8. On the next screen, pick the same UTF-8 locale as the default locale for the system environment.

For people living in the United Kingdom, the correct locale would be en_GB.UTF-8, for US residents en_US.UTF-8 and for German residents de_DE.UTF-8. Use these examples to help you select the correct locale for your system.
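If you prefer not to click through the menu, the rule of thumb and the locale generation can be scripted. This is a sketch for Debian-style systems, assuming /etc/locale.gen lists the available locales as commented-out lines such as "# fi_FI.UTF-8 UTF-8" and that you run locale-gen afterwards to generate them; locale_name, enable_locale, normalize_locale and is_generated are hypothetical helpers. The locale -a command lists generated locales with the codeset normalized (for example fi_FI.utf8), which is why is_generated ignores case and hyphens.

```shell
#!/bin/sh
# Hypothetical helper: apply the rule "your language, your country, UTF-8".
locale_name() {
    printf '%s_%s.UTF-8\n' "$1" "$2"
}

# Hypothetical helper: enable one entry such as "fi_FI.UTF-8 UTF-8" in a
# locale.gen-style file (default: Debian's /etc/locale.gen).
enable_locale() {
    entry=$1
    file=${2:-/etc/locale.gen}
    # Escape dots so grep and sed match them literally.
    pattern=$(printf '%s' "$entry" | sed 's/\./\\./g')
    if grep -q "^$pattern\$" "$file"; then
        return 0                                   # already enabled
    elif grep -q "^# *$pattern\$" "$file"; then
        sed -i "s/^# *$pattern\$/$entry/" "$file"  # uncomment it (GNU sed -i)
    else
        printf '%s\n' "$entry" >> "$file"          # not listed: append it
    fi
}

# Hypothetical helper: lowercase and drop hyphens, so that
# fi_FI.UTF-8 and fi_FI.utf8 compare equal.
normalize_locale() {
    tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Hypothetical helper: succeed if locale $1 appears in list $2
# (for example the output of `locale -a`).
is_generated() {
    want=$(printf '%s\n' "$1" | normalize_locale)
    printf '%s\n' "$2" | normalize_locale | grep -Fqx "$want"
}
```

For example, in a root shell: enable_locale "$(locale_name fi FI) UTF-8", then locale-gen, and finally is_generated fi_FI.UTF-8 "$(locale -a)" && echo ok.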

Phase 3 – Configuring Bash

For most people, the previous steps may already be enough, but here are a few more things to make everything foolproof. Next, we will configure bash to use the correct language settings by adding three lines to both your ~/.bashrc and ~/.profile files. (~ means your home directory; for root it is /root/, but ~ automatically expands to your own home directory. Remember that you cannot see these files in a normal directory listing: the dot at the start of their names makes them hidden, so use ls -a to list them.)

These files live in your own home directory, so you can edit them without sudo, for example like this:

pico ~/.bashrc
pico ~/.profile

And add these lines to both files:

export LC_ALL=fi_FI.UTF-8
export LANG=fi_FI.UTF-8
export LANGUAGE=fi_FI.UTF-8

Just replace fi_FI with your own language and country code, as described above.
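Editing the files by hand works fine; if you set up several machines, the same change can be scripted. A minimal sketch, assuming a POSIX shell; add_locale_exports is a hypothetical helper that appends the three export lines above and skips any line that is already present, so running it twice does not duplicate anything.

```shell
#!/bin/sh
# Hypothetical helper: append the LC_ALL, LANG and LANGUAGE exports for
# locale $2 to shell startup file $1, skipping lines already present.
add_locale_exports() {
    file=$1
    loc=$2
    touch "$file"                       # create the file if it is missing
    for var in LC_ALL LANG LANGUAGE; do
        line="export $var=$loc"
        grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}
```

For example: add_locale_exports ~/.bashrc fi_FI.UTF-8 and add_locale_exports ~/.profile fi_FI.UTF-8. The new values apply to shells started afterwards; to load them into the current shell, run . ~/.bashrc.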

Phase 4 – Configure Environment

Again, the steps above help most people. Still, some people, like me, keep having problems, and I needed to add the language definitions to the environment file as well. So, just to make sure, let's do this too:

Add the following lines to /etc/environment (the file belongs to root, so edit it with sudo, for example sudo pico /etc/environment):

LC_ALL=fi_FI.UTF-8
LANG=fi_FI.UTF-8

Again, replace fi_FI with your own language and country code as described above, and of course, save everything.
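/etc/environment is read as plain KEY=VALUE lines (on most systems by the PAM login modules) rather than run as a shell script, so it takes no export keyword, and editing it twice by hand can easily leave duplicate lines. A minimal sketch, assuming GNU sed; set_env_var is a hypothetical helper that replaces an existing line for the key or appends a new one.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an /etc/environment-style file.
# An existing line for KEY is replaced, otherwise a new line is appended.
set_env_var() {
    key=$1
    value=$2
    file=${3:-/etc/environment}
    if grep -q "^$key=" "$file"; then
        sed -i "s|^$key=.*|$key=$value|" "$file"   # GNU sed in-place edit
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Because the file belongs to root, paste the function into a root shell (for example after sudo -i) and run set_env_var LC_ALL fi_FI.UTF-8 and set_env_var LANG fi_FI.UTF-8.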

Phase 5 – Check settings

Now everything should be fine. Log out and back in (or open a new PuTTY session) so the new settings are loaded, then check them with the locale command:

locale

Output of the command should look something like this:

LANG=fi_FI.UTF-8
LANGUAGE=fi_FI.UTF-8
LC_CTYPE="fi_FI.UTF-8"
LC_NUMERIC="fi_FI.UTF-8"
LC_TIME="fi_FI.UTF-8"
LC_COLLATE="fi_FI.UTF-8"
LC_MONETARY="fi_FI.UTF-8"
LC_MESSAGES="fi_FI.UTF-8"
LC_PAPER="fi_FI.UTF-8"
LC_NAME="fi_FI.UTF-8"
LC_ADDRESS="fi_FI.UTF-8"
LC_TELEPHONE="fi_FI.UTF-8"
LC_MEASUREMENT="fi_FI.UTF-8"
LC_IDENTIFICATION="fi_FI.UTF-8"
LC_ALL=fi_FI.UTF-8

Your results may vary, since the output of the command depends on the locale settings of your system.
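Reading the list by eye works, but the check can also be automated. A small sketch, assuming a POSIX shell: check_utf8 is a hypothetical helper that reads locale output and prints every LANG or LC_* value that does not mention UTF-8. Empty values are skipped, and LANGUAGE is ignored because it is a priority list for message translations rather than a locale setting.

```shell
#!/bin/sh
# Hypothetical helper: read `locale` output on stdin and report every
# LANG/LC_* setting whose value does not mention UTF-8.
# Returns 0 if everything is UTF-8 (or empty), 1 otherwise.
check_utf8() {
    bad=0
    while IFS='=' read -r name value; do
        case "$name" in
            LANG|LC_*) ;;        # locale settings: check these
            *) continue ;;       # LANGUAGE and anything else: skip
        esac
        # locale prints values that are not set directly in quotes
        value=$(printf '%s' "$value" | tr -d '"')
        case "$value" in
            ''|*UTF-8*|*utf8*) ;;                  # empty or UTF-8: fine
            *) echo "not UTF-8: $name=$value"; bad=1 ;;
        esac
    done
    return "$bad"
}
```

For example: locale | check_utf8 && echo "all UTF-8".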

Phase 6 – Conclusion

Now everything should work. For most people, phases 1 to 3 will fix everything, and for the rest, phase 4 should fix the remaining problems. If you still encounter problems, before redoing all the steps, make sure that the file permissions actually allowed your changes to be written. You can check file permissions with the ls -l command, which displays the permissions of the files in the current directory (or of a file you name, such as ls -l /etc/environment).

You can always fix file permissions with the chmod command (for example, sudo chmod 0644 /etc/environment). Below are the commands that set the usual default permissions (0644: read and write for the owner, read-only for everyone else) for all the files mentioned in this article.

  • sudo chmod 0644 /etc/environment
  • sudo chmod 0644 ~/.bashrc
  • sudo chmod 0644 ~/.profile
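The permission column printed by ls -l and the octal number given to chmod say the same thing: each group of three letters (owner, group, others) adds up r=4, w=2 and x=1, so rw-r--r-- is 644. A small sketch, assuming a POSIX shell with cut; perm_to_octal is a hypothetical helper, and the special letters s and t (setuid, setgid, sticky) are not handled.

```shell
#!/bin/sh
# Hypothetical helper: turn the permission column of `ls -l`
# (e.g. -rw-r--r--) into the octal digits chmod expects (e.g. 644).
# The special letters s and t are not handled in this sketch.
perm_to_octal() {
    perms=${1#?}              # drop the file-type character (-, d, l, ...)
    result=""
    for i in 0 3 6; do        # owner, group, others
        triple=$(printf '%s' "$perms" | cut -c $((i + 1))-$((i + 3)))
        digit=0
        case "$triple" in r??) digit=$((digit + 4)) ;; esac
        case "$triple" in ?w?) digit=$((digit + 2)) ;; esac
        case "$triple" in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    printf '%s\n' "$result"
}
```

For example: perm_to_octal "$(ls -l /etc/environment | cut -c 1-10)".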

I will update this article if needed. If this article helped you, feel free to link to it and comment here.

© Janne Honkonen - www.jannehonkonen.com