AlbaCode

Unicode

Learn about a different method of storing text

Introduction

In National 5, we learned how computers store text using a system called ASCII.

ASCII has some flaws which mean that it can't be used for all text files.

In this section we will learn about the flaws that ASCII has and how Unicode addresses them.

Character sets

The entire set of characters that the computer can represent is known as the character set.

The extended ASCII character set contains 256 characters.

Every character set is made up of:

Printable Characters

Printable characters include letters, numbers and symbols. You can think of these as characters that would use ink if you were to print out a document.

For example:

A, B, C, 1, 2, 3, @, $, %

Non-Printable Characters

Non-printable characters are characters that you cannot directly see but that still have a visible effect on the page. You can think of these as characters that would change the layout of the page but would not use ink if the document were printed.

For example:

SPACE, TAB

Control Characters

Control characters are used to perform actions rather than to display a printable character on screen.

For example:

ESCAPE, BACKSPACE, DELETE
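
Every character in these three groups is stored by the computer as a number. As a rough illustration, the Python sketch below looks up the ASCII codes of some of the examples above. The dictionary names `examples` and `codes` are made up for this illustration.

```python
# Each character is stored as a number (its character code).
# ord() returns the code of a single character.
examples = {
    "printable": ["A", "1", "@"],
    "non-printable": [" ", "\t"],          # SPACE, TAB
    "control": ["\x1b", "\x08", "\x7f"],   # ESCAPE, BACKSPACE, DELETE
}

# Build a matching dictionary that holds the codes instead of the characters.
codes = {
    category: [ord(ch) for ch in chars]
    for category, chars in examples.items()
}

for category, values in codes.items():
    print(category, values)
```

Running this should show that even characters you cannot see, such as SPACE (32) and TAB (9), have their own codes.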

ASCII

ASCII was originally developed for basic computers and used a 7-bit code to represent characters. This allowed computers to represent 128 different characters.

As the need for more characters grew, extended ASCII was introduced. This used 8 bits to represent each character, which increased the maximum number of characters to 256.
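
The link between the number of bits and the number of characters can be checked with a little arithmetic: n bits give 2 to the power n different codes. The sketch below is only an illustration, and the variable names are chosen for this example.

```python
# The number of different codes available with n bits is 2 to the power n.
ascii_size = 2 ** 7            # 7-bit ASCII
extended_ascii_size = 2 ** 8   # 8-bit extended ASCII

# The letter 'A' has the code 65, shown here as 7-bit and 8-bit patterns.
code = ord("A")
seven_bit = format(code, "07b")
eight_bit = format(code, "08b")

print(ascii_size, extended_ascii_size)   # 128 256
print(seven_bit, eight_bit)              # 1000001 01000001
```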

Limitation of ASCII

ASCII can only be used to represent 256 (or 2^8) different characters.

With the invention of the internet, people from all around the world were suddenly able to communicate, and the need for a larger character set grew.

The Chinese language alone has over 50,000 characters!

To resolve this issue, a new system called Unicode was developed.
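
This limitation can be seen by asking Python to store text using ASCII. The sketch below is an illustration only: `fits_in_ascii` is a helper name invented for this example, and Python's built-in "ascii" codec is the original 7-bit version rather than extended ASCII.

```python
def fits_in_ascii(text):
    """Return True if every character in text has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character is not in the ASCII character set.
        return False


print(fits_in_ascii("Hello"))   # True
print(fits_in_ascii("你好"))     # False
```

The Chinese greeting cannot be stored because its characters have no ASCII code.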

Unicode

Unicode is another method that computers can use to store text. The first version of Unicode used 16 bits to represent each character and is called UTF-16. This allows us to represent 65,536 (2^16) different characters.

This allows Unicode to represent a much larger range of characters from other languages.
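
As a small illustration of the 16-bit idea, the sketch below stores one Chinese character using Python's "utf-16-be" codec. This codec is chosen as an assumption for the example because it stores the character without adding an extra marker at the start. The variable names are made up for this illustration.

```python
# 16 bits give 2 to the power 16 different codes.
unicode_16_size = 2 ** 16

ch = "中"
code = ord(ch)                      # the character's Unicode code
encoded = ch.encode("utf-16-be")    # store the character using 16 bits
bits_used = len(encoded) * 8        # each byte is 8 bits

print(unicode_16_size)   # 65536
print(code)              # 20013
print(bits_used)         # 16
```

The code 20013 is far too large for extended ASCII's 256 codes, but it fits comfortably within 16 bits.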

Unicode (UTF-16)

Advantage: Has a very large character set which is able to represent foreign language symbols.

Disadvantage: Uses 16 bits to store each character instead of ASCII's 8 bits, resulting in a larger file size.

ASCII

Advantage: Uses 8 bits to store each character instead of Unicode's 16 bits, resulting in a smaller file size.

Disadvantage: Has a small character set which prevents many foreign language symbols being used.
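
The difference in file size can be estimated by counting the bytes needed to store the same text both ways. This is a rough sketch: the sample text and variable names are chosen for illustration, and Python's "utf-16-be" codec is again assumed as the 16-bit encoding.

```python
text = "Hello"

# One byte (8 bits) per character in ASCII.
ascii_bytes = len(text.encode("ascii"))

# Two bytes (16 bits) per character in the 16-bit encoding.
utf16_bytes = len(text.encode("utf-16-be"))

print(ascii_bytes)   # 5
print(utf16_bytes)   # 10
```

For text made up only of ASCII characters, the 16-bit version takes twice as much storage.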