1995 BIO Round Two Question Two

Languages Problem (30 min)

You are supplied with a number of representative files in each of several different languages. All the languages use the Latin alphabet A to Z, and case can be ignored; apart from letters, the only characters are the punctuation marks . , ; : and single spaces separating words. For example, you might have the files FRENCH.1 to FRENCH.9, ENGLISH.1 to ENGLISH.6, and so on.

Given a file TEXT in the same format but of unknown origin, how would you go about programming a computer to identify its language?
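One common way to attack this (not necessarily the method used in the solution that follows) is to build a letter-frequency "fingerprint" for each language from its sample files and pick the language whose fingerprint best matches that of TEXT. The sketch below is a minimal illustration under stated assumptions: it uses letter bigrams (pairs of adjacent characters, with word boundaries marked by spaces) and cosine similarity, both of which are choices of this example rather than anything the problem prescribes. The function names (normalise, profile, cosine, train, identify, load_samples) are hypothetical, and load_samples assumes the sample files sit in one directory and are named LANGUAGE.n as in the problem.

```python
from __future__ import annotations

import glob
import math
import os
from collections import Counter

LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def normalise(text: str) -> str:
    """Upper-case the text, turn every non-letter (including . , ; :) into a
    space, and collapse runs of spaces so words are separated uniformly."""
    chars = [c if c in LETTERS else " " for c in text.upper()]
    return " ".join("".join(chars).split())


def profile(text: str) -> Counter:
    """Count letter bigrams, padding each word with spaces so that
    word-initial and word-final letters are captured (e.g. ' T', 'E ')."""
    counts: Counter = Counter()
    for word in normalise(text).split():
        padded = f" {word} "
        for i in range(len(padded) - 1):
            counts[padded[i:i + 2]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples: dict[str, list[str]]) -> dict[str, Counter]:
    """Pool the bigram counts of all sample texts for each language."""
    profiles: dict[str, Counter] = {}
    for language, texts in samples.items():
        total: Counter = Counter()
        for text in texts:
            total.update(profile(text))  # adds counts, does not replace them
        profiles[language] = total
    return profiles


def identify(text: str, profiles: dict[str, Counter]) -> str:
    """Return the language whose pooled profile is most similar to the text."""
    target = profile(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


def load_samples(directory: str) -> dict[str, list[str]]:
    """Hypothetical loader: read files named LANGUAGE.n (e.g. FRENCH.1) from a
    directory, grouping their contents by the part of the name before the dot."""
    samples: dict[str, list[str]] = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.*"))):
        name, _, suffix = os.path.basename(path).partition(".")
        if not suffix.isdigit():  # skip anything not named LANGUAGE.n
            continue
        with open(path, encoding="ascii", errors="ignore") as f:
            samples.setdefault(name.upper(), []).append(f.read())
    return samples


if __name__ == "__main__":
    # Tiny made-up samples, for illustration only.
    demo = train({
        "ENGLISH": ["THE CAT SAT ON THE MAT. THE DOG AND THE CAT ARE THERE."],
        "FRENCH": ["LE CHAT EST SUR LA TABLE; LES ENFANTS SONT DANS LA MAISON."],
    })
    print(identify("THE CAT AND THE DOG", demo))  # ENGLISH
```

Bigrams are used here because languages differ noticeably in which letter pairs are frequent (TH is common in English, for instance), and cosine similarity compares the shape of the frequency distributions rather than raw counts, so a long sample file does not automatically outweigh a short TEXT. With real data one would call train(load_samples(...)) on the directory of sample files; single-letter frequencies, trigrams or common-word counts are equally plausible features.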

Solution to the Languages Problem

Antony Rix