Paneuropean Linux

Abstract:
This paper reports how efforts in European countries to localize and
promote Linux could be joined together to produce an operating system
supporting a wide spectrum of languages.
The countries studied are the 15 member states of the European Union
(EU-15).

0) The US-English dominated Information World & Internet

The most frustrating aspect of Information Systems in European countries
is their US-English-centric design and behaviour.
This affects Italian, German, French, Spanish, Danish, Portuguese, Dutch,
Swedish, Norwegian and Finnish speakers, to a greater degree Greek
speakers, and to a lesser degree even British English speakers, because
of a set of cultural and technical conventions specific to the United
States, such as the currency, the system of measurement, the spelling of
certain words and language rules.
The main feature of European scripts is that they use accented letters,
fewer than 60 in total; the exception is Greek script, which has many
different letters, punctuation marks and combinations.

1) The ASCII (7bits)

The first implementation of ASCII (American Standard Code for
Information Interchange) was 7 bits only, and included 32 control
characters, the digits, all lowercase and capital letters of the English
alphabet and some more symbols useful in everyday notation. Some hacks
were made to ASCII to support other languages, and these became
standards after they came into wide use.
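
As a rough illustration of this layout, the following Python sketch
counts the kinds of characters in the 7-bit range. The grouping and the
helper name classify are our own choices for illustration, not part of
any standard; note that the sketch counts DEL (127) as a control code in
addition to the 32 codes 0-31.

```python
import string

def classify(code):
    """Return a rough category for a 7-bit ASCII code (0-127)."""
    ch = chr(code)
    if code < 32 or code == 127:
        return "control"   # codes 0-31 plus DEL
    if ch in string.digits:
        return "digit"
    if ch in string.ascii_letters:
        return "letter"
    return "symbol"        # punctuation and the space character

counts = {}
for code in range(128):    # 7 bits give 2**7 = 128 codes
    kind = classify(code)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
```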

2) Extended ASCII set solution ( 8bits )

ASCII was soon extended to occupy 8 bits (taking over the bit that had
been used for parity, highlighting, end-of-string marking, etc.), so
that it could represent all symbols of languages that were not far
different from English and could fit in the new space.

One extension happened in DOS, where the extra characters took the
places starting from 0x80. The result was not good on terminals, because
ignoring the 8th bit could turn these characters into control codes,
which always caused trouble.
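
A small Python sketch of this failure mode, assuming a channel that
simply clears the 8th bit of every byte. The helper names are
hypothetical; the byte value for e-acute comes from Python's own cp850
codec.

```python
def strip_eighth_bit(byte):
    """Simulate a 7-bit channel that ignores the top bit of a byte."""
    return byte & 0x7F

def is_control(byte):
    """True for the ASCII control codes 0x00-0x1F and DEL (0x7F)."""
    return byte < 0x20 or byte == 0x7F

# e-acute as stored by the DOS code page CP850
dos_byte = "é".encode("cp850")[0]
stripped = strip_eighth_bit(dos_byte)
print(hex(dos_byte), "->", hex(stripped), "control:", is_control(stripped))

# Which 8-bit codes become control codes once the top bit is lost?
unsafe = [b for b in range(0x80, 0x100) if is_control(strip_eighth_bit(b))]
print([hex(b) for b in unsafe])
```

The unsafe codes are exactly 0x80-0x9F plus 0xFF, which suggests why a
layout that leaves the first 32 upper positions free of printable
characters behaves better on such channels.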

A second, widely adopted solution was taken in DEC's terminals, where
the new symbols started at 0xA0.
Needless to say, that was better, and it soon became the basis of the
ISO-8859-X series of standards, which nowadays is the most used
encoding. Windows uses code pages that differ slightly from the
ISO-8859-X standards.
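
To make the difference between the Windows and ISO encodings concrete,
here is a minimal Python sketch that compares Windows-1252 with
ISO-8859-1 byte by byte. It relies only on the cp1252 and latin-1 codecs
shipped with Python; treating a byte that Python's cp1252 table leaves
undefined as "different" is our own choice.

```python
# Collect the byte values that decode differently in the two encodings.
differences = []
for b in range(0x100):
    raw = bytes([b])
    iso = raw.decode("latin-1")      # ISO-8859-1 defines every byte
    try:
        win = raw.decode("cp1252")
    except UnicodeDecodeError:
        win = None                   # undefined in Python's cp1252 table
    if win != iso:
        differences.append(b)

print(hex(differences[0]), "to", hex(differences[-1]))
print(repr(b"\x80".decode("cp1252")))   # the euro sign in Windows-1252
```

In this comparison the two encodings disagree only in the range
0x80-0x9F, where Windows-1252 places printable characters.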

3) The Unicode ( 16bits )

A newer encoding is Unicode, which is 16-bit and includes support for
most known and widely spoken scripts.
Unicode is recognised by ISO as ISO 10646 and will soon be recognized by
the IETF as the standard on the Internet (it is already covered by
RFC 2277).
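
The following Python sketch shows how a plain Latin letter, an accented
letter and a Greek letter each map to one code point and one 16-bit
unit. The function name code_units is hypothetical; ord and the
utf-16-be codec are standard Python.

```python
def code_units(text):
    """Return (character, code point, UTF-16 big-endian bytes) triples."""
    return [(ch, ord(ch), ch.encode("utf-16-be")) for ch in text]

# A Latin letter, an accented Latin letter and a Greek letter
for ch, code_point, utf16 in code_units("aéα"):
    print(f"{ch!r}: U+{code_point:04X}, UTF-16BE bytes {utf16.hex()}")
```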

4)  Comparisons between scripts & standards

Main Language (script) sets:
Latin1 (West European):
    French (fr), Spanish (es), Catalan (ca), Basque (eu),
    Portuguese (pt), Italian (it), Albanian (sq), Rhaeto-Romanic (rm),
    Dutch (nl), German (de), Danish (da), Swedish (sv), Norwegian (no),
    Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga),
    Scottish (gd), and English (en)
Latin2 (East European):
    Czech (cs), Hungarian (hu), Polish (pl), Romanian (ro),
    Croatian (hr), Slovak (sk), Slovenian (sl), Sorbian
Latin3 (South European):
    Esperanto (eo) and Maltese (mt)
Latin4 (North European):
    Estonian (et), Lettish (lv) and Lithuanian (lt), Greenlandic (kl)
    and Lappish
Cyrillic:
    Bulgarian (bg), Byelorussian (be), Russian (ru), Serbian (sr) and
    Ukrainian (uk)
Greek:
    Greek (el)
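
One way to check which of these sets can hold a given text is to try
encoding it with each one. The sketch below assumes Python's ISO-8859
codecs as stand-ins for the six sets; the function name covering_sets is
hypothetical.

```python
# Script sets from the list above, mapped to Python codec names.
ISO_SETS = {
    "Latin1": "iso8859_1",
    "Latin2": "iso8859_2",
    "Latin3": "iso8859_3",
    "Latin4": "iso8859_4",
    "Cyrillic": "iso8859_5",
    "Greek": "iso8859_7",
}

def covering_sets(text):
    """Return the names of the sets that can encode every character."""
    result = []
    for name, codec in ISO_SETS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue       # at least one character is missing in this set
        result.append(name)
    return result

for word in ("français", "čeština", "Ελληνικά", "Русский"):
    print(word, "->", covering_sets(word))
```

Plain ASCII text is covered by every set, since all of them keep the
lower 128 positions unchanged.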

These are the main sets (codepages) under DOS, Windows and ISO, and
their Unicode positions:

DOS                       Windows       ISO         Script    Unicode Position
CP437 DOSLatinUS
CP850 DOSLatin1           Windows-1252  ISO-8859-1  Latin1    U0000
CP852 DOSLatin2           Windows-1250  ISO-8859-2  Latin2    U0100?
                                        ISO-8859-3  Latin3    U0100?
                                        ISO-8859-4  Latin4    U0180?
CP855 DOSCyrillic
CP866 DOSCyrillicRussian  Windows-1251  ISO-8859-5  Cyrillic  U0400
CP860 DOSPortuguese
CP737 DOSGreek            Windows-1253  ISO-8859-7  Greek     U0370
CP861 IBMDOSGreek #1
CP869 IBMDOSGreek #2
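
To see what these differences mean in practice, the sketch below encodes
one sample letter per script with several of the code pages listed and
prints the resulting byte. It assumes the codecs bundled with Python
(cp437, cp850, cp737, cp866 and so on) match the code pages named in the
table; the choice of sample letters is ours.

```python
# One sample letter per script and the encodings to compare.
SAMPLES = {
    "é": ["cp437", "cp850", "cp1252", "iso8859_1"],
    "α": ["cp737", "cp869", "cp1253", "iso8859_7"],
    "я": ["cp855", "cp866", "cp1251", "iso8859_5"],
}

def byte_values(ch, encodings):
    """Map each encoding name to the single byte it uses for ch."""
    return {enc: ch.encode(enc)[0] for enc in encodings}

for ch, encodings in SAMPLES.items():
    values = byte_values(ch, encodings)
    listing = ", ".join(f"{enc}=0x{b:02X}" for enc, b in values.items())
    print(f"{ch} (U+{ord(ch):04X}): {listing}")
```

The same letter usually lands on different bytes in the DOS code pages
and in the ISO sets, so text moved between systems has to be converted
rather than copied byte for byte.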


5) The problem with fonts, reading and writing with deadkeys

The extra symbols of European languages require fonts that are well
defined for the range 128-255.
Some applications cannot pass 8-bit characters through correctly.

Displaying the new characters is easy, but typing them is usually
harder.
Most EU scripts have accents, so they definitely need dead keys.
Dead keys are not easy to implement, especially over terminals, and in
a multiuser environment each user has their own keyboard state and their
own fonts.
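
A dead key produces no character by itself; it modifies the next key
pressed. The Python sketch below models this as a small per-user state
machine. The key-to-accent mapping and the class name DeadKeyState are
assumptions made for illustration, and the composition step uses
Unicode normalization rather than any real keyboard driver.

```python
import unicodedata

# Hypothetical mapping from dead keys to Unicode combining marks.
DEAD_KEYS = {
    "'": "\u0301",   # acute (also gives the Greek tonos)
    "`": "\u0300",   # grave
    "^": "\u0302",   # circumflex
    '"': "\u0308",   # diaeresis
    "~": "\u0303",   # tilde
}

class DeadKeyState:
    """Keyboard state of one user: remembers a pending dead key."""

    def __init__(self):
        self.pending = None

    def press(self, key):
        """Return the text produced by this key press (may be empty)."""
        if self.pending is None:
            if key in DEAD_KEYS:
                self.pending = key      # wait for the next key
                return ""
            return key
        dead, self.pending = self.pending, None
        if key == " ":
            return dead                 # dead key + space: the accent itself
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[dead])
        if len(composed) == 1:
            return composed             # a precomposed letter exists
        return dead + key               # no such letter: emit both keys

user = DeadKeyState()
print("".join(user.press(k) for k in "caf'e"))
```

Because the pending accent lives in the object, two users typing at the
same time each need their own DeadKeyState, which is the per-user state
mentioned above.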

6) The NLS

NLS stands for National Language Support, and its purpose is to make it
easier to use different languages on a UNIX system.
NLS changes the behaviour of applications in both on-screen messages and
text handling.
NLS is the right foundation for the internationalization of a UNIX
system.
NLS works quite well when set up properly, but setting it up right is a
mess, even for an administrator.
The problems usually come from the distributions and the packages that
are not
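
Part of what makes the set-up error-prone is that the locale of each
category (messages, character handling and so on) is resolved from
several environment variables. The Python sketch below follows the usual
POSIX precedence of LC_ALL, then the category variable, then LANG; the
function name effective_locale is hypothetical, and the fallback to "C"
is an assumption about the system default.

```python
def effective_locale(category, env):
    """Resolve the locale for one category, POSIX-style.

    category: a variable name such as "LC_MESSAGES" or "LC_CTYPE"
    env:      a mapping of environment variables
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:              # unset or empty variables are skipped
            return value
    return "C"                 # assumed system default

env = {"LANG": "en_GB", "LC_MESSAGES": "el_GR"}
print(effective_locale("LC_MESSAGES", env))   # messages
print(effective_locale("LC_CTYPE", env))      # text handling
```

In a real program the same function could be called with os.environ as
the mapping.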

7) The KDE graphical environment

Luckily, the KDE graphical environment was first conceived in
Germany, and as a result it is strongly oriented towards international
use.


8) Linux by country

Character set  Lang  Country         Locales        Primary National Site
ISO-8859-1     au    Austria         de_AT          www.luga.or.at
ISO-8859-1     be    Belgium         de_BE, fr_BE   linux.rtfm.be
ISO-8859-1     da    Denmark         da_DK, en_DK   www.linux.dk
ISO-8859-1     fi    Finland         fi_FI, sv_FI   www.mpoli.fi/flug
ISO-8859-1     fr    France          fr_FR          www.linux-france.com
ISO-8859-1     de    Germany         de_DE          www.linux.de
ISO-8859-7     el    Greece          gr_GR (el_GR)  www.linux.gr
ISO-8859-1     ir    Ireland         en_IE          ilug.csn.ul.ie
ISO-8859-1     it    Italy           it_IT          www.linux.it
ISO-8859-1     lu    Luxembourg      de_LU, fr_LU   www.linux.lu
ISO-8859-1     nl    Netherlands     nl_NL          www.nllgg.nl
ISO-8859-1     pt    Portugal        pt_PT          www.netdados.com.br/tlm
ISO-8859-1     es    Spain           es_ES          slug.ctv.es
ISO-8859-1     sv    Sweden          sv_SE          www.sslug.dk
ISO-8859-1     en    United Kingdom  en_GB          www.linux.org.uk
2                    15              11
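
The locale names in the table follow the pattern
language_TERRITORY, optionally followed by .codeset and @modifier. The
Python sketch below splits such a name into its parts and, when no
codeset is given, falls back to the character set from the table. The
regular expression, the function name parse_locale and the default of
ISO-8859-1 are assumptions that only hold for the 15 countries listed
here.

```python
import re

# language[_TERRITORY][.codeset][@modifier]
LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)

# Character sets taken from the table, keyed by language code.
# Both "el" and the "gr" form shown in the table are accepted.
CHARSET_BY_LANGUAGE = {"el": "ISO-8859-7", "gr": "ISO-8859-7"}
DEFAULT_CHARSET = "ISO-8859-1"

def parse_locale(name):
    """Split a locale name into parts, filling in a default codeset."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a locale name: {name!r}")
    parts = match.groupdict()
    if parts["codeset"] is None:
        parts["codeset"] = CHARSET_BY_LANGUAGE.get(
            parts["language"], DEFAULT_CHARSET
        )
    return parts

for name in ("fr_BE", "el_GR", "sv_FI"):
    print(name, "->", parse_locale(name))
```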


9) Spelling problems
10) The "lost" European users and the reasons
11) Mounting filesystems
12) The need for a Paneuropean distribution
13) Benefits
14) A challenge for European-minded people

Interesting similarities among European users and their languages

The big Information Systems companies are mainly located in the United
States, so they keep failing to give European users the right operating
system support. There is no Paneuropean Unix environment.


References:
Character mnemonics and character sets,
	http://andrew2.andrew.cmu.edu/rfc/rfc1345.html

Greek Character Encoding for Electronic Mail Messages,
	http://andrew2.andrew.cmu.edu/rfc/rfc1947.html

Internationalization (i18n) FAQ,
	http://www.vlsivie.tuwien.ac.at/mike/i18n.html

KDE I18N,
	http://www.kde.org/i18n.html