Cheyenne and international standards

Michael Everson (everson@irlearn.ucd.ie)
Mon, 23 May 1994 16:21:36 WET


[Note for NAT-LANG readers: Wayne Leman is a linguist working
closely with the Northern Cheyenne; I met with him in Lame Deer a
few summers ago, and we discussed computing in minority languages
at that time. A lot of the work I do has to do with fonts and
writing systems, and 8-bit and 16-bit character encoding, chiefly
on the Mac but also on Windows. In response to Paula Wagoner's
request on this forum, I got in touch with Rubie Sooktis and am
providing her with Cheyenne fonts and keyboards for her film
project; developing those utilities has raised some questions
which have general implications for Native orthography and
computing. I asked Wayne for permission to ask him some questions
about Cheyenne encoding and to copy it here to NAT-LANG. He agreed,
and also said:

>I'm no expert in orthographies nor the "machine" pressures put on
>us to keep them "simple" to prevent us from having to develop
>complicated glyphs, fonts, etc. I do know that the Native
>communities themselves are split over whether they should have
>orthog's as much English (or French) looking as possible or
>whether they should be very unique like syllabaries. I just got
>the book Flutes of Fire by Leanne Hinton (U.C. Berkeley), 1994,
>in which she discusses a number of these issues so critical to
>language preservation and the practical programs some Native
>communities are implementing for language maintenance.

I'd be grateful if someone could tell me how I can order a copy
of _Flutes of Fire_ here in Ireland.

Now, then...]

Hi Wayne. I have a question. You've told me that the glyph (or
form of the letter used for printing) used for Cheyenne voiceless
vowels can be _either_ RING ABOVE or DOT ABOVE. I understand that
Cheyennes can read their language whether RING ABOVE or DOT ABOVE
is used. However in the international standard ISO/IEC 10646-1 (=
Unicode) these would be considered two separate characters. This
point has important ramifications for computing in Cheyenne.
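
The distinction can be checked mechanically. What follows is only a
sketch, assuming Python's standard unicodedata module and a present-day
version of the character database; the use of the combining marks
U+030A and U+0307 is an illustrative choice, not something settled in
this message:

```python
import unicodedata

RING_ABOVE = "\u030a"  # COMBINING RING ABOVE
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE

e_ring = "e" + RING_ABOVE
e_dot = "e" + DOT_ABOVE

# Two spellings a reader treats as the same letter are different strings.
same = (e_ring == e_dot)

# Canonical normalization does not merge them either.
nfc_ring = unicodedata.normalize("NFC", e_ring)
nfc_dot = unicodedata.normalize("NFC", e_dot)

mark_names = [unicodedata.name(RING_ABOVE), unicodedata.name(DOT_ABOVE)]
print(same, nfc_ring == nfc_dot, mark_names)
```

A search for one form will therefore not find the other unless the
software is told to treat them as equivalent.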

Let's fantasize a moment that technology (or the money to buy it)
is not a particular constraint, and perhaps jump ahead a few
years. Everyone will have a 16-bit character set running by then.
Let's say that someone is building up an electronic database of
Cheyenne texts for on-line searching, historical texts perhaps, or
glossaries. What characters will be used? This is a question of
the _semantics_ of the characters used to write Cheyenne, rather
than a question of their _presentation_. The problem is
complicated further by the fact that some characters used to write
Cheyenne will have to be added to ISO 10646 _in_any_case_. In the
fonts I have made for Rubie Sooktis, I used RING ABOVE for the
voiceless vowels. The shape and identification of the glottal stop
character is also a concern, which I will discuss below.
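
For the database scenario, one possible workaround is to fold both
presentations to a single form before comparing. The sketch below
assumes Python's unicodedata module; the function name search_key and
the choice of RING ABOVE as the stored form are hypothetical, made only
for illustration, and the folding is limited to the vowels a, e and o:

```python
import unicodedata

RING_ABOVE = "\u030a"  # COMBINING RING ABOVE
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
VOWELS = set("aeoAEO")  # the voiceless-vowel bases listed in this message

def search_key(text):
    """Fold DOT ABOVE on a, e, o to RING ABOVE (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        # Only rewrite a dot that directly follows one of the vowels.
        if ch == DOT_ABOVE and out and out[-1] in VOWELS:
            out.append(RING_ABOVE)
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

print(search_key("e\u0307") == search_key("e\u030a"))
```

Folding at search time leaves the stored text untouched, so the choice
between ring and dot can still be made later by the community rather
than by the software.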

Following from our discussions and some of the articles by you
which I have seen, I have the "special character" repertoire for
Cheyenne as follows. The number preceding the character is the ISO
10646/Unicode hexadecimal address; where no number is given it
means that this character does not exist in the standard as a
precomposed character. The first group comprises the voiceless
vowels and sha. The second group comprises possible characters to
be used to represent the glottal stop. You have told me that the
vowels with ACUTE, MACRON, and GRAVE were not used by Cheyennes to
mark tone except for technical works like dictionaries; I include
those in the third section.

[Three vowels and one consonant needed to write Cheyenne]

00C5 *LATIN CAPITAL LETTER A WITH RING ABOVE    ---- is A WITH DOT ABOVE
---- *LATIN CAPITAL LETTER E WITH RING ABOVE    0116 is E WITH DOT ABOVE
---- *LATIN CAPITAL LETTER O WITH RING ABOVE    ---- is O WITH DOT ABOVE
0160 *LATIN CAPITAL LETTER S WITH CARON
00E5 *LATIN SMALL LETTER A WITH RING ABOVE      ---- is A WITH DOT ABOVE
---- *LATIN SMALL LETTER E WITH RING ABOVE      0117 is E WITH DOT ABOVE
---- *LATIN SMALL LETTER O WITH RING ABOVE      ---- is O WITH DOT ABOVE
0161 *LATIN SMALL LETTER S WITH CARON

[Three letters which could be used for the glottal stop]

0294 *LATIN LETTER GLOTTAL STOP       This is the large "question mark" one
02C0 *MODIFIER LETTER GLOTTAL STOP    This is a small raised "question mark"
02BC *MODIFIER LETTER APOSTROPHE      Looks like the curly punctuation

Note that these are NOT the punctuation apostrophe, but _letters_.

[Special vowels used by linguists to represent Cheyenne tones]

00C1 LATIN CAPITAL LETTER A WITH ACUTE
00C9 LATIN CAPITAL LETTER E WITH ACUTE
00D3 LATIN CAPITAL LETTER O WITH ACUTE
00C0 LATIN CAPITAL LETTER A WITH GRAVE
00C8 LATIN CAPITAL LETTER E WITH GRAVE
00D2 LATIN CAPITAL LETTER O WITH GRAVE
0100 LATIN CAPITAL LETTER A WITH MACRON
0112 LATIN CAPITAL LETTER E WITH MACRON
014C LATIN CAPITAL LETTER O WITH MACRON
00E1 LATIN SMALL LETTER A WITH ACUTE
00E9 LATIN SMALL LETTER E WITH ACUTE
00F3 LATIN SMALL LETTER O WITH ACUTE
00E0 LATIN SMALL LETTER A WITH GRAVE
00E8 LATIN SMALL LETTER E WITH GRAVE
00F2 LATIN SMALL LETTER O WITH GRAVE
0101 LATIN SMALL LETTER A WITH MACRON
0113 LATIN SMALL LETTER E WITH MACRON
014D LATIN SMALL LETTER O WITH MACRON
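
The "----" entries in the first group can be regenerated by asking
whether a base letter plus a combining mark composes to a single
character. This is a sketch only, assuming Python's unicodedata module;
the helper name precomposed_address is hypothetical, and because the
answers come from whatever version of the character database ships with
the Python in use, some entries may differ from the list above:

```python
import unicodedata

MARKS = {"RING ABOVE": "\u030a", "DOT ABOVE": "\u0307"}

def precomposed_address(base, mark_name):
    """Return the hex address of base+mark if it composes to one
    character under NFC, else '----' (hypothetical helper)."""
    composed = unicodedata.normalize("NFC", base + MARKS[mark_name])
    if len(composed) == 1:
        return format(ord(composed), "04X")
    return "----"

for base in "AEOaeo":
    print(base,
          precomposed_address(base, "RING ABOVE"),
          precomposed_address(base, "DOT ABOVE"))
```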

The question for Cheyenne computing: shall voiceless vowels be
represented by RING ABOVE or DOT ABOVE? On present-day equipment,
admittedly it's a question of six of one and half-dozen of
another--but when the 16-bit character sets are made available,
and cheaply as they will be, it will not be a moot point, and I
would encourage the Cheyenne elders and teachers to take a
decision in anticipation of that time. Likewise, the
representation of the glottal stop is problematic: while it's true
that typewriters use the apostrophe, the question is not so clear
for computing. To use the punctuation apostrophe as a
letter would be wrong (or rather undesirable), particularly as
computer software often turns it into curly quotes going one
direction or another; and parsing and sorting software will treat
punctuation as punctuation not as letters. Since there are letters
suitable for representing the glottal stop, I would again
encourage those involved with the Cheyenne language to meet to
discuss the question. If, Wayne, you can forward me a paragraph or
two in Cheyenne I will be glad to typeset it in several font
styles with the various options (dots, rings, question marks,
apostrophes) to aid the discussion; I'm making fonts for Cheyenne
anyway. Certainly I'd like those fonts to give the best support
for the Cheyenne language they can. (Right now I am using rings
and the apostrophe form, but can change that easily.)
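
The difference between a punctuation mark and a letter shows up as soon
as software splits text into words. The sketch below assumes Python's
standard re module and its Unicode-aware \w class; the helper name
words is hypothetical, and the string "a?e" built here is a placeholder,
not a Cheyenne word:

```python
import re

CANDIDATES = {
    "punctuation apostrophe": "'",
    "curly quote": "\u2019",
    "LATIN LETTER GLOTTAL STOP": "\u0294",
    "MODIFIER LETTER GLOTTAL STOP": "\u02c0",
    "MODIFIER LETTER APOSTROPHE": "\u02bc",
}

def words(text):
    """Split text into runs of word characters (hypothetical helper)."""
    return re.findall(r"\w+", text)

for label, ch in CANDIDATES.items():
    sample = "a" + ch + "e"  # placeholder string, not real Cheyenne
    # isalpha() is True only for characters classed as letters.
    print(label, ch.isalpha(), words(sample))
```

With either punctuation form the placeholder is broken into two pieces;
with any of the three letters it stays together as one word.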

I'm eager to hear your thoughts on these points, Wayne.

This sort of question has ramifications for other Native
languages. A couple of years ago I put out a "call for alphabets"
which didn't get a lot of response (I got good data on Micmac from
James Fidelholtz and some non-language-specific data on Peruvian
languages from Mary Ruth Wise). Perhaps an inventory of signs and
symbols, used not only in scientific contexts but also (and perhaps
more importantly) in popular contexts, would be helpful. I'd be
glad to assemble this inventory if anyone would like to write me
about the letters used in your languages.

Michael Everson
Everson Gunn Teoranta
15 Port Chaeimhghein Íochtarach
Baile Átha Cliath 2
Éire

Supporting minority languages worldwide.