A computer may take several seconds to produce speech because
it has to look up each word in its database of phonemes,
which tells the speech engine how to pronounce each word.
The phonemes are then converted into a sound file. All of this takes considerable computational power.
A phoneme string consists of one or more phoneme symbols and stress marks, optionally separated by whitespace.
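Because the whitespace is optional, a reader of phoneme strings cannot simply split on spaces. The following is a minimal sketch of one way to tokenize such a string, assuming a greedy longest-match rule and a small illustrative subset of darpabet symbols. The names `SYMBOLS`, `STRESS_MARKS`, and `tokenize` are hypothetical and not taken from any speech engine.

```python
# Illustrative subset of darpabet symbols (assumption: a real engine knows the full set).
SYMBOLS = {"b", "t", "hh", "ey", "ae", "iy", "aa", "ah", "ax", "aw", "pau"}
# Stress marks used by the darpabet: 0 = none, 1 = primary, 2 = secondary.
STRESS_MARKS = {"0", "1", "2"}


def tokenize(phoneme_string):
    """Split a phoneme string into phoneme symbols and stress marks.

    Whitespace is skipped wherever it appears. At every other position the
    longest known symbol is tried first (greedy longest match), which is a
    simplification chosen for this sketch.
    """
    max_len = max(len(symbol) for symbol in SYMBOLS)
    tokens = []
    i = 0
    while i < len(phoneme_string):
        ch = phoneme_string[i]
        if ch.isspace():
            i += 1
            continue
        if ch in STRESS_MARKS:
            tokens.append(ch)
            i += 1
            continue
        for length in range(max_len, 0, -1):
            candidate = phoneme_string[i:i + length]
            if candidate in SYMBOLS:
                tokens.append(candidate)
                i += len(candidate)
                break
        else:
            raise ValueError(f"unknown symbol at position {i}")
    return tokens


print(tokenize("b ey t"))  # spaced form
print(tokenize("hhaat"))   # same idea without whitespace
```

Spaced and unspaced input produce the same kind of token list, which is the property the definition above implies.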
The elapsed time from when the client first sends text to the server until the client receives the first audio buffer back is measured by a metric named TTFA (time first audio received).
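As a rough illustration of how TTFA could be measured on the client side, the sketch below starts a clock just before the text is sent and stops it when the first audio buffer arrives. The `send_text` callable and the `fake_server` stand-in are hypothetical; a real client would call whatever API its speech server provides.

```python
import time


def measure_ttfa(send_text, text, clock=time.monotonic):
    """Return (ttfa_seconds, first_buffer).

    send_text is a hypothetical callable that sends the text to the server
    and returns an iterable of audio buffers. The clock is injectable so the
    measurement can be checked without real delays.
    """
    start = clock()
    buffers = send_text(text)
    first_buffer = next(iter(buffers))  # blocks until the first buffer arrives
    return clock() - start, first_buffer


def fake_server(text):
    """Stand-in for a real TTS client: yields two dummy audio buffers."""
    time.sleep(0.05)  # simulated synthesis and network latency
    yield b"\x00" * 320
    yield b"\x00" * 320


ttfa, first = measure_ttfa(fake_server, "hello world")
print(f"TTFA: {ttfa * 1000:.0f} ms, first buffer: {len(first)} bytes")
```

A monotonic clock is used here so that system clock adjustments do not distort the measurement.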
There are several phoneme dictionaries.
The SAMPA phoneme set is used internationally for German, French, etc.
The DARPA phoneme set (nicknamed the darpabet) is used by US English voices to represent the sounds in the English language.
The IPA (International Phonetic Alphabet) was devised by the
International Phonetic Association (also abbreviated IPA; established in 1886 in Paris)
to represent the sounds of all languages.
Unlike the darpabet, it uses non-Latin characters: 107 distinct letters and 56 diacritics and suprasegmentals,
rendered in a font of the Association's own design.
Apple Macintosh computers come with MacinTalk, an embedded text-to-speech voice synthesizer
that turns ASCII text into speech through the computer's speaker.
Apple's North American phoneme symbols represent vowels as pairs of uppercase letters and consonants as single letters.
However, the DARPA phoneme set (the "darpabet" used by AT&T for English) does not capitalize vowels:
Phoneme | Example Word | Example Transcription |
ey | bait | b ey t |
ae | bat | b ae t |
iy | beat | b iy t |
eh | bet | b eh t |
ay | bite | b ay t |
ih | bit | b ih t |
ow | boat | b ow t |
aa | bob | b aa b |
ao | bought | b ao t |
aw | brown | b r aw n |
oy | boy | b oy |
ah | but | b ah t |
ax | about | ax b aw t |
uw | boot | b uw t |
uh | book | b uh k |
er | bird | b er d |
b | bet | b eh t |
ch | church | ch er ch |
d | dog | d ao g |
dx | butter | b ah dx er |
f | fog | f ao g |
g | got | g aa t |
hh | hot | hh aa t |
jh | jump | jh ah m p |
k | kit | k ih t |
l | lot | l aa t |
em | Chatham | ch ae t em |
m | Mom | m aa m |
en | satin | s ae t en |
n | nod | n aa d |
ng | thing | th ih ng |
p | pot | p aa t |
q | button | b ah q en |
r | rat | r ae t |
s | sat | s ae t |
sh | shut | sh ah t |
t | top | t aa p |
dh | that | dh ae t |
th | thick | th ih k |
v | vat | v ae t |
w | won | w ah n |
y | you | y uw |
z | zoo | z uw |
zh | measure | m eh zh er |
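The word lookup described at the start of this section can be sketched with a handful of rows from the table above. This is only an illustration: the `LEXICON` dictionary and the `transcribe` function are hypothetical names, and a real speech engine uses a far larger database plus rules for words it does not find.

```python
# A few word-to-phoneme entries copied from the table above.
LEXICON = {
    "bait": "b ey t",
    "bat": "b ae t",
    "boot": "b uw t",
    "hot": "hh aa t",
    "thing": "th ih ng",
    "measure": "m eh zh er",
}


def transcribe(text, lexicon=LEXICON):
    """Look up each word of the text and return one phoneme list per word.

    Words are lowercased before lookup. A word missing from the lexicon
    raises KeyError; this sketch has no fallback pronunciation rules.
    """
    return [lexicon[word].split() for word in text.lower().split()]


print(transcribe("Hot bait"))
```

Each inner list holds the phoneme symbols that would be handed on to the synthesis stage.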
Modifiers (also called prosodic control symbols) are used to specify emphasis and pauses. Each phoneme set has its own:
Description | darpabet | Apple | SAMPA |
Silence | pau | % | |
No stress | 0 | | |
Breath intake | | @ | |
Primary stress | 1 | | |
Secondary stress | 2 | | |
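For programs that need to translate between phoneme sets, the modifier table can be held as a small lookup structure. The sketch below encodes only the cells that are filled in above; the names `MODIFIERS` and `modifier_symbol` are hypothetical, and empty cells are reported as `None` rather than guessed.

```python
# Modifier symbols per phoneme set, limited to the entries listed in the table.
MODIFIERS = {
    "silence": {"darpabet": "pau", "apple": "%"},
    "no stress": {"darpabet": "0"},
    "breath intake": {"apple": "@"},
    "primary stress": {"darpabet": "1"},
    "secondary stress": {"darpabet": "2"},
}


def modifier_symbol(description, phoneme_set):
    """Return the symbol for a modifier in a phoneme set, or None if not listed."""
    return MODIFIERS[description.lower()].get(phoneme_set.lower())


print(modifier_symbol("Silence", "Apple"))
print(modifier_symbol("Breath intake", "darpabet"))
```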