IPA for Singing in German

The letters of the International Phonetic Alphabet (IPA) are published in the official IPA chart of the International Phonetic Association. This chart aims to represent the entire repertoire of sounds in the world's languages. Since the 1990s, the number and shape of the letters have remained fairly consistent. IPA is internationally established not only in linguistics but also in several other fields, such as vocal pedagogy. The letters are, in fact, an approximation. Thus, while IPA is very practical and precise in the context of the lyrics, it is most effective when combined with recordings of the spoken or sung lyrics and with pronunciation exercises adapted to the desired language.

In the following figures, you can see the relevant letters for German phonology and phonetics.

Figure 1: Consonants from Kohler (2003), adjusted.

Figure 1 depicts the consonants. The consonant table is arranged in rows that indicate the manner of articulation and columns that indicate the place of articulation. Many of the consonants come in pairs of voiceless and voiced sounds.

Figure 2 shows the vowel chart. It maps the height (close – open) and position (front – back) of the tongue, with the letters of the German vowels placed at the corresponding points. Depending on the singing tradition, voice type, and other individual characteristics, the absolute position of the vowels in the vowel chart can vary.

Figure 3, again, depicts the vowel chart, but this time for the diphthongs when singing in German. Note that the transcription for singing differs from the transcription for spoken language.


Figure 2: Vowels from Kohler (2003), adjusted graphically.

Figure 3: Diphthongs from Kohler (2003) (grey), adjusted.

As a reference, we used Klaus Kohler's article on the phonology of German from the Handbook of the International Phonetic Association and adjusted the charts and rules for singing. If you are interested in learning more about singing in German with the help of IPA, you can consult A Handbook of Diction for Singers by David Adams in its latest edition from 2008.

The diction of lyrics in classical singing differs from the pronunciation of normal speech in some respects. In Figure 3, we already indicated the difference in the pronunciation of diphthongs in German. The deviation in transcription can be explained, for instance, by the lower position of the larynx in classical singing.

Another special case we would like to address briefly is the »r« in German. This phoneme has several allophones, which means that it can be pronounced as a tap, trill, fricative, or low schwa. At the end of a word or a syllable, the »r« is rarely pronounced as a trill or tap; it usually becomes a low schwa. However, a trill or tap can sometimes emphasize the musical gesture or the meaning of the lyrics. This usually depends on the chosen tempo. For instance, it makes perfect sense to pronounce the »r« in the words »klar« and »ruht« of the phrase »Still und klar ruht der See« from the well-known German song »Leise rieselt der Schnee« as a trill or tap in order to express the tranquil and static gesture of the song. If the tempo is faster, though, it makes less sense to pronounce the word »klar« that clearly. In that case, the »r« becomes a low schwa or merges with the following trill or tap. If you feel unsure about this, it might help to listen to recordings. A selection of these is listed at the end of the volume.

In German, the aspiration in singing is usually shorter than in speaking. For this, it helps to consider not only the phonetic transcription but also the original orthographic text. This way you can usually tell how long the aspiration, or voice onset time (VOT), is meant to be. As the VOT varies significantly from language to language and is therefore difficult to capture in writing, working with recordings is also useful here.

In principle, we do not assume that the phonetic transcriptions for choir and solo singing are fundamentally different, apart from minor exceptions. The main difference between different types of singing often lies in the voicing and the use of formants, which cannot be expressed with the IPA, or only to a limited extent. Further linguistically relevant procedures and decisions are explained in the critical report.


Kohler, Klaus. German. In: International Phonetic Association. Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University Press, 2003.

Adams, David. A Handbook of Diction for Singers: Italian, German, French. New York: Oxford University Press, 2008.

Karna, Duane R. The Use of the International Phonetic Alphabet in the Choral Rehearsal. Lanham, Maryland: Rowman & Littlefield, 2012.

Critical Report

Since our editions have a focus on pronunciation when singing, the critical report is deliberately kept short. However, we would like to comment on some fundamental philological, musicological, and linguistic decisions.
The IPA transcription is set in bold, which allows us to omit the square brackets or slashes that usually mark a phonetic or phonological transcription.

Primary and secondary stresses are assigned according to the stresses of the spoken words. They do not necessarily correspond to the musical phrase. Musical accentuations are not indicated, but usually result from the musical phrase. In exceptional cases, we also indicate word stresses that result from a rhyming structure but may run counter to the word stress perceived as natural. In these special cases, secondary stress is generally used.

Figure 4: Example of primary and secondary stress in Robert Schumann's Liederkreis. The primary stress is notated as ˈ and the secondary stress as ˌ.

Syllable boundaries are not marked with periods, as is usual in IPA, but with hyphens. As in the orthographic text, we also use underscores at the ends of words to facilitate the reading of the transcription.

When consonants occur at the end of a syllable in the middle of a word, they are systematically moved to the onset of the following syllable. We deviate from this rule only in exceptional cases, such as in combination with extremely short note values. The approach is easiest to understand in combination with longer note values and melismas: many singers aim to stay on sonorous sounds such as vowels for as long as possible.
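The shifting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to prepare the editions: the function name `shift_codas`, the nucleus inventory `NUCLEUS`, and the sample transcriptions are hypothetical, and exceptions such as very short note values are not modelled.

```python
# Symbols treated as part of the syllable nucleus (vowels and the length mark).
# This inventory is an assumption for illustration, not the edition's symbol set.
NUCLEUS = set("aeiouyøœɛɪɔʊʏəɐɑː")
STRESS = ("ˈ", "ˌ")


def shift_codas(word: str) -> str:
    """Move syllable-final consonants to the onset of the next syllable.

    `word` is a single word with hyphen-separated syllables, e.g. "zɪŋ-ət".
    The last syllable keeps its coda because no syllable follows it.
    """
    syllables = word.split("-")
    for i in range(len(syllables) - 1):
        syl = syllables[i]
        # Walk back over the trailing consonants to the end of the nucleus.
        cut = len(syl)
        while cut > 0 and syl[cut - 1] not in NUCLEUS:
            cut -= 1
        if cut == 0 or cut == len(syl):
            continue  # no nucleus found, or no coda to move
        coda = syl[cut:]
        syllables[i] = syl[:cut]
        nxt = syllables[i + 1]
        if nxt.startswith(STRESS):
            # Keep a stress mark at the very start of the next syllable.
            syllables[i + 1] = nxt[0] + coda + nxt[1:]
        else:
            syllables[i + 1] = coda + nxt
    return "-".join(syllables)


print(shift_codas("zɪŋ-ət"))    # zɪ-ŋət
print(shift_codas("ˈhar-fən"))  # ˈha-rfən
```

A syllable that already ends in a vowel or a length mark, such as the first syllable of "frøː-lɪç", is left unchanged.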

Figure 5: Example of the shifting of syllable boundaries and the transcription of diphthongs in the motet Singet dem Herrn by Johann Sebastian Bach. The shifting can be seen in the word "fröhlich" (the lengthening h is not pronounced) and in the word "Harfen". You can also see the transcription of the diphthongs, which are notated differently when sung than when spoken.

Diphthongs are transcribed differently for singing than for speech, as you can see in Figure 3. It should also be kept in mind that, in singing, the first vowel of the diphthong is usually sustained longer than the second vowel or the transition from one vowel to the other; this cannot be captured in the transcription, or only inadequately. The shift of the second vowel in each case can be explained, among other reasons, by the often low position of the larynx in classical singing technique.

As described in our introduction to IPA for Singing in German, the »r« in German can be pronounced in various ways. Due to this complexity, we decided to go with the simplest variant, usually the low schwa. However, you can adapt the pronunciation according to your own preferences and interpretation. At these points, the transcriptions are therefore suggestions rather than fixed guidelines. Note: Especially in solo singing, the variant with a trill or tap at the end of a word or syllable is still common in some singing schools. It is often up to the transcriber to decide which sounds are preferred in these cases.
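As a rough illustration of this default, the following Python sketch replaces a syllable-final »r« with the low schwa. It is a simplification under stated assumptions, not the edition's actual procedure: the function name `vocalize_final_r`, the accepted r symbols, the merging of "ər" into "ɐ", and the sample transcriptions are all hypothetical choices made for this example.

```python
def vocalize_final_r(word: str, r_symbols: str = "rʁ") -> str:
    """Replace a syllable-final r with the low schwa "ɐ".

    `word` is a single word with hyphen-separated syllables in which the r
    still stands at the end of its syllable. Assumption for this sketch:
    the sequence "ər" merges into a single "ɐ".
    """
    result = []
    for syl in word.split("-"):
        if syl and syl[-1] in r_symbols:
            stem = syl[:-1]
            if stem.endswith("ə"):
                syl = stem[:-1] + "ɐ"  # "tər" becomes "tɐ"
            else:
                syl = stem + "ɐ"       # "klaːr" becomes "klaːɐ"
        result.append(syl)
    return "-".join(result)


print(vocalize_final_r("klaːr"))    # klaːɐ
print(vocalize_final_r("faː-tər"))  # faː-tɐ
print(vocalize_final_r("ruːt"))     # ruːt
```

Because only the syllable-final position is checked, an »r« in the onset, as in "ruːt", is left untouched. A singer or transcriber who prefers the trill or tap would simply skip this step.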

The aspiration of consonants is not transcribed, as it is predictable from phonological rules and also tends to be weaker when singing than when speaking. This applies especially to solo singing.

In some cases, a liaison indicates that two originally separate words are to be pronounced as one word because of the spelling or the composition. In this way, we ensure that the transcription can be matched easily to the original orthographic text.

