(1) Overview

Repository location



This dataset was produced as part of a PhD project (ongoing, thesis due to be submitted early 2024), entitled ‘A study of variation and change in the Greek lexicon of the Post-classical period.’

(2) Methods


To show that words increased in length on average throughout the history of the Greek language, a core vocabulary was collated for both Classical Greek and Post-classical Greek, and the mean number of syllables per word was calculated for each period. Following Fenk-Oczlon & Pilz () and Mikros & Milička (), syllable count was chosen as the measure of word length rather than number of contrasting segments, or phonemes, which is the metric used by Nettle (). Syllable count was judged to be the best measure of word length because the pronunciation of graphemes changed over time. The syllable counts were obtained manually, by going through the word lists and counting the number of syllables in each word. The following boxplots (Figures 1, 2, 3, 4 and 5) show the distribution of the data.

Figure 1 

Number of syllables in total dataset (Classical VS Post-classical period).

Figure 2 

Number of syllables in adjective dataset (Classical VS Post-classical period).

Figure 3 

Number of syllables in adverb dataset (Classical VS Post-classical period).

Figure 4 

Number of syllables in noun dataset (Classical VS Post-classical period).

Figure 5 

Number of syllables in verb dataset (Classical VS Post-classical period).

Using this dataset, two-sample t-tests were carried out on the overall means and on the means for each word class.

  1. In aggregate, words in the Classical Greek sample (Figure 1; M = 3.09, SD = 1.07) had fewer syllables than words in the Post-classical Greek sample (M = 3.48, SD = 1.13), t(3951) = 13.1, p < .001. Cohen’s d is 0.32, indicating a highly significant difference with a small-to-moderate effect size.
  2. Adjectives in the Classical Greek sample (Figure 2; M = 3.18, SD = 1.07) had fewer syllables than in the Post-classical Greek sample (M = 3.43, SD = 1.02), t(734) = 3.69, p < .001. Cohen’s d is 0.24, indicating a significant difference with a small effect size.
  3. Adverbs in the Classical Greek sample (Figure 3; M = 2.29, SD = 0.76) had fewer syllables than in the Post-classical Greek sample (M = 2.81, SD = 0.98), t(321) = 5.55, p < .001. Cohen’s d is 0.66, indicating a significant difference with a moderate-to-large effect size.
  4. Nouns in the Classical Greek sample (Figure 4; M = 2.70, SD = 0.91) had fewer syllables than in the Post-classical Greek sample (M = 3.16, SD = 1.09), t(1382) = 10.35, p < .001. Cohen’s d is 0.45, indicating a highly significant difference with a moderate effect size.
  5. Verbs in the Classical Greek sample (Figure 5; M = 3.49, SD = 1.04) had fewer syllables than in the Post-classical Greek sample (M = 3.91, SD = 1.07), t(1591) = 9.35, p < .001. Cohen’s d is 0.38, indicating a highly significant difference with a small-to-moderate effect size.
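
For readers who wish to re-derive statistics of this kind from summary values, the following is a minimal sketch rather than the procedure used in the project. It assumes Welch's unequal-variance form of the two-sample t-test and a pooled-standard-deviation form of Cohen's d; the text does not state which variants were used. The function names are hypothetical, and the sample sizes (1941 and 4282) are the surveyed-word totals given under Sampling strategy. Because the inputs are rounded means and standard deviations, the results may differ slightly from the figures reported above.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (assumed variant) computed from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (m2 - m1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Aggregate comparison: Classical (first) vs Post-classical (second),
# using the rounded values reported in the text.
classical = (3.09, 1.07, 1941)
post_classical = (3.48, 1.13, 4282)

t_stat, dof = welch_t(*classical, *post_classical)
effect = cohens_d(*classical, *post_classical)
print(f"t = {t_stat:.2f}, df = {dof:.0f}, d = {effect:.2f}")
```

The same two functions can be applied to each word class by substituting the corresponding means, standard deviations and class counts.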

Sampling strategy

The source used in this thesis to collect a core vocabulary of Classical Greek was the complete word list (2188 lemmas), generated by the Perseus software, of Aristophanes’ Clouds. The source used to collect the core vocabulary of Roman period Greek was Moulton & Milligan’s () Vocabulary of the Greek Testament illustrated from the papyri and other non-literary sources, which collects 4671 lexemes common to both the New Testament and the Roman period inscriptions and documentary papyri. The language of Aristophanes is widely understood by historical linguists to represent something as close as we can get to everyday language in the Classical period, and the language of the New Testament and papyri is used in the same way by scholars working on the Post-classical period. The choice of these two sources remedies two key problems with Nettle’s () study. Firstly, his sample size for each language is small, at only 50 head-words. Secondly, these were chosen at random from a dictionary, which means that one sample might include mostly rare or technical words while another might include mostly common, everyday words, so the samples might not be truly comparable. Furthermore, the dictionaries in question were of different sizes, and Nettle () himself admits that ‘a smaller dictionary would contain generally more common, hence shorter, words.’ While neither of my sources is of course comprehensive, the total number of lexemes collected is large enough, and covers enough core vocabulary, to give a representative sample. Although the sample for Post-classical Greek is larger than the sample for Classical Greek, both samples are of a considerable size and contain a similar ratio of different word classes.

The following word classes were excluded from the total count in both texts, as they are in all cases rare in both lists, and in some cases irrelevant to a discussion of lexical change: personal and place names; conjunctions; interjections; particles; prepositions; prefixes; pronouns; numerals; articles. Therefore, from both word lists, only nouns, adjectives, verbs and adverbs were taken into account for this investigation. In Aristophanes’ word list there are 653 nouns, 365 adjectives, 794 verbs and 129 adverbs, for a total of 1941 surveyed words. In Moulton and Milligan’s Lexicon there are 1760 nouns, 612 adjectives, 1686 verbs and 224 adverbs, for a total of 4282 surveyed words.
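
The word-class counts above can be checked against the stated totals, and the share of each class in each sample compared, with a short script. This is only an illustrative sketch: the counts are those given in the text, while the dictionary layout, sample labels and function name are hypothetical choices made for the example.

```python
# Word-class counts as reported in the text.
counts = {
    "Classical": {
        "nouns": 653, "adjectives": 365, "verbs": 794, "adverbs": 129,
    },
    "Post-classical": {
        "nouns": 1760, "adjectives": 612, "verbs": 1686, "adverbs": 224,
    },
}

# Totals of surveyed words as reported in the text.
reported_totals = {"Classical": 1941, "Post-classical": 4282}


def class_shares(class_counts):
    """Return each word class as a proportion of the sample total."""
    total = sum(class_counts.values())
    return {name: n / total for name, n in class_counts.items()}


shares = {sample: class_shares(c) for sample, c in counts.items()}

for sample, class_counts in counts.items():
    total = sum(class_counts.values())
    # Confirm that the per-class counts add up to the reported total.
    assert total == reported_totals[sample]
    summary = ", ".join(
        f"{name} {share:.1%}" for name, share in shares[sample].items()
    )
    print(f"{sample} (n = {total}): {summary}")
```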

(3) Dataset description

Object name

Word lengths in Classical and Post-classical Greek.

Format names and versions

Comma Separated Values (CSV).

Creation date


Dataset Creators

Mathilde Bru, PhD student, UCL.


Language

English, Ancient Greek.


License

CC0 1.0.

Repository name


Publication date


(4) Reuse potential

This dataset was created to study lexical change in Ancient Greek as part of a Historical Linguistics thesis. However, it is also highly re-usable by linguists interested in studying diachronic change in word lengths in a corpus language. Studies which have investigated variation in word lengths include Nettle (; ), Wichmann et al. () and Fenk-Oczlon & Pilz (). These papers have demonstrated that there is a negative correlation between phoneme inventory size and word length, something which can now be shown to be true for Classical and Post-classical Greek: the Greek of the Post-classical period had fewer phonemes than that of the Classical period, and, as the data show, the lexemes of the Post-classical period were longer than those of the Classical period. Previous studies have so far all focused on synchronic comparison between multiple languages. For example, Nettle () compares ten modern languages and repeats his findings in a 1998 paper comparing twelve West African languages; Wichmann et al. () show, using data from over 3000 languages collected in the Automated Similarity Judgment Program (ASJP), that average word length and phoneme inventory size are negatively correlated; and Fenk-Oczlon & Pilz () analyse parallel text material from 61 languages and also find a negative correlation between phoneme inventory size and mean length of words, measured as number of syllables. This dataset is the first to collect relevant data on a single language diachronically, rather than synchronically across multiple languages, and as such would be useful for linguists looking for evidence that the negative correlation between phoneme inventory size and word length also holds diachronically.

This dataset is also the first to show that the negative correlation between phoneme inventory size and word length holds true for ancient as well as modern languages. In addition to its re-use potential for linguists, it would be of use to classicists and historical linguists looking at the diachronic evolution of Greek and needing data on the average word lengths in the four main inflectional word classes of Greek in two different time periods. It would also be useful for specialists in Classical and Post-classical Greek language and literature, as it would facilitate studies on the evolution of the ancient language from the Classical to the Post-classical period.