In linguistics, the semantic field SEA has been studied from several angles, ranging from the semantics of verbs of navigation (in linguistic typology and in Ancient Greek) to metaphors connected to the sea (in Latin). In Greek and Roman culture, the sea held a prominent position militarily, economically, and culturally.

This dataset contains linguistic information about more than 25 nouns, verbs, and adjectives belonging to the semantic field SEA in four Ancient Greek and Latin texts dating from the 5th to the 1st century BCE (Lat. De Bello Gallico by Caesar and Aeneid 1–6 by Vergil; AGr. Histories 1–2 by Herodotus and Argonautica by Apollonius Rhodius).

The dataset has been created to support research on how the concept of SEA is lexicalized in Ancient Greek and Latin poetry and prose, with a case study on four authors.

(2) Method

In this section, I summarize the steps that I followed to obtain the dataset presented here.


  1. Text retrieval: after choosing the texts (see Section 1 and below), I downloaded them in .txt format from Perseus 5.0, also known as the Scaife Viewer, of the Perseus Digital Library.
  2. Text annotation: I then uploaded the texts to the annotation platform INCEpTION, developed by the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt. I created my own annotation tagsets and layers based on the linguistic parameters relevant to my work, i.e. morphology, lemma, passage, semantics, meaning (literal, metaphorical, metonymic), and relations with proper nouns (see Section 3 for a more detailed description). When the annotation was complete, I exported the data in the UIMA CAS XMI (XML 1.0) format.
  3. Data extraction and dataset creation: I extracted the annotated data with a Python script designed specifically for the UIMA framework. I built a dictionary keyed by token ID and mapped the annotation layers onto it. I then exported the resulting dataset in CSV format.
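Step 3 can be pictured with a minimal sketch. It is not the author's script: it assumes that the annotations have already been read from the XMI export (for example with a UIMA-aware Python library such as dkpro-cassis, not shown here) into one dictionary per layer, keyed by token ID. The per-layer dictionary layout and the function names `merge_layers` and `write_csv` are hypothetical; the sample values are two tokens from Table 1.

```python
import csv
import io

# Hypothetical per-layer dictionaries, each keyed by token ID.
# In a real pipeline these would be filled from the UIMA CAS XMI export.
layers = {
    "TOKEN": {25716: "πόντῳ", 39277: "ἀναπλώοντι"},
    "LEMMA": {25716: "πόντος", 39277: "ἀναπλέω"},
    "POS": {25716: "NOUN", 39277: "VERB"},
    "MEANING": {25716: "Literal", 39277: "Literal"},
}


def merge_layers(layers):
    """Map every annotation layer onto a single record per token ID."""
    records = {}
    for layer_name, values in layers.items():
        for token_id, value in values.items():
            records.setdefault(token_id, {"ID": token_id})[layer_name] = value
    # Sort by token ID so that rows follow the order of the text.
    return [records[token_id] for token_id in sorted(records)]


def write_csv(rows, columns):
    """Serialize the merged records as CSV; missing layers become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


columns = ["TOKEN", "LEMMA", "POS", "MEANING", "ID"]
print(write_csv(merge_layers(layers), columns))
```

Keying everything by token ID means that a layer left empty for some token (for instance REFERS TO for most nouns) simply produces an empty cell, instead of shifting the columns.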

Sampling strategy

For this dataset, I focused on two literary genres, i.e. historiography (Lat. De Bello Gallico by Caesar; Gr. Histories 1–2 by Herodotus) and epic poetry (Lat. Aeneid 1–6 by Vergil; Gr. Argonautica by Apollonius Rhodius). Since I also wanted to investigate the distribution of SEA words across Ancient Greek and Latin, I selected portions of these texts according to their token counts. To keep the Latin and Greek sub-corpora balanced, some texts (Herodotus’s Histories and Vergil’s Aeneid) were not annotated in full. Overall, the corpus has 174,501 tokens. The Greek sub-corpus, with 92,592 tokens (53,750 for prose and 38,842 for poetry), makes up 53% of the whole corpus. The Latin sub-corpus has 81,909 tokens (51,313 for prose and 30,596 for poetry).
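As a quick consistency check, the per-text figures above can be summed by language and by genre. The short snippet below recomputes the totals and shares from the four token counts given in this section; the `subtotal` helper is only illustrative.

```python
# Token counts per text, as reported in this section.
counts = {
    ("Greek", "prose"): 53_750,   # Herodotus, Histories 1–2
    ("Greek", "poetry"): 38_842,  # Apollonius Rhodius, Argonautica
    ("Latin", "prose"): 51_313,   # Caesar, De Bello Gallico
    ("Latin", "poetry"): 30_596,  # Vergil, Aeneid 1–6
}


def subtotal(counts, position):
    """Sum token counts by language (position 0) or genre (position 1)."""
    totals = {}
    for key, n in counts.items():
        totals[key[position]] = totals.get(key[position], 0) + n
    return totals


total = sum(counts.values())
print(total)  # 174501
for group in (subtotal(counts, 0), subtotal(counts, 1)):
    for name, n in group.items():
        print(f"{name}: {n} tokens ({n / total:.1%})")
```

Running it prints the grand total followed by Greek 92,592 (53.1%), Latin 81,909 (46.9%), prose 105,063 (60.2%) and poetry 69,438 (39.8%), which matches the 53% Greek share stated above.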

(3) Dataset Description

The nouns, verbs, and adjectives included in this dataset are:

  • NOUNS: AGr. thálassa, póntos, pélagos, háls, Lat. mare, pontus, pelagus, aequor ‘sea’; AGr. húdōr, Lat. aqua, lympha ‘water’; AGr. háls, Lat. sal ‘sea’, ‘salt’; AGr. kûma, Lat. unda, fluctus ‘wave’; Lat. litus, ripa ‘shore’;
  • VERBS: AGr. pléō (and its preverbed forms occurring in the analyzed texts), Lat. navigo ‘sail’;
  • ADJECTIVES: AGr. thalássios, póntios, Lat. marinus, maritimus ‘maritime, marine’.

In the CSV file, annotations are organized in ten columns, with one row for each SEA token in the texts considered. The columns provide: (1) the token (TOKEN); (2) its morphological analysis (MORPHOLOGICAL FEATURES); (3) its lemma (LEMMA); (4) its part of speech (POS); (5) the sentence in which the token occurs (PASSAGE); (6) the type of meaning the token has (literal, metaphorical, or metonymic), following cognitive linguistics and the new WordNets for ancient Indo-European languages (MEANING); (7) its meaning in context, expressed as a WordNet synset preceded by its unique identifier (SYNSET); (8) the token ID (ID); (9) any words (proper nouns or adjectives) in Ancient Greek or Latin to which a noun meaning ‘sea’ is linked (REFERS TO); (10) the meaning of the phrase formed by (1) and (9), expressed as WordNet synsets preceded by their unique identifiers (DENOTES). An excerpt of the dataset is given in Table 1.
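For reuse, each row can be turned into a structured record. The sketch below is written under several assumptions that are not stated in the dataset description: the CSV header uses exactly the column names given in parentheses above; MORPHOLOGICAL FEATURES holds `Feature=Value` pairs separated by `|`, as in Table 1; REFERS TO and DENOTES hold Python list literals; and a SYNSET cell starts with its identifier, followed by a space and the gloss. The function names are hypothetical, and an in-memory CSV built from one Table 1 row stands in for the real file.

```python
import ast
import csv
import io

COLUMNS = ["TOKEN", "MORPHOLOGICAL FEATURES", "LEMMA", "POS", "PASSAGE",
           "MEANING", "SYNSET", "ID", "REFERS TO", "DENOTES"]


def parse_features(value):
    """Turn 'Case=Gen|Gender=Fem|Number=Sing' into {'Case': 'Gen', ...}."""
    if not value:
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_list(value):
    """Read a list-valued cell, assumed to hold a Python list literal."""
    return ast.literal_eval(value) if value else []


def split_synset(value):
    """Separate the synset identifier (e.g. 'n#06781925') from its gloss."""
    identifier, _, gloss = value.strip("‘’' ").partition(" ")
    return identifier, gloss


def read_rows(csv_file):
    """Yield one parsed record per SEA token."""
    for row in csv.DictReader(csv_file):
        yield {
            "token": row["TOKEN"],
            "lemma": row["LEMMA"],
            "pos": row["POS"],
            "features": parse_features(row["MORPHOLOGICAL FEATURES"]),
            "passage": row["PASSAGE"],
            "meaning": row["MEANING"],
            "synset": split_synset(row["SYNSET"]),
            "id": int(row["ID"]),
            "refers_to": parse_list(row["REFERS TO"]),
            "denotes": parse_list(row["DENOTES"]),
        }


# In-memory CSV with one row from Table 1, standing in for the real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerow([
    "ἁλὶ", "Case=Dat|Gender=Fem|Number=Sing", "ἅλς", "NOUN",
    "περὶ γὰρ βαθυλήιος ἄλλων νήσων, Αἰγαίῃ ὅσαι εἰν ἁλὶ ναιετάουσιν",
    "Metonymic",
    "n#06781925 a division of an ocean or a large body of salt water partially enclosed by land",
    "36044", "['Αἰγαίῃ']",
    "['n#06806923 an arm of the Mediterranean between Greece and Turkey; "
    "a main trade route for the ancient civilizations of Crete and Greece and Rome and Persia']",
])
buffer.seek(0)

rows = list(read_rows(buffer))
print(rows[0]["features"]["Case"], rows[0]["synset"][0], rows[0]["refers_to"])
```

Since passages routinely contain commas, relying on the `csv` module's quoting rather than splitting lines by hand is what keeps the PASSAGE column intact.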

Table 1

An excerpt of the dataset (13 rows of Apollonius Rhodius’s Argonautica).


TOKEN ‖ MORPHOLOGICAL FEATURES ‖ LEMMA ‖ POS ‖ PASSAGE ‖ MEANING ‖ SYNSET ‖ ID ‖ REFERS TO ‖ DENOTES

ἁλὸς ‖ Case=Gen|Gender=Fem|Number=Sing ‖ ἅλς ‖ NOUN ‖ ἔνθ᾽ ἄρα τοίγε ἑσπέριοι ἀνέμοιο παλιμπνοίῃσιν ἔκελσαν, καί μιν κυδαίνοντες ὑπὸ κνέφας ἔντομα μήλων κεῖαν, ὀρινομένης ἁλὸς οἴδματι ‖ Literal ‖ n#06781694 a large body of water constituting a principal part of the hydrosphere ‖ 25434 ‖ – ‖ –

πόντῳ ‖ Case=Dat|Gender=Masc|Number=Sing ‖ πόντος ‖ NOUN ‖ ἠῶθεν δ᾽ Ὁμόλην αὐτοσχεδὸν εἰσορόωντες πόντῳ κεκλιμένην παρεμέτρεον ‖ Literal ‖ n#06781925 a division of an ocean or a large body of salt water partially enclosed by land ‖ 25716 ‖ – ‖ –

ἁλὸς ‖ Case=Gen|Gender=Fem|Number=Sing ‖ ἅλς ‖ NOUN ‖ λάρνακι δ᾽ ἐν κοίλῃ μιν ὕπερθ᾽ ἁλὸς ἧκε φέρεσθαι, αἴ κε φύγῃ ‖ Literal ‖ n#06781694 a large body of water constituting a principal part of the hydrosphere ‖ 26911 ‖ – ‖ –

πόντον ‖ Case=Acc|Gender=Masc|Number=Sing ‖ πόντος ‖ NOUN ‖ ἀλλὰ γὰρ ἔμπης ἦ θαμὰ δὴ πάπταινον ἐπὶ πλατὺν ὄμμασι πόντον δείματι λευγαλέῳ, ὁπότε Θρήικες ἴασιν ‖ Metonymic ‖ n#06783379 the part of the sea that can be seen from the shore ‖ 27311 ‖ – ‖ –

ἁλὶ ‖ Case=Dat|Gender=Fem|Number=Sing ‖ ἅλς ‖ NOUN ‖ περὶ γὰρ βαθυλήιος ἄλλων νήσων, Αἰγαίῃ ὅσαι εἰν ἁλὶ ναιετάουσιν ‖ Metonymic ‖ n#06781925 a division of an ocean or a large body of salt water partially enclosed by land ‖ 36044 ‖ ['Αἰγαίῃ'] ‖ ['n#06806923 an arm of the Mediterranean between Greece and Turkey; a main trade route for the ancient civilizations of Crete and Greece and Rome and Persia']

ἀναπλώοντι ‖ Case=Dat|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act ‖ ἀναπλέω ‖ VERB ‖ εἰ δ᾽ οὔ μοι πέπρωται ἐς Ἑλλάδα γαῖαν ἱκέσθαι τηλοῦ ἀναπλώοντι, σὺ δ᾽ ἄρσενα παῖδα τέκηαι ‖ Literal ‖ v#01260993 travel by boat ‖ 39277 ‖ – ‖ –

ὕδωρ ‖ Case=Acc|Gender=Neut|Number=Sing ‖ ὕδωρ ‖ NOUN ‖ ἔνθ᾽ ἄρα τοίγε κόπτον ὕδωρ δολιχῇσιν ἐπικρατέως ἐλάτῃσιν ‖ Literal ‖ n#10771040 water containing salts ‖ 39681 ‖ – ‖ –

ἅλα ‖ Case=Acc|Gender=Fem|Number=Sing ‖ ἅλς ‖ NOUN ‖ ὄφρα δαέντες ἀρρήτους ἀγανῇσι τελεσφορίῃσι θέμιστας σωότεροι κρυόεσσαν ὑπεὶρ ἅλα ναυτίλλοιντο ‖ Literal ‖ n#06781694 a large body of water constituting a principal part of the hydrosphere ‖ 39864 ‖ – ‖ –

πόντου ‖ Case=Gen|Gender=Masc|Number=Sing ‖ πόντος ‖ NOUN ‖ κεῖθεν δ᾽ εἰρεσίῃ Μέλανος διὰ βένθεα πόντου ἱέμενοι τῇ μὲν Θρῃκῶν χθόνα, τῇ δὲ περαίην Ἴμβρον ἔχον καθύπερθε ‖ Literal ‖ n#06781925 a division of an ocean or a large body of salt water partially enclosed by land ‖ 40063 ‖ ['Μέλανος'] ‖ ['n#06810637 a sea between Europe and Asia; a popular resort area of eastern Europeans']

πέλαγος ‖ Case=Acc|Gender=Neut|Number=Sing ‖ πέλαγος ‖ NOUN ‖ πέλαγος δὲ τὸ μὲν καθύπερθε λέλειπτο ἦρι ‖ Literal ‖ n#06783080 an especially deep part of a sea or ocean ‖ 40294 ‖ – ‖ –

ἅλα ‖ Case=Acc|Gender=Fem|Number=Sing ‖ ἅλς ‖ NOUN ‖ ἔστι δέ τις αἰπεῖα Προποντίδος ἔνδοθι νῆσος τυτθὸν ἀπὸ Φρυγίης πολυληίου ἠπείροιο εἰς ἅλα κεκλιμένη ‖ Literal ‖ n#06781694 a large body of water constituting a principal part of the hydrosphere ‖ 40710 ‖ – ‖ –

ὕδατος ‖ Case=Gen|Gender=Neut|Number=Sing ‖ ὕδωρ ‖ NOUN ‖ ἐν δέ οἱ ἀκταὶ ἀμφίδυμοι, κεῖνται δ᾽ ὑπὲρ ὕδατος Αἰσήποιο ‖ Metonymic ‖ n#06789983 a large natural stream of water (larger than a creek) ‖ 40823 ‖ – ‖ –

ἁλός ‖ Case=Gen|Gender=Fem|Number=Sing ‖ ἅλς ‖ NOUN ‖ ἠοῖ δ᾽ εἰσανέβαν μέγα Δίνδυμον, ὄφρα καὶ αὐτοὶ θηήσαιντο πόρους κείνης ἁλός ‖ Literal ‖ n#06781925 a division of an ocean or a large body of salt water partially enclosed by land ‖ 42805 ‖ – ‖ –

Object name

25+ SEA words morpho-semantically annotated in Ancient Greek and Latin.

Format names and versions


Creation dates

From 2023-07-07 to 2023-08-10

Dataset creators

Andrea Farina (Department of Digital Humanities, King’s College London): conceptualization, data curation, methodology, formal analysis, data retrieval.

Language

Ancient Greek, Latin, English



Repository name


Publication date


(4) Reuse Potential

Because this dataset describes the semantics of words belonging to the semantic field SEA in Ancient Greek and Latin, its primary reuse potential lies in linguistics. First, the dataset supports both onomasiological analyses (from a concept to the words that express it) and semasiological analyses (from a word to the meanings it takes). It can also be expanded with other works, authors, and literary genres to give a broader overview of SEA words in Ancient Greek and Latin. Similar datasets could be built for other semantic fields and/or languages, enabling cross-linguistic comparisons, either synchronic or diachronic. Moreover, this dataset could serve as training data for a model that performs automatic semantic annotation on the basis of co-occurring words, which can be extracted from the passage in which each token occurs.
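The two perspectives can be made concrete with a small sketch. An onomasiological query starts from a concept (here a WordNet synset) and counts the lemmas that lexicalize it, while a semasiological query starts from a lemma and counts the synsets and meaning types it takes. The sketch assumes that rows have been reduced to (LEMMA, SYNSET identifier, MEANING) triples; the sample triples are the twelve nominal rows of Table 1, and the function names are illustrative.

```python
from collections import Counter, defaultdict

# (LEMMA, SYNSET identifier, MEANING) for the nominal rows of Table 1, in order.
rows = [
    ("ἅλς", "n#06781694", "Literal"),
    ("πόντος", "n#06781925", "Literal"),
    ("ἅλς", "n#06781694", "Literal"),
    ("πόντος", "n#06783379", "Metonymic"),
    ("ἅλς", "n#06781925", "Metonymic"),
    ("ὕδωρ", "n#10771040", "Literal"),
    ("ἅλς", "n#06781694", "Literal"),
    ("πόντος", "n#06781925", "Literal"),
    ("πέλαγος", "n#06783080", "Literal"),
    ("ἅλς", "n#06781694", "Literal"),
    ("ὕδωρ", "n#06789983", "Metonymic"),
    ("ἅλς", "n#06781925", "Literal"),
]


def onomasiological(rows):
    """Concept-to-word view: how often each lemma lexicalizes a given synset."""
    index = defaultdict(Counter)
    for lemma, synset, _meaning in rows:
        index[synset][lemma] += 1
    return index


def semasiological(rows):
    """Word-to-concept view: which synsets and meaning types each lemma takes."""
    index = defaultdict(Counter)
    for lemma, synset, meaning in rows:
        index[lemma][(synset, meaning)] += 1
    return index


print(dict(onomasiological(rows)["n#06781925"]))
print(dict(semasiological(rows)["ἅλς"]))
```

Even on this excerpt, the synset n#06781925 is expressed by both πόντος and ἅλς, and ἅλς occurs with two synsets and two meaning types. Run over the full CSV, the same two indexes give the onomasiological and semasiological profiles of the whole field.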

This dataset, or similar ones, may also be used in literary-geographical studies, for instance to evaluate how a specific place, such as a sea, is referred to in different texts and/or geographical areas (synchronically or diachronically), and whether the proper noun of a sea tends to occur alone or together with one or more common nouns. This may shed new light on geographical denominations in the ancient world. In this sense, the dataset could also be used to expand existing online resources such as Pelagios, or to add historical depth to the World Historical Gazetteer by grouping together places that were known by more than one name.
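For literary-geographical questions, one possible starting point is to group rows by the place they denote and count the combinations of common noun and proper noun or adjective that name it. The sketch assumes that rows have been reduced to (LEMMA, REFERS TO, DENOTES) with the list-valued cells already parsed and the DENOTES entries cut down to their synset identifiers; the two sample rows are the ones in Table 1 that have a REFERS TO value, and `names_by_place` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# (LEMMA, REFERS TO, DENOTES) for the two Table 1 rows whose REFERS TO cell is filled.
# DENOTES entries are reduced to their synset identifiers.
rows = [
    ("ἅλς", ["Αἰγαίῃ"], ["n#06806923"]),
    ("πόντος", ["Μέλανος"], ["n#06810637"]),
]


def names_by_place(rows):
    """Index each denoted place by the (common noun, modifiers) combinations naming it."""
    index = defaultdict(Counter)
    for lemma, modifiers, places in rows:
        for place in places:
            index[place][(lemma, tuple(modifiers))] += 1
    return index


for place, combinations in names_by_place(rows).items():
    for (lemma, modifiers), n in combinations.items():
        print(place, lemma, " ".join(modifiers), n)
```

REFERS TO stores inflected forms (e.g. Αἰγαίῃ), so over a larger corpus those forms would need to be lemmatized before the counts for a given sea can be meaningfully compared across texts.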

Finally, more broadly, cross-linguistic analyses conducted in a cognitive framework also allow for psycho-anthropological studies that can address questions such as: How many words did the Greeks and the Romans possess to express one or more concepts related to SEA? How and why does the number of SEA words vary in Greek and Roman texts? How can we account for similarities and differences in this sense? Does this reveal anything about these populations from the cultural point of view?