(1) Overview
Repository location
Context
In linguistics, the semantic field SEA has been studied in different ways, from the semantics of verbs of navigation (e.g. ; ; in linguistic typology; on Ancient Greek) to metaphors connected to the sea (e.g. on Latin). In Greek and Roman culture, the sea holds a prominent position, militarily (; ), economically (; ; ), and culturally (; ; ; ).
This dataset contains linguistic information about more than 25 nouns, verbs, and adjectives connected to the semantic field SEA in four Ancient Greek and Latin texts dating from the 5th to the 1st century BCE (Lat. De bello Gallico by Caesar, Aeneid 1–6 by Vergil; AGr. Histories 1–2 by Herodotus, Argonautica by Apollonius Rhodius).
The dataset has been created to support research on how the concept of SEA is lexicalized in Ancient Greek and Latin poetry and prose, with a case study on four authors.
(2) Method
In this section, I summarize the steps that I followed to obtain the dataset presented here.
Steps
- Text retrieval: after selecting the texts (see Section 1 and the sampling strategy below), I downloaded them in .txt format from Perseus 5.0 (also called the Scaife Viewer) of the Perseus Digital Library (; ).
- Text annotation: I then uploaded the texts to the annotation platform INCEpTION (, then ; ; ; ; ), developed by the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt. I created annotation layers and tagsets for the linguistic parameters relevant to my work, i.e. morphology, lemma, passage, semantics, meaning (literal, metaphorical, metonymic), and relations with proper nouns (see Section 3 for a more detailed description). Once the annotation was complete, I exported the data in the UIMA CAS XMI (XML 1.0) format.
- Data extraction and dataset creation: I extracted the annotated data with a Python script specifically designed for the UIMA framework. I built a dictionary keyed by token ID, to which I mapped the values of each annotation layer, and then exported the resulting dataset in CSV format.
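The extraction step above can be sketched with Python's standard library alone. This is not the author's script: it assumes an INCEpTION-style XMI export in which tokens are `Token` elements and each SEA annotation covers a single token span on a custom layer. The layer name `SeaWord`, its `lemma`/`meaning` features, and the sample values are hypothetical and illustrative only. Tags are matched by local name, so namespace prefixes in a real export do not matter; a dedicated UIMA library (for example dkpro-cassis) would handle type systems more robustly.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Qualified name of the standard XMI 2.0 "xmi:id" attribute.
XMI_ID = "{http://www.omg.org/XMI}id"
# Attributes that locate an annotation rather than describe it.
STRUCTURAL = {XMI_ID, "sofa", "begin", "end"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def extract(xmi_text, layer_names):
    """Return {token_id: row} for tokens carrying an annotation of interest.

    Assumes each relevant annotation covers exactly one token span.
    """
    root = ET.fromstring(xmi_text)
    sofa = next(el for el in root if local_name(el.tag) == "Sofa")
    text = sofa.get("sofaString")

    rows, span_to_id = {}, {}
    for el in root:
        if local_name(el.tag) == "Token":
            begin, end = int(el.get("begin")), int(el.get("end"))
            token_id = int(el.get(XMI_ID))
            rows[token_id] = {"ID": token_id, "TOKEN": text[begin:end]}
            span_to_id[(begin, end)] = token_id

    annotated = set()
    for el in root:
        if local_name(el.tag) not in layer_names:
            continue
        token_id = span_to_id.get((int(el.get("begin")), int(el.get("end"))))
        if token_id is None:
            continue  # span does not match a single token; skipped in this sketch
        features = {k: v for k, v in el.attrib.items() if k not in STRUCTURAL}
        rows[token_id].update(features)
        annotated.add(token_id)

    # Keep only annotated (SEA) tokens, ordered by token ID.
    return {tid: rows[tid] for tid in sorted(annotated)}


def to_csv(rows):
    """Serialize the token-keyed dictionary as CSV text."""
    fields = sorted({key for row in rows.values() for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows.values())
    return buffer.getvalue()


# Hypothetical one-sentence export with a custom "SeaWord" layer (illustrative values).
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
    xmlns:type5="http:///de/tudarmstadt/ukp/dkpro/core/api/segmentation/type.ecore"
    xmlns:custom="http:///webanno/custom.ecore" xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="spumas salis aere ruebant"/>
  <type5:Token xmi:id="10" sofa="1" begin="0" end="6"/>
  <type5:Token xmi:id="11" sofa="1" begin="7" end="12"/>
  <custom:SeaWord xmi:id="20" sofa="1" begin="7" end="12" lemma="sal" meaning="Literal"/>
</xmi:XMI>"""

if __name__ == "__main__":
    print(to_csv(extract(SAMPLE, {"SeaWord"})))
```

UIMA offsets count UTF-16 code units; they coincide with Python string indices here because precomposed polytonic Greek and Latin characters lie in the Basic Multilingual Plane.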
Sampling strategy
For this dataset, I focused on two literary genres, i.e. historiography (Lat. De bello Gallico by Caesar; Gr. Histories 1–2 by Herodotus) and epic poetry (Lat. Aeneid 1–6 by Vergil; Gr. Argonautica by Apollonius Rhodius). Since I also wanted to compare the distribution of SEA words in Ancient Greek and Latin, I selected portions of these texts so that the sub-corpora would be of comparable size in tokens. For this reason, two texts (Herodotus’s Histories and Vergil’s Aeneid) were not annotated in full. Overall, the corpus contains 174,501 tokens. The Greek sub-corpus accounts for 53% of the whole corpus, with 92,592 tokens (53,750 for prose and 38,842 for poetry); the Latin sub-corpus accounts for the remaining 47%, with 81,909 tokens (51,313 for prose and 30,596 for poetry).
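The token counts given above are internally consistent; a quick check of the sums and of the language shares:

```python
# Token counts reported for each sub-corpus.
COUNTS = {
    ("Greek", "prose"): 53_750,
    ("Greek", "poetry"): 38_842,
    ("Latin", "prose"): 51_313,
    ("Latin", "poetry"): 30_596,
}


def subtotal(language):
    """Sum the prose and poetry counts for one language."""
    return sum(n for (lang, _genre), n in COUNTS.items() if lang == language)


total = sum(COUNTS.values())
greek, latin = subtotal("Greek"), subtotal("Latin")

print(total, greek, latin)                                      # 174501 92592 81909
print(f"Greek {greek / total:.1%}, Latin {latin / total:.1%}")  # Greek 53.1%, Latin 46.9%
```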
(3) Dataset Description
The nouns, verbs, and adjectives included in this dataset are:
- NOUNS: AGr. thálassa, póntos, pélagos, háls, Lat. mare, pontus, pelagus, aequor ‘sea’; AGr. húdōr, Lat. aqua, lympha ‘water’; AGr. háls, Lat. sal ‘sea’, ‘salt’; AGr. kûma, Lat. unda, fluctus ‘wave’; Lat. litus, ripa ‘shore’;
- VERBS: AGr. pléō (and its preverbed forms occurring in the analyzed texts), Lat. navigo ‘sail’;
- ADJECTIVES: AGr. thalássios, póntios, Lat. marinus, maritimus ‘maritime, marine’.
In the CSV file, annotations are represented in ten columns, with one row for each SEA token in the texts considered. The columns provide: (1) the token (TOKEN); (2) its morphological analysis (MORPHOLOGICAL FEATURES); (3) its lemma (LEMMA); (4) its part of speech (POS); (5) the sentence in which the token occurs (PASSAGE); (6) the type of meaning of the token (literal, metaphorical, or metonymic), according to cognitive linguistics and the new WordNets for ancient Indo-European languages () (MEANING); (7) its meaning in context, expressed as a WordNet synset preceded by its unique identifier (SYNSET); (8) the token ID (ID); (9) any Ancient Greek or Latin words (proper nouns or adjectives) to which a noun meaning ‘sea’ refers (REFERS TO); (10) the meaning of the phrase formed by (1) and (9), expressed as WordNet synsets preceded by their unique identifiers (DENOTES). An excerpt of the dataset is given in Table 1.
TOKEN | MORPHOLOGICAL FEATURES | LEMMA | POS | PASSAGE | MEANING | SYNSET | ID | REFERS TO | DENOTES |
---|---|---|---|---|---|---|---|---|---|
ἁλὸς | Case=Gen\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ἔνθ᾽ ἄρα τοίγε ἑσπέριοι ἀνέμοιο παλιμπνοίῃσιν ἔκελσαν, καί μιν κυδαίνοντες ὑπὸ κνέφας ἔντομα μήλων κεῖαν, ὀρινομένης ἁλὸς οἴδματι | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 25434 | | |
πόντῳ | Case=Dat\|Gender=Masc\|Number=Sing | πόντος | NOUN | ἠῶθεν δ᾽ Ὁμόλην αὐτοσχεδὸν εἰσορόωντες πόντῳ κεκλιμένην παρεμέτρεον | Literal | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 25716 | | |
ἁλὸς | Case=Gen\|Gender=Fem\|Number=Sing | ἅλς | NOUN | λάρνακι δ᾽ ἐν κοίλῃ μιν ὕπερθ᾽ ἁλὸς ἧκε φέρεσθαι, αἴ κε φύγῃ | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 26911 | | |
πόντον | Case=Acc\|Gender=Masc\|Number=Sing | πόντος | NOUN | ἀλλὰ γὰρ ἔμπης ἦ θαμὰ δὴ πάπταινον ἐπὶ πλατὺν ὄμμασι πόντον δείματι λευγαλέῳ, ὁπότε Θρήικες ἴασιν | Metonymic | ‘n#06783379 the part of the sea that can be seen from the shore’ | 27311 | | |
ἁλὶ | Case=Dat\|Gender=Fem\|Number=Sing | ἅλς | NOUN | περὶ γὰρ βαθυλήιος ἄλλων νήσων, Αἰγαίῃ ὅσαι εἰν ἁλὶ ναιετάουσιν | Metonymic | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 36044 | [‘Αἰγαίῃ’] | [‘n#06806923 an arm of the Mediterranean between Greece and Turkey; a main trade route for the ancient civilizations of Crete and Greece and Rome and Persia’] |
ἀναπλώοντι | Case=Dat\|Gender=Masc\|Number=Sing\|Tense=Pres\|VerbForm=Part\|Voice=Act | ἀναπλέω | VERB | εἰ δ᾽ οὔ μοι πέπρωται ἐς Ἑλλάδα γαῖαν ἱκέσθαι τηλοῦ ἀναπλώοντι, σὺ δ᾽ ἄρσενα παῖδα τέκηαι | Literal | ‘v#01260993 travel by boat’ | 39277 | | |
ὕδωρ | Case=Acc\|Gender=Neut\|Number=Sing | ὕδωρ | NOUN | ἔνθ᾽ ἄρα τοίγε κόπτον ὕδωρ δολιχῇσιν ἐπικρατέως ἐλάτῃσιν | Literal | ‘n#10771040 water containing salts’ | 39681 | | |
ἅλα | Case=Acc\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ὄφρα δαέντες ἀρρήτους ἀγανῇσι τελεσφορίῃσι θέμιστας σωότεροι κρυόεσσαν ὑπεὶρ ἅλα ναυτίλλοιντο | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 39864 | | |
πόντου | Case=Gen\|Gender=Masc\|Number=Sing | πόντος | NOUN | κεῖθεν δ᾽ εἰρεσίῃ Μέλανος διὰ βένθεα πόντου ἱέμενοι τῇ μὲν Θρῃκῶν χθόνα, τῇ δὲ περαίην Ἴμβρον ἔχον καθύπερθε | Literal | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 40063 | [‘Μέλανος’] | [‘n#06810637 a sea between Europe and Asia; a popular resort area of eastern Europeans’] |
πέλαγος | Case=Acc\|Gender=Neut\|Number=Sing | πέλαγος | NOUN | πέλαγος δὲ τὸ μὲν καθύπερθε λέλειπτο ἦρι | Literal | ‘n#06783080 an especially deep part of a sea or ocean’ | 40294 | | |
ἅλα | Case=Acc\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ἔστι δέ τις αἰπεῖα Προποντίδος ἔνδοθι νῆσος τυτθὸν ἀπὸ Φρυγίης πολυληίου ἠπείροιο εἰς ἅλα κεκλιμένη | Literal | ‘n#06781694 a large body of water constituting a principal part of the hydrosphere’ | 40710 | | |
ὕδατος | Case=Gen\|Gender=Neut\|Number=Sing | ὕδωρ | NOUN | ἐν δέ οἱ ἀκταὶ ἀμφίδυμοι, κεῖνται δ᾽ ὑπὲρ ὕδατος Αἰσήποιο | Metonymic | ‘n#06789983 a large natural stream of water (larger than a creek)’ | 40823 | | |
ἁλός | Case=Gen\|Gender=Fem\|Number=Sing | ἅλς | NOUN | ἠοῖ δ᾽ εἰσανέβαν μέγα Δίνδυμον, ὄφρα καὶ αὐτοὶ θηήσαιντο πόρους κείνης ἁλός | Literal | ‘n#06781925 a division of an ocean or a large body of salt water partially enclosed by land’ | 42805 | | |
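The columns described above can be loaded into structured Python values with a short sketch. It assumes (not confirmed by the text) comma-separated values with the ten column headers listed in this section, morphological features in the `Key=Value|Key=Value` form shown in Table 1, and list-valued REFERS TO and DENOTES cells written as Python list literals with straight quotes (the curly quotes in Table 1 may be typographic). Check these assumptions against the published file; the sample reproduces two rows of Table 1.

```python
import ast
import csv
import io


def parse_feats(feats):
    """Turn 'Case=Dat|Gender=Fem|Number=Sing' into a dict."""
    if not feats:
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_list(cell):
    """Parse a list-valued cell such as "['Αἰγαίῃ']"; an empty cell gives []."""
    return ast.literal_eval(cell) if cell.strip() else []


def load(csv_text):
    """Read the CSV text and convert the structured columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["MORPHOLOGICAL FEATURES"] = parse_feats(row["MORPHOLOGICAL FEATURES"])
        row["REFERS TO"] = parse_list(row["REFERS TO"])
        row["DENOTES"] = parse_list(row["DENOTES"])
        row["ID"] = int(row["ID"])
        rows.append(row)
    return rows


# Two rows from Table 1, in the assumed file layout.
SAMPLE = (
    "TOKEN,MORPHOLOGICAL FEATURES,LEMMA,POS,PASSAGE,MEANING,SYNSET,ID,REFERS TO,DENOTES\n"
    "ἁλὶ,Case=Dat|Gender=Fem|Number=Sing,ἅλς,NOUN,"
    '"περὶ γὰρ βαθυλήιος ἄλλων νήσων, Αἰγαίῃ ὅσαι εἰν ἁλὶ ναιετάουσιν",Metonymic,'
    "n#06781925 a division of an ocean or a large body of salt water partially enclosed by land,"
    "36044,['Αἰγαίῃ'],['n#06806923 an arm of the Mediterranean between Greece and Turkey; "
    "a main trade route for the ancient civilizations of Crete and Greece and Rome and Persia']\n"
    "πόντῳ,Case=Dat|Gender=Masc|Number=Sing,πόντος,NOUN,"
    "ἠῶθεν δ᾽ Ὁμόλην αὐτοσχεδὸν εἰσορόωντες πόντῳ κεκλιμένην παρεμέτρεον,Literal,"
    "n#06781925 a division of an ocean or a large body of salt water partially enclosed by land,"
    "25716,,\n"
)

rows = load(SAMPLE)
# Tokens linked to a proper noun or adjective in REFERS TO.
linked = [row["TOKEN"] for row in rows if row["REFERS TO"]]
print(rows[0]["MORPHOLOGICAL FEATURES"])  # {'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing'}
print(linked)                             # ['ἁλὶ']
```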
Object name
25+ SEA words morpho-semantically annotated in Ancient Greek and Latin.
Format names and versions
CSV
Creation dates
From 2023-07-07 to 2023-08-10
Dataset creators
Andrea Farina (Department of Digital Humanities, King’s College London): conceptualization, data curation, methodology, formal analysis, data retrieval.
Language
Ancient Greek, Latin, English
License
CC0
Repository name
Figshare
Publication date
2023-08-18
(4) Reuse Potential
Given that this dataset describes the semantics of different words pertaining to the semantic field of SEA in Ancient Greek and Latin, its primary reuse potential lies in linguistics. First, the dataset can support both onomasiological and semasiological analyses. Second, it can be expanded to include other works, authors, and literary genres, providing a broader overview of SEA words in Ancient Greek and Latin. Similar datasets may also be created for other semantic fields and/or languages, to allow for cross-linguistic comparisons, either synchronic or diachronic. Moreover, this dataset could serve as the basis for training a model for automatic semantic annotation based on co-occurring words, which can be extracted from the passage in which a token occurs.
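As an illustration of the last point, the sketch below extracts the words co-occurring with a SEA token in its PASSAGE and feeds them to a multinomial naive Bayes classifier that predicts the MEANING column. Naive Bayes, the three-word window, and the punctuation handling are choices made for this example only, not part of the dataset; the two training rows come from Table 1 and are far too few for real use.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ",.;·"


def context_words(passage, token, window=3):
    """Words within `window` positions of the first occurrence of `token`.

    Raises ValueError if `token` does not appear verbatim in the passage.
    """
    words = [w.strip(PUNCTUATION) for w in passage.split()]
    i = words.index(token)
    start = max(0, i - window)
    return [
        w.lower()
        for j, w in enumerate(words[start:i + window + 1], start=start)
        if j != i
    ]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_lists, labels):
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(feature_lists, labels):
            self.counts[label].update(features)
            self.vocab.update(features)
        return self

    def log_score(self, features, label):
        n_examples = sum(self.priors.values())
        total = sum(self.counts[label].values())
        score = math.log(self.priors[label] / n_examples)
        for f in features:
            score += math.log((self.counts[label][f] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, features):
        return max(self.priors, key=lambda label: self.log_score(features, label))


# (token, PASSAGE, MEANING) for two rows of Table 1.
TRAIN = [
    ("πόντον", "ἀλλὰ γὰρ ἔμπης ἦ θαμὰ δὴ πάπταινον ἐπὶ πλατὺν ὄμμασι πόντον "
               "δείματι λευγαλέῳ, ὁπότε Θρήικες ἴασιν", "Metonymic"),
    ("πόντῳ", "ἠῶθεν δ᾽ Ὁμόλην αὐτοσχεδὸν εἰσορόωντες πόντῳ κεκλιμένην παρεμέτρεον",
     "Literal"),
]

X = [context_words(passage, token) for token, passage, _ in TRAIN]
y = [meaning for _, _, meaning in TRAIN]
model = NaiveBayes().fit(X, y)
print(X[0])                 # ['ἐπὶ', 'πλατὺν', 'ὄμμασι', 'δείματι', 'λευγαλέῳ', 'ὁπότε']
print(model.predict(X[0]))  # Metonymic
```

Working on lemmatized context words rather than surface forms would reduce sparsity for inflected languages such as Greek and Latin; this sketch uses surface forms for simplicity.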
This dataset, or similar ones, may also be employed in literary-geographical studies: for instance, to evaluate how a specific place, such as a sea, is referred to in different texts and/or geographical areas (synchronically or diachronically), and whether the proper noun of a sea tends to occur alone or alongside one or more common nouns. This may cast new light on geographical denominations in the ancient world. In this sense, the dataset may also be used to expand existing online resources, such as Pelagios (; ; ; ; ), or to add historical depth to the World Historical Gazetteer (; ; ) by grouping together places that were known by more than one name.
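For the literary-geographical use described above, one simple starting point is to group rows by the place synset in DENOTES and count which SEA lemmas are used for each named sea. The rows below are taken from Table 1, reduced to the columns used and with DENOTES glosses shortened to their synset identifiers; the in-memory row format is an assumption for this sketch.

```python
from collections import Counter, defaultdict

# Rows from Table 1, reduced to the columns used here.
ROWS = [
    {"LEMMA": "ἅλς", "REFERS TO": ["Αἰγαίῃ"], "DENOTES": ["n#06806923"]},
    {"LEMMA": "πόντος", "REFERS TO": ["Μέλανος"], "DENOTES": ["n#06810637"]},
    {"LEMMA": "πόντος", "REFERS TO": [], "DENOTES": []},
    {"LEMMA": "πέλαγος", "REFERS TO": [], "DENOTES": []},
]


def nouns_per_place(rows):
    """Map each place synset in DENOTES to a count of the SEA lemmas used for it."""
    table = defaultdict(Counter)
    for row in rows:
        for synset in row["DENOTES"]:
            synset_id = synset.split()[0]  # keep only the 'n#...' identifier
            table[synset_id][row["LEMMA"]] += 1
    return table


def linked_share(rows):
    """Share of rows whose SEA noun is linked to a proper noun or adjective."""
    return sum(1 for row in rows if row["REFERS TO"]) / len(rows)


places = nouns_per_place(ROWS)
print(dict(places["n#06806923"]))  # {'ἅλς': 1}
print(linked_share(ROWS))          # 0.5
```

On the full dataset, rows would first be filtered to POS == "NOUN", since REFERS TO applies only to nouns meaning ‘sea’.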
Finally, and more broadly, cross-linguistic analyses conducted within a cognitive framework also open the way to psycho-anthropological studies addressing questions such as: How many words did the Greeks and the Romans have to express one or more concepts related to SEA? How and why does the number of SEA words vary across Greek and Roman texts? How can we account for the similarities and differences observed? Do these reveal anything about these populations from a cultural point of view?