A Dataset of Self-Reported Attitudes to Afrikaans Swearwords

Gerhard B. van Huyssteen; Roald Eiselen; Jaco du Toit

Data Papers

Abstract

Until recently, no research had been done on user attitudes to Afrikaans taboo language. To address this shortcoming, a multidisciplinary research project was initiated to investigate, among other things, user attitudes to swearwords. Online single-word surveys (SWSs) for individual swearwords have been posted periodically on the project website.¹ Volunteer respondents are recruited through respondent-driven opportunistic sampling and snowball sampling via social media. Respondents first give their informed consent and then provide some sociodemographic information once only. Thereafter, each swearword is judged on at least seven attitudinal dimensions. All data are stored in a relational database and then extracted to create a single UTF-8 encoded CSV file. The dataset holds considerable potential for use in language-specific (i.e., Afrikaans) sociopragmatic and sociolinguistic investigations and applications, as well as for comparative linguistic research and general statistical modelling.

Year: 2023

Volume 9

Page/Article: 14

DOI: 10.5334/johd.127

Submitted on Aug 4, 2023

Accepted on Sep 15, 2023

Published on Oct 4, 2023

Peer Reviewed

CC Attribution 4.0

(1) Overview

Repository location

DaYta Ya Rona: https://doi.org/10.25388/nwu.23708229.

Context

Swearing, offensive language, and taboo language have been an active area of research for many years in a variety of scientific contexts, including computational linguistics, psychology, sociology, and various subdisciplines of linguistics – see Stapleton et al. (2022) for a recent overview. While the majority of the scientific literature focuses on English, studies have also been undertaken for other languages, including Cantonese, Danish, Dutch, Finnish, French, Italian, Japanese, Latin, and Russian. In the South African context, and specifically for Afrikaans, relatively little research has been done in this area, apart from work on the lexicographic handling of swearwords (; ), language acquisition (), language change (), lexicology and onomastics (, , ; ; ), sociolinguistics (; ; ), and grammatical aspects of swearing (; ; ; , ). Until recently, no research had been done on user attitudes to Afrikaans taboo language.

To address this shortcoming, a multidisciplinary research project – What the Swearword! – was initiated to investigate various aspects of taboo language in Afrikaans and other languages in its ecosystem (). An important part of the project is the collection of empirical data related to, among other things, the prototypicality of swearwords (Van der Merwe, 2022; Van der Merwe et al., 2022), attitudes to parental control (Van Huyssteen et al., 2023a, 2023b), and user attitudes to swearwords (). The methodology and resulting dataset of the last of these are the focus of this article.

(2) Method

Steps

To collect data on self-reported attitudes to swearwords, short online surveys for individual words have been posted periodically on the project website and advertised via social media platforms. All respondents must first register, free of charge, as users on the project website. During registration, respondents first give their informed consent and must then provide, once only, the sociodemographic information that is translated and summarised in Table 1. These sociodemographic factors and their values have been informed by the above-mentioned previous studies, as well as other sociopragmatic studies of offensive words, where one or more of these factors have been statistically correlated with usage of and attitudes to such words (see , , ), and Beers Fägersten (); Beers Fägersten and Stapleton (); Beers Fägersten and Stapleton () especially). A summary of the sociodemographic responses of the survey participants is available in the data repository as part of the dataset.

Table 1

Summary of sociodemographic factors.


SOCIODEMOGRAPHIC FACTORS	DATA TYPE	OPTIONS

Age group	Ordinal (3)	18–39; 40–59; 60+

Sex	Nominal (2)	Male; Female

Population group	Nominal (4)	Black; Coloured; Indian; White⁴

Height (cm)	Ordinal (7)	>199; 190–199; 180–189; 170–179; 160–169; 150–159; <150

Mother’s primary language	Nominal (12)	List of South Africa’s eleven official languages, plus Dutch

Father’s primary language	Nominal (12)	List of South Africa’s eleven official languages, plus Dutch

Language used primarily with family	Nominal (13)	List of South Africa’s eleven official languages, plus Dutch, as well as a bilingual (Afrikaans and English) option

Language used primarily with friends	Nominal (13)	List of South Africa’s eleven official languages, plus Dutch, as well as a bilingual (Afrikaans and English) option

Language primarily used for work	Nominal (13)	List of South Africa’s eleven official languages, plus Dutch, as well as a bilingual (Afrikaans and English) option

Languages proficient in	Nominal (12)	List of South Africa’s eleven official languages, plus Dutch

Identification with a geolect	Nominal (2)	Yes; No. If “yes”, then the respondent gets a list of typical Afrikaans geolects to choose from, or to specify their own geolect.

Country of residence	Nominal (12)	South Africa, with a specification for one of the nine provinces; Namibia; Belgium; The Netherlands

Period in country of residence	Ordinal (4)	>6 years; 4–6 years; 1–3 years; <1 year

Country of childhood	Nominal (12)	South Africa, with a specification per one of the nine provinces; Namibia; Belgium; The Netherlands

Highest qualification	Nominal (10)	List of typical kinds of qualification in South Africa

Income group	Ordinal (7)	List of typical categories

Identification with a gender group	Nominal (2)	Yes; No. If “yes”, then the respondent can specify their own gender group.

Religiousness as a child/teenager	Nominal (5)	Very religious; Religious; Somewhat religious; Not really; Not at all

Religiousness currently	Nominal (5)	Very religious; Religious; Somewhat religious; Not really; Not at all

Political views	Nominal (5)	Very conservative; Conservative; Moderate; Liberal; Very liberal

World view (pertaining to moral and social issues)	Nominal (5)	Very conservative; Conservative; Moderate; Liberal; Very liberal

To gather responses on participants’ attitudes towards different words, an online single-word survey (SWS) template was designed. Each SWS presents only one swearword to respondents, in an attempt to prevent so-called “respondent fatigue” – a well-documented phenomenon that occurs when survey participants become tired of the survey task and the quality of the data they provide begins to deteriorate (Lavrakas, 2008). The assumption is that more words can be covered over a period of time in this way than if the same number of words were presented to participants in a single session ().

One very significant challenge of this SWS approach is that the responses for the different words are not being collected during a single session by the same respondents. For example, the SWS for word X could have been completed by 200 respondents of the more than 2,000 registered users, while the SWS for word Y a week later by only 120 respondents – with only some (if any) overlap between these two SWSs.
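
The extent of this overlap can be quantified directly from the respondent identifiers. The following is a minimal sketch only: the identifiers and the resulting overlap are invented for illustration (they merely mirror the sample sizes of 200 and 120 used in the example above), and the function name is hypothetical.

```python
def respondent_overlap(ids_x, ids_y):
    """Return (number of shared respondents, Jaccard index) for two ID sets."""
    shared = ids_x & ids_y
    union = ids_x | ids_y
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Invented respondent IDs: 200 respondents for word X, 120 for word Y.
word_x = set(range(1, 201))    # IDs 1..200
word_y = set(range(151, 271))  # IDs 151..270

shared, jaccard = respondent_overlap(word_x, word_y)
print(shared, round(jaccard, 3))
```

A low Jaccard index between two SWSs signals that scores for the two words come from largely different respondents, which should be kept in mind when comparing words.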

Table 2 provides a summary of the words in the dataset, along with the number of respondents who completed the survey for each word.

Table 2

Summary of swearwords and number of respondents for each word.


WORD	TOTAL RESPONSES	RESPONSES WITH COMPLETE METADATA	WORD	TOTAL RESPONSES	RESPONSES WITH COMPLETE METADATA

asshole	8	7	jirre	13	10

ballas	12	10	jissis	184	152

bebliksemd	189	155	kak	197	167

bedonderd	147	123	kerriekop	25	19

befok	13	9	kont	48	38

bekak	11	8	kots	163	135

blerrie	31	18	magtig	12	12

bliksem	19	16	ma-se-poes	15	11

bliksems	104	88	moer	20	16

boudservette	194	155	moerskont	12	9

demmit	12	11	moffie	222	180

donder	12	11	naai	23	20

doos	34	24	naaier	18	15

drol	14	11	piel	29	18

eiers	7	6	piele	208	167

etter	22	13	pis	9	9

feeks	184	147	poep	10	9

flerrie	77	67	poephol	39	30

flippen	133	114	poes	26	19

fok	21	16	rooikop	125	104

fokken	208	174	shit	21	16

fokker	55	44	skyt	130	109

fokkit	45	31	slet	18	14

fokkof	9	5	slymkonyn	169	138

fokkol	16	13	stront	9	8

foktog	18	15	swerkater	20	17

frieken	158	126	swernoot	178	147

fuck	16	12	teef	10	7

gat	5	3	tos	159	126

god	26	22	tril	5	4

gots	154	125	voëlverklikker	150	120

hel	16	15	wetter	6	6

helleveeg	130	108	wolgordyn	129	110

hoer	29	19	wortelkop	222	176

hol	10	9

Each word is judged on at least seven dimensions relating to a respondent’s attitude to the word; an eighth dimension only pertains to some words where the sex of the referent might be relevant (e.g., whether a word like soutie ‘English person’ can be used to refer to men and women alike). These dimensions and their corresponding questions are translated and listed in Table 3. For each dimension, a respondent must assign a value between 1 and 9, where only the two extreme values of the scale are labelled.

Table 3

Response dimensions.


DIMENSION	QUESTION	END-POINT LABELS

Production frequency	How often do you say or write the word?	Never … Very often

Perception frequency	How often do you hear or read the word? (E.g., in conversations, on the radio or TV, in magazines or books, on the internet, etc.)	Never … Very often

Offensiveness (self)	How offensive do you find the word personally?	Not at all … Very

Tabooness (others)	How taboo or socially unacceptable is the word for people in general? (E.g., in a workspace, classroom, at a party with friends, family, and colleagues)	Not at all … Very

Emotionality	What emotional charge does the word have for you?	Very negative … Very positive

Conspicuousness	How conspicuous is the word? (To what degree does it grab your attention?)	Not at all … Very

Familiarity	How well do you know what the word means?	Not at all … Very well

Sex of referent	Can the word be used to refer only to men, to men and women, or only to women?	Women only … Men only
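
The response format described above can be expressed as a simple validity check. This is a sketch under stated assumptions: the dimension keys are hypothetical English labels derived from Table 3 (not the column names of the published file), and the 1 to 9 range is assumed to be inclusive.

```python
# Hypothetical keys for the seven dimensions asked for every word.
REQUIRED_DIMENSIONS = (
    "production_frequency", "perception_frequency", "offensiveness_self",
    "tabooness_others", "emotionality", "conspicuousness", "familiarity",
)
# The eighth dimension is only asked for some words.
OPTIONAL_DIMENSIONS = ("sex_of_referent",)


def validation_errors(response):
    """Return a list of problems found in one respondent's scores for a word."""
    errors = []
    for name in REQUIRED_DIMENSIONS:
        if name not in response:
            errors.append(f"missing: {name}")
    for name, value in response.items():
        if name not in REQUIRED_DIMENSIONS + OPTIONAL_DIMENSIONS:
            errors.append(f"unknown dimension: {name}")
        elif not isinstance(value, int) or not 1 <= value <= 9:
            errors.append(f"out of range: {name}={value}")
    return errors


# A response with the mid-point value on every required dimension is valid.
complete = {name: 5 for name in REQUIRED_DIMENSIONS}
print(validation_errors(complete))
```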

All data are stored in a relational database, and then extracted to create a single UTF-8 encoded CSV file. Each line in the file has 54 columns consisting of the swearword, the respondent’s unique identifier, the responses of the respondent to the word, and the sociodemographic information of the respondent in both ordinal and text format.
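
Reading such a file requires nothing beyond a standard CSV parser. The sketch below uses Python's built-in csv module on a small in-memory excerpt; the column names and all values are assumptions made for illustration, since the actual header of the 54-column file is not reproduced here.

```python
import csv
import io

# Invented excerpt: the real file has 54 columns, and these column names
# are hypothetical, not the names used in the published CSV.
SAMPLE = """word,respondent_id,production_frequency,offensiveness_self,age_group
kak,r001,7,2,18–39
kak,r002,3,5,40–59
jissis,r001,4,6,18–39
"""

SCORE_COLUMNS = ("production_frequency", "offensiveness_self")


def load_rows(handle):
    """Read rows from an open CSV handle and convert scores to integers."""
    rows = []
    for row in csv.DictReader(handle):
        for key in SCORE_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
words = sorted({row["word"] for row in rows})
print(words)
```

For the downloaded file, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the CSV was saved.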

Sampling strategy

Because the aim of the project is the sociopragmatic description of swearwords rather than the collection of data for decision making, fully stratified respondent samples are less important. Consequently, non-probability sampling of respondents is a valid approach: volunteer respondents are recruited through respondent-driven opportunistic sampling, as formalised by Heckathorn (1997), and snowball sampling via social media (). These techniques have the potential advantage of including so-called “hidden populations”, that is, respondents who would not otherwise participate in research projects dealing with taboo topics and swearwords.

(3) Dataset Description

Object name – Afrikaans swearword scores

Format names and versions – UTF-8 encoded CSV version 1.0

Creation dates – 2019/07/01 – 2023/05/31

Dataset creators

Gerhard B. van Huyssteen (Organisation, Design, Collection, Quality Control), North-West University

Cornelius van der Walt (Website development, Data processing), BlueTek Computers

Jaco du Toit (Data processing), North-West University

Roald Eiselen (Data processing), North-West University

Nico Oosthuizen (Data processing), Independent

Language – Afrikaans (af)

License – Creative Commons Attribution 4.0 International

Repository name – DaYta ya Rona

Publication date – 2023-07-19

(4) Reuse Potential

Since this is the first empirical dataset on user perceptions of Afrikaans swearwords, it holds considerable potential for use in language-specific (i.e., Afrikaans) sociopragmatic and sociolinguistic investigations. For example, the data can be used to compare specific words within the same domain, as Van Huyssteen and Eiselen (2021) have done for the words feeks (“shrew”) and helleveeg (“harridan”), or across semantic domains (e.g., a comparison of words from the sex domain with words from the religious domain). The dataset could also be used fruitfully to investigate sociodemographic predictors of tabooness, offensiveness, and the like.
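
As a first descriptive step towards such an investigation, scores for a word can be averaged per sociodemographic group. The sketch below is illustrative only: the records are invented, the function name is hypothetical, and a group mean is merely a starting point, not the analysis used in the cited study.

```python
from collections import defaultdict
from statistics import mean

# Invented (word, age group, offensiveness score on the 1–9 scale) records.
records = [
    ("feeks", "18–39", 3), ("feeks", "40–59", 5), ("feeks", "60+", 6),
    ("feeks", "18–39", 2), ("helleveeg", "40–59", 4), ("helleveeg", "60+", 7),
]


def mean_by_group(records, word):
    """Return the mean score per group for a single word."""
    groups = defaultdict(list)
    for rec_word, group, score in records:
        if rec_word == word:
            groups[group].append(score)
    return {group: mean(scores) for group, scores in groups.items()}


feeks_means = mean_by_group(records, "feeks")
print(feeks_means)
```

Because the scales are ordinal, medians or rank-based tests may be more appropriate than means in an actual analysis.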

Given that the sociodemographic factors and their values are based on well-known international research, the dataset could also be used in comparative linguistic research. While specific words cannot necessarily be compared across languages, semantic domains or taboo types (like blasphemies, slurs, or epithets) could be compared. Such comparisons would, of course, be easier with other Germanic languages, e.g., with the data of Van Sterkenburg (2019) for Dutch, or Beers Fägersten () for Danish.

From a statistical point of view, the data could be used in the modelling of problematic or challenging data. For example, one of the shortcomings of the dataset is the large variation in the number of respondents per swearword, ranging from moffie (“gay man”) with 180 responses with complete metadata, to gat (“buttocks”) with only 3 comparable responses (see Table 2). The validity and reliability of data collected over a period of time by means of SWSs should also be compared to data collected in a single, longer survey.
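
One practical consequence is that analyses may need to exclude sparsely covered words. The sketch below uses five rows copied from Table 2; the cut-off of 30 complete responses is a hypothetical choice made for illustration, not a threshold prescribed by the authors.

```python
# (total responses, responses with complete metadata), copied from Table 2.
TABLE_2 = {
    "moffie": (222, 180),
    "wortelkop": (222, 176),
    "kak": (197, 167),
    "gat": (5, 3),
    "tril": (5, 4),
}

MIN_COMPLETE = 30  # hypothetical cut-off; not prescribed by the paper


def usable_words(table, minimum):
    """Return the words with at least `minimum` complete responses."""
    return sorted(w for w, (_, complete) in table.items() if complete >= minimum)


usable = usable_words(TABLE_2, MIN_COMPLETE)
# Share of responses per word that include complete metadata.
completion = {w: round(c / t, 2) for w, (t, c) in TABLE_2.items()}
print(usable)
print(completion)
```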

Lastly, the dataset could also be utilised for practical, applied purposes. For example, it is currently being used in the so-called Vloekmeter (‘swearing meter’; see vloek.co.za/vloekmeter). The Vloekmeter is purely data-driven: based on this dataset, statistics are presented on an interactive dashboard on the website (see Figure 1). Such an application can be of practical use not only for content creators (such as authors and filmmakers), but especially also for publishers, broadcasting companies (like Netflix), or the South African Film and Publication Board, which might want to provide age and content advisories for books, television series, films, and computer games.

Figure 1

Vloekmeter showing results for fokken (“fucking”) and flippen (“fricking”).

Notes

1. https://vloek.co.za (last accessed: 19 September 2023).
2. https://dayta.nwu.ac.za/ (last accessed: 19 September 2023).
3. All these questions have two additional options that are not counted or listed in the table, viz. (a) “I don’t want to answer this question”; and (b) “Other/Something else” (not applicable to Age group; Height; Period in country of residence).
4. These values are in accordance with terminology in South African legislation dealing with population groups.

Ethics and Consent

Ethical clearance for the research project was obtained through the Language Matters Ethics Committee of the NWU (ethics number: NWU-00632-19-A7).

Acknowledgements

We would like to acknowledge Cornelius van der Walt (BlueTek Computers), and Jaco du Toit (NWU) for their technical and creative work in the implementation of the Vloekmeter. A comprehensive list of all the co-workers, collaborators, and students on the project is published on vloek.co.za/oor-ons.

None of the results and/or opinions in this paper can be ascribed to any of the people or organisations mentioned above.

Funding Information

This research is partially funded by the Suid-Afrikaanse Akademie vir Wetenskap en Kuns, and partially made possible through barter agreements with BlueTek Computers, Afrikaans.com, and WatKykJy.co.za. The Woordeboek van die Afrikaanse Taal (WAT), Handwoordeboek van die Afrikaanse Taal (HAT) and Centre for Text Technology (CTexT) of the North-West University (NWU) are hereby also acknowledged for generously supplying the project with material from their respective databases.

Competing interests

The first author is a director of the not-for-profit company Viridevert NPC (CIPC registration number: 2016/411799/08), which owns and manages the website vloek.co.za. This website was developed specifically for this project, and this conflict of interest has been approved by the NWU.

Author Contributions

Gerhard B. van Huyssteen: Conceptualisation (lead); Data curation (support); Funding acquisition; Investigation (equal); Methodology (lead); Project administration; Writing – original draft (support); Writing – review and editing (lead).

Roald Eiselen: Conceptualisation (support); Data curation (lead); Formal analysis; Investigation (equal); Methodology (support); Software (lead); Visualisation; Writing – original draft (lead); Writing – review and editing (support).

Jaco du Toit: Software (support); Data curation (support); Writing – review and editing (support).

References

Beers Fägersten, K. (2007). A sociolinguistic analysis of swearword offensiveness. https://www.researchgate.net/publication/265009714_A_sociolinguistic_analysis_of_swearword_offensiveness
Beers Fägersten, K. (2012). Who’s Swearing Now? The Social Aspects of Conversational Swearing. Cambridge Scholars Publishing.
Beers Fägersten, K., & Stapleton, K. (Eds.) (2017). Advances in Swearing Research: New Languages and New Contexts (Vol. 282). John Benjamins. DOI: https://doi.org/10.1075/pbns.282
Beers Fägersten, K., & Stapleton, K. (2022). Swearing. Handbook of Pragmatics, 25, 129–155. DOI: https://doi.org/10.1075/hop.25.swe1
Calitz, F. C. (1979). Spot, skel en verwante verskynsels in Afrikaans [Mockery, swearing and related phenomena in Afrikaans]. Stellenbosch University.
Coetzee, F. (2018). Hy leer dit nie hier nie (‘He doesn’t learn it here’): talking about children’s swearing in extended families in multilingual South Africa. International Journal of Multilingualism, 15(3), 291–305. DOI: https://doi.org/10.1080/14790718.2018.1477291
De Klerk, V. (2008). How taboo are taboo words for girls? Language in Society, 21(2), 277–289. DOI: https://doi.org/10.1017/s0047404500015293
De Klerk, V. A., & Antrobus, R. (2004). Swamp-donkeys and rippers: the use of slang and pejorative terms to name ‘the other’. Alternation, 11(2), 264–282. DOI: https://doi.org/10.10520/AJA10231757_348
Dekker, L. (1991). Vloek, skel en vulgariteit: Hantering van sosiolinguisties aanstootlike leksikale items [Swearing, name-calling and vulgarity: Treatment of sociolinguistically offensive lexical items]. Lexikos, 1, 52–60. DOI: https://doi.org/10.5788/1-1-1148
Feinauer, A. E. (1981). Die taalkundige gedrag van vloekwoorde in Afrikaans [The linguistic behaviour of swearwords in Afrikaans]. Stellenbosch University.
Heckathorn, D. D. (1997). Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Social Problems, 44(2), 174–199.
Jay, T. B. (1992). Cursing in America: A psycholinguistic study of dirty language in the courts, in the movies, in the schoolyards and on the streets. John Benjamins.
Jay, T. B. (2000). Why we curse: A neuro-psycho-social theory of speech. John Benjamins.
Jay, T. B. (2020). Ten issues facing taboo word scholars. In N. Nassenstein & A. Storch (Eds.), Swearing and Cursing (pp. 37–52). De Gruyter Mouton. DOI: https://doi.org/10.1515/9781501511202-002
Lavrakas, P. J. (2008). Respondent Fatigue. In P. J. Lavrakas (Ed.), Encyclopedia of Survey Research Methods (Vol. 1). Thousand Oaks: Sage Publications.
Lubbe, H. J. (1969). Woordtaboe en die sogenaamde bastervloeke [Word taboo and the so-called minced-oaths]. Tydskrif vir Volkskunde en Volkstaal, 2.
Lubbe, H. J. (1970). Die eufemisme in Afrikaans [The euphemism in Afrikaans]. Universiteit van die Oranje-Vrystaat.
Lubbe, H. J. (1971). Die eufemisme en die pejoratiewe betekenisontwikkeling van woorde [The euphemism and the pejorative meaning change of words]. Tydskrif vir Volkskunde en Volkstaal, 27(2), 1–8. https://collections.nwu.ac.za/dbtw-wpd/textbases/bibliografie-afrikaans/documents-dbat/volksk&volkstaal_apr1971_1-8.pdf
Lubbe, H. J. (1973). Eufemismes van fatsoenlikheid [Euphemisms of genteelness]. Tydskrif vir Volkskunde en Volkstaal, 29(1), 10–14. https://collections.nwu.ac.za/dbtw-wpd/textbases/bibliografie-afrikaans/documents-dbat/volksk&volkstaal_jan1973_10-14.pdf
Pienaar, P. d. V. (1945). Spraaktaboe en eufemismes in Afrikaans [Speech taboo and euphemisms in Afrikaans]. Tydskrif vir Volkskunde en Volkstaal, 2(11), 44–52. (Reprinted in Nienaber, P. J. 1965. Taalkundige opstelle [Linguistic essays]. A. A. Balkema. Pp. 68–80.)
Smuts, J. (1958). Skel- en spotname in Afrikaans [Epithets and nicknames in Afrikaans]. Tydskrif vir Volkskunde en Volkstaal, 14(3), 25–26.
Stapleton, K., Beers Fägersten, K., Stephens, R., & Loveday, C. (2022). The power of swearing: What we know and what we don’t. Lingua, 277. DOI: https://doi.org/10.1016/j.lingua.2022.103406
Trollip, E. B. (2022). Morfologiese evalueringskonstruksies in Afrikaans [Morphological evaluation constructions in Afrikaans]. North-West University.
Van der Merwe, M.-M. (2022). Prototipiese Afrikaanse taboewoorde [Prototypical Afrikaans taboo words]. University of Pretoria.
Van der Merwe, M.-M., Pilon, S., Van Huyssteen, G. B., & Du Toit, J. (2022). Prototipiese Afrikaanse taboewoorde: Datastel (Weergawe 1.0) [Prototypical Afrikaans taboo words: Dataset (Version 1.0)]. University of Pretoria and North-West University. DOI: https://doi.org/10.25403/UPresearchdata.21975719
Van der Walt, A. (2019). Linguistiese eienskappe en konvensionalisering in Zefrikaans op die WatKykJy?-blog: ’n korpuslinguistiese ondersoek [Linguistic features and conventionalisation in Zefrikaans on the WatKykJy? blog: a corpus linguistic study]. North-West University.
Van Huyssteen, G. B. (1996). The sexist nature of sexual expressions in Afrikaans. Literator, 17(3), 119–135.
Van Huyssteen, G. B. (1998). Die leksikografiese hantering van seksuele uitdrukkings in Afrikaans [The lexicographic treatment of sexual expressions in Afrikaans]. South African Journal of Linguistics, 16(2), 63–71.
Van Huyssteen, G. B. (2021, 29 November – 3 December). Swearing in South Africa: Multidisciplinary research on language taboos. Proceedings of the International Conference of the Digital Humanities Association of Southern Africa 2021, South Africa (Virtual). https://gerhard.pro/publications/vanhuyssteen2021a/
Van Huyssteen, G. B. (2022). Wat ons van ‘fok’ weet (en nie weet nie) [What we (don’t) know about fuck]. LitNet Akademies (Geesteswetenskappe), 19(3), 428–452. DOI: https://doi.org/10.56273/1995-5928/2022/j19n3b11
Van Huyssteen, G. B., & Eiselen, R. (2021). Oor feekse en helleveë [On shrews and harridans]. Tydskrif vir Geesteswetenskappe, 61(4–1), 1129–1155. DOI: https://doi.org/10.17159/2224-7912/2021/v61n4-1a9
Van Huyssteen, G. B., Rabé, M., & Puttkammer, M. J. (2023a). Datastel: Ouderdoms- en inhoudsadvies vir Afrikaanse boeke vir kinders [Dataset: Age and content advisories for Afrikaans books for children]. DOI: https://doi.org/10.25388/nwu.22256932
Van Huyssteen, G. B., Rabé, M., & Puttkammer, M. J. (2023b). Ouderdoms- en inhoudsadvies vir Afrikaanse boeke vir kinders: resultate van ’n eerste kwalitatiewe en kwantitatiewe ondersoek [Age and content advisories for Afrikaans children’s books: Results of a first qualitative and quantitative investigation]. LitNet Akademies (Geesteswetenskappe), 20(1), 185–212. DOI: https://doi.org/10.56273/1995-5928/2023/j20n1b7
Van Sterkenburg, P. G. J. (2019). Rot lekker zelf op: Over politiek incorrect en ander ongepast taalgebruik. Scriptum.
