(1) Overview

Repository location

Zenodo; DOI: https://doi.org/10.5281/zenodo.10785192

Context

This paper was produced as part of an ongoing research project with the Croatian acronym PoKUS, whose English title is Remembering Literature in Everyday Life. The project's rationale and details can be found on its website and in several papers by its principal investigator. Its overarching aim, however, can be stated succinctly as giving a voice to the 99% of literary readers who are non-professionals. They have historically been underrepresented compared to the 1% of professional readers, such as writers, critics, editors, professors, teachers, and others, whose memories and opinions largely shape our collective memory of literature.

In this way, the project and this paper fit broadly into calls for researching literature in everyday life, particularly by investigating what remains after reading a work of literature. What sets them apart from other similar studies is primarily the large qualitative sample, which is comparable in size to some quantitative studies of non-professional readers. The memories they collect also come from a small European language and literary culture. This adds the further dimension of (quite literally) giving voice to readers who are rarely heard from in a global context.

(2) Method

All the data were gathered through individual semi-structured interviews, conducted in Croatia (and in Croatian) by the three authors. The interviewees were volunteers recruited between March 2021 and January 2023 (N = 1,005).

Steps

After the respondents had read and signed informed consent forms, the interviews were audio-recorded on the researchers' computers. Each interview followed a predetermined list of questions, which is available in the "Dataset explanation" document. The researchers then entered the data and metadata from the interviews into a single Microsoft Excel database.

Sampling strategy

The researchers approached and recruited respondents directly in about 15 towns and cities in Croatia. Recruitment took place mostly in public places connected to reading, predominantly libraries but also book clubs and book cafés. This convenience sample was restricted by one positive criterion and one negative criterion. The positive criterion was being at least 18 years of age. The negative criterion was not currently being employed in a job that requires reading literary texts. Together, these criteria were intended to yield a generalizable sample of non-professional readers of literature in Croatia.

(3) Dataset Description

Repository name

Zenodo

Object name

PoKUS

Format names and versions

Excel, Word

Creation dates

2021-03-01–2024-01-24

Dataset creators

Lovro Škopljanac, Luka Ostojić, Velna Rončević

Language

Croatian (for interview snippets and several categories), English (for column headings and several categories), several others (for direct quotations)

License

CC-BY-4.0

Publication date

2024-03-06

The Excel dataset is divided into six sections. The first contains the readers' demographic data, such as their age and sex. The second provides facts about the literary texts, such as their titles and literary genres. The third provides similar data about the texts' authors, such as their years of birth and the number of texts per author. The fourth and fifth sections list the texts by reader. The fourth covers texts discussed extensively, and the fifth covers texts mentioned only in passing. The final section consists of quantitative data drawn from the recollections, such as the times and locations of reading. A restricted version of this dataset is also described and linked. It expands on the sixth section with qualitative data that cannot be made publicly available because of potential ethical concerns.

(4) Reuse Potential

This dataset provides a synchronic overview, or "snapshot", of the reading culture, cultural memory, and conceptualization of literature among contemporary readers in one European country and language. Accordingly, its authors believe it has high reuse potential for researchers interested in any of these three areas. Large-scale qualitative surveys of readers are quite rare, and reliable data about reader reception are hard to come by, so this dataset provides a new kind of resource.

This resource, especially in its restricted version, is compatible with extant theories and data in reader reception. This includes research into specific texts, authors, literary periods, and characters. It may also lend itself to broader considerations. Examples include (imagological) investigations of how literary topics and motifs are perceived in certain periods and genres, as well as more general insight into the differences between professional and non-professional readers. The reader data may be of special interest to anyone who wants to break the sample down by general demographic criteria such as sex, age, and education level. One could ask, for example, how the reading memories of older, well-educated female readers differ from those of their younger counterparts.
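A demographic comparison of this kind can be sketched in a few lines. The Python function below is a minimal illustration, not part of the dataset's documentation. It assumes, hypothetically, that the first section (readers) and the fourth section (texts discussed extensively) have been exported as lists of records that share a reader identifier. The field names `reader_id`, `sex`, `age`, `education`, and `title`, and the function name `compare_groups`, are illustrative assumptions rather than the dataset's actual column headings.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping, Tuple

# Hypothetical record shapes; the real column headings in the Excel file may differ.
#   reader record:     {"reader_id": ..., "sex": ..., "age": ..., "education": ...}
#   discussion record: {"reader_id": ..., "title": ...}
Record = Mapping[str, object]


def compare_groups(
    readers: Iterable[Record],
    discussions: Iterable[Record],
    group_a: Callable[[Record], bool],
    group_b: Callable[[Record], bool],
) -> Tuple[Counter, Counter]:
    """Count how often each title is discussed by readers in group A and in group B."""
    # Collect the reader IDs that satisfy each demographic predicate.
    ids_a, ids_b = set(), set()
    for reader in readers:
        if group_a(reader):
            ids_a.add(reader["reader_id"])
        if group_b(reader):
            ids_b.add(reader["reader_id"])

    counts_a: Counter = Counter()
    counts_b: Counter = Counter()
    for item in discussions:
        # A reader is counted in both groups if the two predicates overlap.
        if item["reader_id"] in ids_a:
            counts_a[item["title"]] += 1
        if item["reader_id"] in ids_b:
            counts_b[item["title"]] += 1
    return counts_a, counts_b
```

For instance, `group_a` might select female university graduates aged 60 or over, and `group_b` might select their counterparts under 60. The age threshold and the coded values would have to be adapted to the dataset's actual categories. Passing the groups as predicates keeps the function independent of any particular coding scheme.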

To illustrate the reuse potential further, here are three ideas from outside researchers who were asked to comment on the dataset prior to publication. The first, with a background in digital humanities, proposed starting with the data on literary (sub)genres. In the extant database, these are limited to about a dozen items in a drop-down list. This researcher suggested expanding them by scraping (sub)genre data from publicly available websites such as library catalogues and reading social networks. The second, with a background in book history, investigates how individuals organize their private reading spaces. She proposed drawing on the data because readers routinely spoke about their private libraries. The third, with a background in psychology, noted that the data may broadly be used for validation purposes. Research on cognitive processes and/or traits such as literary absorption, transportation, or enjoyment may be corroborated or disproved by the readers' unsolicited statements.

As for potential constraints on reuse, the dataset does not include the original "raw" data in the form of interview recordings or transcripts, which would provide even more information. The recordings may not be distributed because the anonymous participants' privacy must be protected. The participants have agreed to share the transcripts with researchers, but these are not yet fully available because (automated) transcription of the recordings is an onerous process. Another potential problem is the language barrier, since the interviews were conducted in Croatian, which is not widely spoken. To address this, the researchers will continue correcting the machine-generated transcripts (only 30 are available in English as of March 2024). They will add these to the restricted dataset once they are available, which will also make it possible to translate them into other languages.