(1) Overview
Repository location
Open Science Framework. “Absorption in Online Book Reviews”, https://osf.io/kr4v6/ (DOI: 10.17605/OSF.IO/KR4V6).
Context
Online book reviews are a relatively new form of reader testimonial that researchers from different disciplines can use to investigate reading experience and evaluation. Research on online book reviews has so far remained largely theoretical; where empirical methods were employed, they involved either bottom-up, data-driven approaches or a top-down quantitative approach to answer research questions about the reception of one particular text. The present project had two main aims. The first was to validate the Story World Absorption Scale (SWAS), a self-report instrument that captures experiences of absorption during literary reading, against unprompted reader testimonials found on Goodreads. The second was to build a corpus of online book reviews that could be used for meaningful corpus-linguistic as well as qualitative analyses of reader responses, with an emphasis on absorption experience.
(2) Method
Building the corpus
We scraped approximately six million English-language reviews across nine genres (fantasy, romance, thriller, horror, mystery, science fiction, historical fiction, contemporary, and classics) from the Goodreads website between spring 2018 and spring 2019. From this larger corpus, we selected a subset of reviews to annotate, taking into account text length and reviewer rating. We eliminated reviews containing GIFs and used a word list based on our annotation guidelines to select reviews with high absorption potential. A group of five annotators then manually annotated this subset using the guidelines developed for this project. Our final corpus consists of 493 curated reviews. After annotation and curation, we added the following metadata to the final corpus: title of the book, author of the book, genre of the book (as voted for by Goodreads members), text length of the review in characters and tokens, annotated segment, annotation categories (book-specific mention of absorption versus mention of absorption in general reading behavior; presence or negation of absorption; absorption dimensions; specific absorption categories), onset and offset of the annotated segment, and annotation round. A ReadMe file on the OSF page provides more information about each of the metadata variables included in the dataset.
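The selection step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the thresholds, the rating cut-off, the word list, the GIF pattern, and the `Review` structure are all hypothetical placeholders, since the exact values are not given here.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical values: the actual thresholds and word list are not stated in the text.
MIN_TOKENS = 50
MAX_TOKENS = 500
MIN_RATING = 4
ABSORPTION_TERMS = {"absorbed", "immersed", "transported", "gripping", "hooked"}

# Assumed marker for an embedded GIF in scraped review HTML.
GIF_PATTERN = re.compile(r"<img[^>]*\.gif", re.IGNORECASE)


@dataclass
class Review:
    text: str
    rating: int


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def is_candidate(
    review: Review,
    min_tokens: int = MIN_TOKENS,
    max_tokens: int = MAX_TOKENS,
    min_rating: int = MIN_RATING,
) -> bool:
    """Return True if a review passes all four selection filters."""
    # 1. Drop reviews that embed GIFs.
    if GIF_PATTERN.search(review.text):
        return False
    # 2. Keep only reviews within the chosen length range.
    tokens = tokenize(review.text)
    if not (min_tokens <= len(tokens) <= max_tokens):
        return False
    # 3. Apply the reviewer-rating criterion.
    if review.rating < min_rating:
        return False
    # 4. Require at least one term from the absorption word list.
    return any(token in ABSORPTION_TERMS for token in tokens)
```

Applied to a scraped collection, `[r for r in reviews if is_candidate(r)]` would yield the candidate pool from which reviews are then drawn for manual annotation.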
Due to copyright and data privacy restrictions on the type of data we are working with, we prepared two versions of our corpus. The first contains the metadata but not the full review texts and is available via the OSF link given in this paper. The second includes the full review texts, which have been fully anonymized. Researchers interested in working with the second version can contact the first author of this paper to obtain access to the complete corpus for research purposes only. In making this decision, we followed the APA guidelines on protected access open data.
Developing the annotation guidelines
The annotation guidelines were developed throughout the annotation process, which started in March 2019 and was divided into 15 rounds, the last of which was completed in October 2020. The 18 statements of the Story World Absorption Scale were taken as the point of departure for the annotation guidelines. Throughout the annotation process, we simplified the language used in these statements to better match the language use of Goodreads reviewers, and we added 17 annotation categories to the tagset, partly based on further research and partly driven by what we found in the reviews. To complete the guidelines, we added examples from the reviews for each absorption category wherever we could find them. These examples can help other researchers who want to use the tagset to familiarise themselves with the idiosyncratic language found on digital social reading platforms. Table 1 shows all of the annotation categories and how many times they were used to annotate segments of reviews. The guidelines also explain how to assign all aspects of a given annotation category, such as whether a category is present or negated, and whether it is an instance of book-specific absorption or a description of how a reader usually experiences absorption, regardless of the book reviewed. Furthermore, the guidelines include comments about the differences between closely related categories.
ABSORPTION DIMENSION | ABSORPTION CATEGORY | NUMBER OF TOTAL ANNOTATIONS | ABSORPTION PRESENT | ABSORPTION NEGATED |
---|---|---|---|---|
Attention | A1 (Altered sense of time): While reading time moved differently | 3 | 3 | 0 |
Attention | A2 (Concentration): My attention was focused on the book | 12 | 10 | 2 |
Attention | A3 (General sense of absorption): I was absorbed in the book | 194 | 186 | 8 |
Attention | A4 (No distractions): I was not distracted while reading | 5 | 3 | 2 |
Attention | A5 (Forgetting surroundings): While reading I forgot the world around me | 20 | 16 | 0 |
Attention | A6 (Anticipation): I was on the edge of my seat/I wanted to know what would happen next | 111 | 108 | 3 |
Attention | A7 (Inability to stop reading): I did not want to put the book down/I could not put the book down | 150 | 145 | 5 |
Emotional Engagement | EE1 (Perspective taking): I could imagine what it must be like to be this character | 35 | 35 | 0 |
Emotional Engagement | EE2 (Sympathy): I sympathized with this character | 57 | 53 | 4 |
Emotional Engagement | EE3 (Emotional connection): I felt a connection to this character | 79 | 69 | 10 |
Emotional Engagement | EE4 (Empathy): I felt how this character was feeling | 73 | 72 | 1 |
Emotional Engagement | EE5 (Compassion for story events): I felt for what happened in the story | 79 | 79 | 0 |
Emotional Engagement | EE6 (Anger): I felt angry at this character | 20 | 19 | 1 |
Emotional Engagement | EE7 (Fear): I felt scared for this character | 5 | 5 | 0 |
Emotional Engagement | EE8 (Emotional familiarity): I felt like I knew this character | 11 | 11 | 0 |
Emotional Engagement | EE9 (Wishful identification): I wish I could be more like this character | 8 | 8 | 0 |
Emotional Engagement | EE10 (Emotional understanding): I understood why this character did this | 31 | 26 | 5 |
Emotional Engagement | EE11 (Parasocial response): I want to have some kind of relationship with this character | 79 | 79 | 0 |
Emotional Engagement | EE12 (Participatory response): I wanted to involve myself in the story world events | 42 | 42 | 0 |
Mental Imagery | MS1 (Imagery of character): I could imagine what the characters looked/smelled/felt/sounded like | 18 | 15 | 3 |
Mental Imagery | MS2 (Imagery of story events): I could see/hear/feel/smell the story events clearly in my mind | 20 | 20 | 0 |
Mental Imagery | MS3 (Imagery of story world): I could imagine what the story world looked/smelled/felt/sounded like | 27 | 25 | 2 |
Mental Imagery | MS4 (Realness): The character/story world felt real to me | 73 | 73 | 0 |
Transportation | T1 (Presence): While reading this I was in the story world | 13 | 13 | 0 |
Transportation | T2 (Merge of fiction in reality): Elements from the story world came into my world | 14 | 14 | 0 |
Transportation | T3 (Proximity of story world): The story world felt close to me | 4 | 4 | 0 |
Transportation | T4 (Deictic shift): I felt transported to the story world | 19 | 19 | 0 |
Transportation | T5 (Part of the story world): I felt part of the story world | 34 | 34 | 0 |
Transportation | T6 (Return deictic shift): I returned from a trip to the story world | 3 | 3 | 0 |
Transportation | T7 (Travel in story world): I lost myself in the story world/I traveled with the characters through the story world | 26 | 26 | 0 |
Impact | IM1 (Effortless engagement): It was an easy read/I devoured this book | 108 | 68 | 40 |
Impact | IM2 (Wish to reread): I will/have reread this book/parts of this book | 116 | 112 | 4 |
Impact | IM3 (Anticipation book series): I cannot wait to see how this unfolds in the next book | 170 | 167 | 3 |
Impact | IM4 (Addiction): I am addicted to this book/I cannot get enough of this book | 91 | 89 | 2 |
Impact | IM5 (Lingering story feelings): The book left me feeling …/This book stayed with me for a while | 194 | 192 | 2 |
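For readers who want to work with the counts in Table 1, the following minimal sketch shows one way to aggregate them per absorption dimension. The counts are transcribed from the table; the prefix-to-dimension mapping is inferred from the category codes, and the function names are illustrative only. The sketch also includes a simple check that flags any row whose present and negated counts do not sum to its total.

```python
import re
from collections import defaultdict

# (category code, total annotations, present, negated), transcribed from Table 1.
ROWS = [
    ("A1", 3, 3, 0), ("A2", 12, 10, 2), ("A3", 194, 186, 8), ("A4", 5, 3, 2),
    ("A5", 20, 16, 0), ("A6", 111, 108, 3), ("A7", 150, 145, 5),
    ("EE1", 35, 35, 0), ("EE2", 57, 53, 4), ("EE3", 79, 69, 10),
    ("EE4", 73, 72, 1), ("EE5", 79, 79, 0), ("EE6", 20, 19, 1),
    ("EE7", 5, 5, 0), ("EE8", 11, 11, 0), ("EE9", 8, 8, 0),
    ("EE10", 31, 26, 5), ("EE11", 79, 79, 0), ("EE12", 42, 42, 0),
    ("MS1", 18, 15, 3), ("MS2", 20, 20, 0), ("MS3", 27, 25, 2), ("MS4", 73, 73, 0),
    ("T1", 13, 13, 0), ("T2", 14, 14, 0), ("T3", 4, 4, 0), ("T4", 19, 19, 0),
    ("T5", 34, 34, 0), ("T6", 3, 3, 0), ("T7", 26, 26, 0),
    ("IM1", 108, 68, 40), ("IM2", 116, 112, 4), ("IM3", 170, 167, 3),
    ("IM4", 91, 89, 2), ("IM5", 194, 192, 2),
]

# Mapping inferred from the letter prefix of each category code.
DIMENSIONS = {
    "A": "Attention",
    "EE": "Emotional Engagement",
    "MS": "Mental Imagery",
    "T": "Transportation",
    "IM": "Impact",
}


def dimension_of(code):
    """Map a category code such as 'EE10' to its absorption dimension."""
    prefix = re.match(r"[A-Z]+", code).group()
    return DIMENSIONS[prefix]


def totals_by_dimension(rows):
    """Sum the total number of annotations per absorption dimension."""
    totals = defaultdict(int)
    for code, total, _present, _negated in rows:
        totals[dimension_of(code)] += total
    return dict(totals)


def inconsistent_rows(rows):
    """Return the codes of rows where present + negated differs from the total."""
    return [code for code, total, present, negated in rows
            if present + negated != total]
```

Calling `totals_by_dimension(ROWS)` gives a quick view of how annotations are distributed across the five dimensions, and `inconsistent_rows(ROWS)` points to rows worth checking against the dataset on OSF.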
(3) Dataset Description
Object name
The AbsORB (Absorption in Online Reviews of Books) Corpus and Annotation Guidelines.
Format names and versions
Corpus: .csv and .xlsx
Guidelines: .pdf
Creation dates
2018-12-01 — 2023-06-15
Dataset creators
Moniek Kuijpers, University of Basel
Simone Rebora, Johannes Gutenberg University Mainz
Piroska Lendvai, Bavarian Academy for Sciences and Humanities
Massimo Lusetti, University of Basel
Lina Ruh, University of Basel
Lukas Renner, University of Basel
Jonathan Tadres, University of Basel
Johanna Vogelsanger, University of Basel
Tina Ternes, University of Basel, and Johannes Gutenberg University Mainz
Language
Data and metadata: English
Licence
CC-By Attribution 4.0 International
Repository name
Open Science Framework
Publication date
2023-06-01
(4) Reuse Potential
The corpus opens several avenues for future research. It can be expanded with reviews from other platforms to see whether language use differs from one online community to the next, or with more metadata, such as the number of responses to a review or the number of times a review is read, which would allow for analyses focused on the social aspects of online book reviews. Another expansion could lie in adding reviews with low ratings, to enable analyses of differences in absorption experiences between low-rated and high-rated books. Relatedly, one could examine the annotations per genre and whether readers of different genres mention different absorption dimensions in their reviews. One such study, for which we provided access to the full-text corpus, used network analysis and found preliminary results that point to a strong similarity in vocabulary within romance reviews and within mystery reviews. This suggests that the reading experience of these groups of readers follows a more stable pattern than in other genres (i.e., fantasy, science fiction, horror/thriller), where no strong genre clusters were found.
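As a starting point for the per-genre comparison suggested above, the following sketch computes, for each genre, the share of annotations falling into each absorption dimension. The column names `genre` and `absorption_dimension` and the sample rows are assumptions made for illustration; the ReadMe file on OSF documents the actual variable names in the published .csv file.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented sample rows; the column names are hypothetical, not the corpus's own.
SAMPLE_CSV = """genre,absorption_dimension
romance,Emotional Engagement
romance,Emotional Engagement
romance,Impact
mystery,Attention
mystery,Attention
mystery,Impact
mystery,Attention
"""


def dimension_shares_by_genre(csv_file):
    """Return {genre: {dimension: share of that genre's annotations}}."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["genre"]][row["absorption_dimension"]] += 1

    shares = {}
    for genre, dims in counts.items():
        n_annotations = sum(dims.values())
        shares[genre] = {dim: n / n_annotations for dim, n in dims.items()}
    return shares


shares = dimension_shares_by_genre(io.StringIO(SAMPLE_CSV))
```

To run this on the published corpus, one would pass an opened file, for example `open(path, newline="", encoding="utf-8")`, in place of the in-memory sample and adjust the column names accordingly. Working with shares rather than raw counts keeps genres with different numbers of annotated reviews comparable.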
The guidelines open up other avenues. For example, they could be used to annotate a different set of reviews focusing on specific books or genres, reviews from different platforms, or even different types of reader responses, such as answers to open survey questions or interview transcripts. A pilot study is currently investigating the co-occurrence of absorption and changes in the self-concept in reviews of climate fiction. Another avenue currently being explored is the translation and development of a German-language corpus and guidelines, in order to investigate whether certain absorption experiences can be translated, culturally and linguistically, to a different language community.