(1) Overview

Repository location

Open Science Framework. “Absorption in Online Book Reviews”, https://osf.io/kr4v6/ (DOI: 10.17605/OSF.IO/KR4V6).


Online book reviews are a relatively new form of reader testimonial that researchers from different disciplines can use to investigate reading experience and evaluation. Research on online book reviews has so far remained largely theoretical; where empirical methods were employed, they involved either bottom-up, data-driven approaches or a top-down quantitative approach to answer research questions about the reception of one particular text. The present project had two main aims. The first was to validate the Story World Absorption Scale (SWAS), a self-report instrument that captures experiences of absorption during literary reading, against unprompted reader testimonials found on Goodreads. The second was to build a corpus of online book reviews that could be used for meaningful corpus-linguistic and qualitative analyses of reader responses, with an emphasis on absorption experience.

(2) Method

Building the corpus

We scraped approximately six million English-language reviews across nine genres (fantasy, romance, thriller, horror, mystery, science fiction, historical fiction, contemporary, and classics) from the Goodreads website between the spring of 2018 and the spring of 2019. From this larger corpus we selected a subset of reviews to annotate, taking into account text length and reviewer rating. We eliminated reviews containing GIFs and used a word list based on our annotation guidelines to select reviews with a high absorption potential. A group of five annotators then manually annotated this subset using the guidelines developed for this project. Our final corpus consists of 493 curated reviews. After annotation and curation, we added the following metadata to the final corpus: title of the book, author of the book, genre of the book (as voted for by Goodreads members), text length of the review in characters and tokens, annotated segment, annotation categories (book-specific mention of absorption versus mention of absorption in general reading behavior; presence or negation of absorption; absorption dimensions; specific absorption categories), onset and offset of the annotated segment, and annotation round. A ReadMe file on the OSF page gives more information about each of the metadata variables included in the dataset.
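The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the thresholds, the direction of the rating filter, the GIF check, and the word list are placeholders rather than the values used in the project, and the field names `text` and `rating` are hypothetical.

```python
import re

# Placeholder word list; the project's actual list is derived from its guidelines.
ABSORPTION_TERMS = {"absorbed", "immersed", "gripping", "transported"}

# Assumption: a GIF shows up in the scraped text as a reference ending in ".gif".
GIF_PATTERN = re.compile(r"\.gif\b", re.IGNORECASE)


def has_absorption_potential(text, terms=ABSORPTION_TERMS):
    """Return True if at least one word-list term occurs as a whole token."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return not tokens.isdisjoint(terms)


def select_candidates(reviews, min_tokens=50, max_tokens=500, min_rating=4):
    """Keep reviews that pass the length, rating, GIF and word-list filters.

    All default thresholds are hypothetical.
    """
    selected = []
    for review in reviews:
        n_tokens = len(review["text"].split())  # whitespace tokens
        if not min_tokens <= n_tokens <= max_tokens:
            continue
        if review["rating"] < min_rating:
            continue
        if GIF_PATTERN.search(review["text"]):
            continue
        if not has_absorption_potential(review["text"]):
            continue
        selected.append(review)
    return selected
```

Applying the filters in this order discards reviews on the cheapest checks first, which matters when the starting point is several million reviews.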

Because of copyright and data privacy restrictions on the type of data we are working with, we prepared two versions of our corpus. The first contains the metadata but not the full review texts; this version is available under the OSF link given in this paper. The second includes the full review texts, which have been fully anonymized. Researchers interested in working with the second version can contact the first author of this paper to obtain access to the complete corpus for research purposes only. In making this decision, we followed the APA guidelines on protected access open data.

Developing the annotation guidelines

The annotation guidelines were developed throughout the annotation process, which started in March 2019 and was divided into 15 rounds, the last of which was completed in October 2020. The 18 statements of the Story World Absorption Scale served as the point of departure for the guidelines. Throughout the annotation process, we simplified the language of these statements to better match the language use of Goodreads reviewers, and we added 17 annotation categories to the tagset, partly based on further research and partly driven by what we found in the reviews. To complete the guidelines, we added examples from the reviews for each absorption category for which we could find them. These examples can help other researchers who want to use the tagset to familiarise themselves with the idiosyncratic language found on digital social reading platforms. Table 1 shows all annotation categories and how many times each was used to annotate segments of reviews. The guidelines also explain how to assign every aspect of a given annotation category: whether the category is present or negated, and whether the segment is an instance of book-specific absorption or a description of how the reader usually experiences absorption, regardless of the book reviewed. Finally, the guidelines include comments on the differences between closely related categories.
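For readers who want to process the annotations programmatically, one annotation could be represented as in the minimal sketch below. The class and field names are hypothetical, and the sketch assumes that onset and offset are character positions in the review text with the offset exclusive; the ReadMe on the OSF page documents the actual variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated segment of a review (field names are illustrative)."""

    category: str          # tagset code, e.g. "A7" or "EE3"
    negated: bool          # True if the category is negated, False if present
    book_specific: bool    # False for statements about general reading behavior
    onset: int             # assumed: index of the first character of the segment
    offset: int            # assumed: index one past the last character
    annotation_round: int

    def __post_init__(self):
        # Reject spans that are empty or run backwards.
        if not 0 <= self.onset < self.offset:
            raise ValueError("onset must be non-negative and smaller than offset")

    def segment(self, review_text):
        """Return the annotated stretch of the review text."""
        return review_text[self.onset:self.offset]
```

With the full-text version of the corpus, `segment` would recover the annotated passage from a review; with the metadata-only version, the annotated segment is already stored as a separate variable.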

Table 1

Annotation layers and categories with number of annotations per category (presence or negation of the category) for rounds 7 to 14 of the annotation process.


| Dimension | Category and statement | Total | Present | Negated |
|---|---|---|---|---|
| Attention | A1 (Altered sense of time): While reading time moved differently | 3 | 3 | 0 |
| Attention | A2 (Concentration): My attention was focused on the book | 12 | 10 | 2 |
| Attention | A3 (General sense of absorption): I was absorbed in the book | 194 | 186 | 8 |
| Attention | A4 (No distractions): I was not distracted while reading | 5 | 3 | 2 |
| Attention | A5 (Forgetting surroundings): While reading I forgot the world around me | 20 | 16 | 0 |
| Attention | A6 (Anticipation): I was on the edge of my seat/I wanted to know what would happen next | 111 | 108 | 3 |
| Attention | A7 (Inability to stop reading): I did not want to put the book down/I could not put the book down | 150 | 145 | 5 |
| Emotional Engagement | EE1 (Perspective taking): I could imagine what it must be like to be this character | 35 | 35 | 0 |
| Emotional Engagement | EE2 (Sympathy): I sympathized with this character | 57 | 53 | 4 |
| Emotional Engagement | EE3 (Emotional connection): I felt a connection to this character | 79 | 69 | 10 |
| Emotional Engagement | EE4 (Empathy): I felt how this character was feeling | 73 | 72 | 1 |
| Emotional Engagement | EE5 (Compassion for story events): I felt for what happened in the story | 79 | 79 | 0 |
| Emotional Engagement | EE6 (Anger): I felt angry at this character | 20 | 19 | 1 |
| Emotional Engagement | EE7 (Fear): I felt scared for this character | 5 | 5 | 0 |
| Emotional Engagement | EE8 (Emotional familiarity): I felt like I knew this character | 11 | 11 | 0 |
| Emotional Engagement | EE9 (Wishful identification): I wish I could be more like this character | 8 | 8 | 0 |
| Emotional Engagement | EE10 (Emotional understanding): I understood why this character did this | 31 | 26 | 5 |
| Emotional Engagement | EE11 (Parasocial response): I want to have some kind of relationship with this character | 79 | 79 | 0 |
| Emotional Engagement | EE12 (Participatory response): I wanted to involve myself in the story world events | 42 | 42 | 0 |
| Mental Imagery | MS1 (Imagery of character): I could imagine what the characters looked/smelled/felt/sounded like | 18 | 15 | 3 |
| Mental Imagery | MS2 (Imagery of story events): I could see/hear/feel/smell the story events clearly in my mind | 20 | 20 | 0 |
| Mental Imagery | MS3 (Imagery of story world): I could imagine what the story world looked/smelled/felt/sounded like | 27 | 25 | 2 |
| Mental Imagery | MS4 (Realness): The character/story world felt real to me | 73 | 73 | 0 |
| Transportation | T1 (Presence): While reading this I was in the story world | 13 | 13 | 0 |
| Transportation | T2 (Merge of fiction in reality): Elements from the story world came into my world | 14 | 14 | 0 |
| Transportation | T3 (Proximity of story world): The story world felt close to me | 4 | 4 | 0 |
| Transportation | T4 (Deictic shift): I felt transported to the story world | 19 | 19 | 0 |
| Transportation | T5 (Part of the story world): I felt part of the story world | 34 | 34 | 0 |
| Transportation | T6 (Return deictic shift): I returned from a trip to the story world | 3 | 3 | 0 |
| Transportation | T7 (Travel in story world): I lost myself in the story world/I traveled with the characters through the story world | 26 | 26 | 0 |
| Impact | IM1 (Effortless engagement): It was an easy read/I devoured this book | 108 | 68 | 40 |
| Impact | IM2 (Wish to reread): I will/have reread this book/parts of this book | 116 | 112 | 4 |
| Impact | IM3 (Anticipation book series): I cannot wait to see how this unfolds in the next book | 170 | 167 | 3 |
| Impact | IM4 (Addiction): I am addicted to this book/I cannot get enough of this book | 91 | 89 | 2 |
| Impact | IM5 (Lingering story feelings): The book left me feeling …/This book stayed with me for a while | 194 | 192 | 2 |

Note: Items highlighted in dark grey are more succinct phrasings of the original SWAS statements; medium grey items were taken from the absorption inventory by Bálint et al.; light grey items are additions from the annotation team based on what we found in the reviews.

Note: This table is a reproduction of a table in Kuijpers, Lusetti, Lendvai & Rebora, under review.

(3) Dataset Description

Object name

The AbsORB (Absorption in Online Reviews of Books) Corpus and Annotation Guidelines.

Format names and versions

Corpus: .csv and .xlsx

Guidelines: .pdf

Creation dates

2018-12-01 — 2023-06-15

Dataset creators

Moniek Kuijpers, University of Basel

Simone Rebora, Johannes Gutenberg University Mainz

Piroska Lendvai, Bavarian Academy for Sciences and Humanities

Massimo Lusetti, University of Basel

Lina Ruh, University of Basel

Lukas Renner, University of Basel

Jonathan Tadres, University of Basel

Johanna Vogelsanger, University of Basel

Tina Ternes, University of Basel, and Johannes Gutenberg University Mainz


Language

Data and metadata: English


License

CC-By Attribution 4.0 International

Repository name

Open Science Framework

Publication date


(4) Reuse Potential

The corpus opens up several avenues for future research. It can be expanded with more reviews from different platforms, to see whether language use differs from one online community to the next, or with more metadata, such as the number of responses to a review or the number of times a review is read, which would allow for analyses focused on the social aspects of online book reviews. Another expansion could lie in adding reviews with low ratings, to enable analyses of differences in absorption experiences between low-rated and high-rated books. Relatedly, one could look into the annotations per genre and examine whether readers of different genres mention different absorption dimensions in their reviews. One such study, for which we provided access to the full-text corpus, used network analysis and found preliminary results that point to a strong similarity in vocabulary within romance reviews and within mystery reviews. This suggests that the reading experience of these groups of readers follows a more stable pattern than in other genres (i.e., fantasy, science fiction, horror/thriller), where no strong genre clusters were found.
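As a starting point for the per-genre comparison mentioned above, the following sketch counts annotations per genre and absorption dimension using only the Python standard library. The column names `genre` and `category` are assumptions made for illustration and should be replaced with the variable names documented in the ReadMe; the mapping from category prefix to dimension follows Table 1.

```python
import csv
import io
from collections import Counter

# Category-code prefixes and their dimensions, as listed in Table 1.
PREFIX_TO_DIMENSION = {
    "A": "Attention",
    "EE": "Emotional Engagement",
    "MS": "Mental Imagery",
    "T": "Transportation",
    "IM": "Impact",
}


def dimension_counts_by_genre(csv_file, genre_col="genre", category_col="category"):
    """Count annotations per (genre, dimension) pair.

    `csv_file` is an open text file or file-like object with a header row.
    The default column names are hypothetical.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # "EE10" -> "EE": strip the trailing category number.
        prefix = row[category_col].rstrip("0123456789")
        dimension = PREFIX_TO_DIMENSION.get(prefix)
        if dimension is not None:  # skip rows with an unrecognised code
            counts[(row[genre_col], dimension)] += 1
    return counts


# Tiny made-up example; real data would come from the corpus .csv file.
sample = io.StringIO("genre,category\nromance,EE3\nromance,A7\nmystery,A6\n")
for (genre, dimension), n in sorted(dimension_counts_by_genre(sample).items()):
    print(genre, dimension, n)
```

Because the genres differ in how many reviews they contribute, raw counts like these would need to be normalised, for instance by the number of reviews per genre, before genres are compared.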

The guidelines open up further avenues. For example, they could be used to annotate a different set of reviews focusing on specific books or genres, reviews from different platforms, or even different types of reader responses, such as answers to open survey questions or interview transcripts. A pilot study is currently investigating the co-occurrence of absorption and changes in the self-concept in reviews of climate fiction. Another avenue currently being explored is the translation of the guidelines and the development of a German-language corpus, in order to investigate whether certain absorption experiences can be translated, culturally and linguistically, to a different language community.