The University of Edinburgh, DataShare, DOI: 10.7488/ds/3009.
The data for the Lothian Diary Project are part of Edinburgh Speaks, a project of the Language Variation and Change Research Group in Linguistics and English Language at the University of Edinburgh. The dataset comprises 125+ individual contributions with varying levels of consent for sharing. Each contribution consists of a video or audio recording of 1–22 minutes, and most also include answers to a 20-minute survey on participant demographics and COVID-19 experience. The video/audio recordings are mostly vlogs of individual adults or children speaking, responding to project website prompts (e.g. “How has your life changed during lockdown?”), but some are interviews with a parent, volunteer, or social worker behind the camera asking similar questions. These data are useful for both qualitative and quantitative social science research interested in any area of COVID-19 experience at the individual level. The focus is on Edinburgh and the Lothian counties because of the project’s original motivation to conduct dialectological and sociolinguistic research on the local community. A subset of the data has been used for an MSc dissertation on how different notions of audience influence messaging and presentation style (Lee, 2020).
(1) Participants self-record a digital audio/video diary. (2) Participants give consent for data usage via the opening page to the survey on Qualtrics (level 1, other researchers; level 2, oral history archive; level 3, radio/television/online) and are given the option to waive anonymity. (3) Participants upload the diary file to a secure, temporary repository on Box.com only accessible by the research team. The file is manually transferred by a Research Assistant to a protected server space for processing (‘DataStore’). (4) Participants answer questions about their COVID-19 experience through a Qualtrics survey, adapted from questions created by the University of Edinburgh’s CovidLife project.1 (5) Participants choose an option for a £15 payment: bank transfer, local business voucher, or charity donation. All diaries are backed up in original formats and as WAV files (mono, 16 kHz and mono/stereo 44 kHz).
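The WAV backup formats described above can be checked programmatically. The following is a minimal sketch using only Python's standard `wave` module; it is not the project's own tooling. The names `WavInfo`, `read_wav_info`, and `matches_backup_format` are hypothetical, and the sketch assumes that "44 kHz" refers to the common 44,100 Hz sampling rate, which the text does not state.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    """Basic properties read from a WAV file header (hypothetical helper type)."""
    channels: int
    sample_rate_hz: int
    duration_s: float


def read_wav_info(path: str) -> WavInfo:
    """Read channel count, sampling rate, and duration from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(wav.getnchannels(), rate, wav.getnframes() / rate)


# Assumption: "44 kHz" is interpreted as 44,100 Hz.
# Each entry is (channels, sampling rate in Hz).
BACKUP_FORMATS = {(1, 16000), (1, 44100), (2, 44100)}


def matches_backup_format(info: WavInfo) -> bool:
    """Return True if the file is mono 16 kHz or mono/stereo 44.1 kHz."""
    return (info.channels, info.sample_rate_hz) in BACKUP_FORMATS
```

For example, `matches_backup_format(read_wav_info("diary.wav"))` would flag a file (here a hypothetical `diary.wav`) that was exported with an unexpected channel count or sampling rate before it is backed up.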
Data processing steps: Transcripts of English language recordings are generated using custom automatic speech recognition models implemented in Kaldi (Povey et al., 2011) and then hand-corrected in ELAN (ELAN, 2020). Non-English speech will be hand-transcribed at a future date. For contributions marked for sharing with other researchers, mono 16 kHz WAV files, MP4 files, and accompanying transcripts and surveys are manually deposited into the Lothian Diary Project repository on DataVault; one deposit is available now and another will be made available upon completion of data collection. Data collection began on the 25th of May 2020 and will continue until the 31st of July 2021.
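Although deposits are made manually, the selection of contributions marked for sharing with other researchers can be illustrated in code. The sketch below is an assumption-laden illustration, not the project's procedure: the `Contribution` record, its field names, and `select_for_level` are hypothetical, and consent is modelled as a set of independently granted levels because the text does not say whether the three levels are cumulative.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Consent levels as named in the procedure.
RESEARCHERS = 1   # level 1: other researchers
ORAL_HISTORY = 2  # level 2: oral history archive
BROADCAST = 3     # level 3: radio/television/online


@dataclass
class Contribution:
    """One diary contribution (hypothetical record layout)."""
    contribution_id: str
    consent_levels: Set[int]  # assumption: levels are granted independently
    files: List[str] = field(default_factory=list)


def select_for_level(contributions: List[Contribution], level: int) -> List[Contribution]:
    """Keep only contributions whose participant consented to the given level."""
    return [c for c in contributions if level in c.consent_levels]


def deposit_manifest(contributions: List[Contribution]) -> List[str]:
    """List the files of all contributions shareable with other researchers."""
    selected = select_for_level(contributions, RESEARCHERS)
    return [path for c in selected for path in c.files]
```

Keeping the consent check in a single function means the same filter can be reused for the oral history archive (level 2) or broadcast use (level 3) by changing only the `level` argument.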
Stage 1 of data collection followed convenience sampling. Any resident of Edinburgh and the Lothian counties was welcome to participate, regardless of vulnerability (e.g., children, vulnerable adults). Participants were recruited via social media platforms, adverts in local newspapers and radio, press releases, and word-of-mouth. Stage 2 of data collection was introduced to recruit underrepresented participants. The sampling strategy at this stage was to contact and partner with charities representing homeless, disabled, or other vulnerable individuals, as well as caregivers of any group. We established charity partnerships by running a targeted social media advertising campaign and an online interactive workshop for the Economic and Social Research Council’s ‘Festival of Social Science’. We also rented a local community space for three days to allow digitally excluded members of the public (e.g., those without reliable Wi-Fi access) to participate in person. At the time of writing, most of the participants are from Edinburgh City, and twice as many are female as male. Most have an undergraduate degree or technical qualification, and most are between the ages of 16 and 64. Relative to the Edinburgh population, the sample has a greater representation of participants who are of colour (20%) and born outside the UK (27%). 15% are disabled and over 20% are LGBTQA+.
The survey questions, many of which had already been piloted by the CovidLife research team, went through two rounds of piloting with this project, including input from archivists at Museums and Galleries, Edinburgh, prior to public release. Based on the piloting, we created a website with a FAQ page offering practical and ethical advice (e.g., descriptions of data use, risk, an example consent form).2 The survey was also revised several times to ensure clarity and ease of use with regard to the consent process, the Box.com file upload, and the survey questions.
(3) Dataset Description
The Lothian Lockdown Diary Corpus
Format names and versions
MP4, MP3, M4A, MOV, WAV, CSV, TXT
2020-05-25 to 2021-07-31
Lauren Hall-Lew: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft
Claire Cowie: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing
Stephen McNulty: Data curation, Investigation, Methodology, Writing – review & editing
Nina Markl: Data curation, Investigation, Software, Visualization, Writing – review & editing
Shan-Jan Sarah Liu: Conceptualization, Funding acquisition, Supervision, Writing – review & editing
Catherine Lai: Data curation, Resources, Software, Supervision, Writing – review & editing
Clare Llewellyn: Investigation, Writing – review & editing
Beatrice Alex: Writing – review & editing
Nini Fang: Writing – review & editing
Zuzana Elliott: Methodology, Data curation
Anita Klingler: Data curation
English, Scots, Scottish Gaelic, Mandarin Chinese, Cantonese, and British Sign Language. Scottish Gaelic contributions have been translated into English. All non-English languages will be transcribed faithfully prior to translation into English. Many recordings include local place names and local terminology, often in Scots.
The sharable portions of the Lothian Diary Project have been, and will continue to be, deposited under the Creative Commons open license (CC0).
All data marked for reuse by other researchers (as per participant consent) are placed in the Lothian Diary Project collection, housed in either the DataShare repository or the DataVault repository at the University of Edinburgh. The DataShare repository is Open Access and includes the audio or video recordings and their corresponding transcripts. The DataVault repository contains the same materials as well as participants’ survey answers, including demographic information. These more sensitive data are accessible by contacting the data manager, Catherine Lai, at C.Lai@ed.ac.uk. For data collected at the time of writing, the DOI for DataShare is 10.7488/ds/3009 and the DOI for DataVault is 10.7488/7a22cc4b-87ec-4df3-a549-3b347fd4bca5.
Data collection for the Lothian Diary Project will continue until approximately 31 July 2021, with publication to follow soon thereafter.
(4) Reuse Potential
Data that have been marked for reuse, as per the participant’s consent, will be housed as an oral history archive with Museums and Galleries, Edinburgh, where researchers, policymakers, civil organisations, and community members can access them. Those marked for wider reuse are available for other purposes (e.g., aggregation, reference, validation, teaching) to any interested individual or organisation, with the permission of the Lothian Diary Project research team. Our data can be reused beyond the scope of our project by other researchers, as well as non-researchers, to identify common themes, topics and issues discussed during the pandemic. For example, the diaries capture facial, gestural, and speech components of affectively charged topics, enabling those interested in multimodal expression to explore this area further. Policymakers and civil organisations may also use our data in further research to improve future responses to health emergencies and to provide support for vulnerable groups. We also welcome collaborative opportunities, especially creative ways of sharing and reusing our data beyond the current scope of our project.