Article processing charges (APCs) are fees that are sometimes paid to make a research article, or other research output, open access. In the UK many institutions now spend significant amounts on APCs, particularly in light of the policies of RCUK and the Wellcome Trust, which make specific funds available to pay APCs [1, 2]. The fact that institutions now pay both subscription fees and APCs to the same publishers is a cause for concern for both institutions and policy makers. Jisc Collections, as the body that negotiates journal subscription deals on behalf of UK academic libraries, now includes both subscription and APC expenditure in its negotiations. The data presented here was gathered to inform these negotiations.
Following the sampling strategy outlined below, all institutions were contacted by email and asked to provide data. Some institutions used Jisc’s recommended template spreadsheet. Data from those that did not was converted to fit this template by mapping corresponding fields. Files for individual institutions are also available. Data was then compiled into a single CSV file, with an additional column named ‘Institution’ to identify the source, and normalized as outlined below.
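The compilation step described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: only the ‘Institution’ column name comes from the text, while the other column names, the placeholder institution names, and the sample values are assumptions made for the example.

```python
import csv
import io


def compile_returns(returns):
    """Merge per-institution CSV text into one list of rows.

    `returns` maps an institution name to the CSV text it supplied,
    already converted to the shared template.
    """
    compiled = []
    for institution, csv_text in returns.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            # Tag every row with the institution it came from.
            compiled.append({"Institution": institution, **row})
    return compiled


def write_csv(rows, handle):
    """Write the compiled rows to an open text handle as one CSV."""
    # Union of all field names, in first-seen order.
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Illustrative input only: placeholder institutions and made-up values.
sample_returns = {
    "Institution A": "Publisher,Journal,APC paid\nPublisher X,Journal One,1000\n",
    "Institution B": "Publisher,Journal,APC paid\nPublisher Y,Journal Two,900\n",
}
compiled = compile_returns(sample_returns)
out = io.StringIO()
write_csv(compiled, out)
```

Putting the ‘Institution’ column first keeps the source of each payment visible when the compiled file is opened in a spreadsheet.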
In 2014, 23 institutions provided APC data to Jisc Collections covering payments made in 2013. The same institutions were asked to repeat the exercise in 2015 for payments made in 2014. Twenty-two of those institutions (listed under ‘Dataset creators’) agreed to participate and release their data openly. One institution agreed to provide data but not to release it openly. Three further institutions (Plymouth University, Durham University, and Loughborough University) were also asked to provide data in order to increase the size of the dataset. They were chosen because they receive an RCUK block grant and are in Jisc bands 4 and 5, which were underrepresented in the original sample.
Data was normalized to make it internally consistent and easier to read. This involved converting all dates to DD/MM/YYYY format where known; standardizing abbreviations and punctuation; and using only one variant of each publisher or journal name, e.g. PLOS rather than Public Library of Science. The number of entries in the fields ‘ISSN’ and ‘Type of publication’ was increased by copying data across, e.g. if one entry for PLOS Genetics had the ISSN listed as 1553-7390 then the same ISSN was added to the other PLOS Genetics entries.
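The normalization rules above could be expressed along the following lines. This is a sketch under stated assumptions, not the cleaning script used for the dataset: the list of input date formats, the column names ‘Journal’ and ‘Publisher’, and the helper names are all hypothetical, and only the PLOS alias and the PLOS Genetics ISSN are taken from the text.

```python
from datetime import datetime

# Assumed input formats; the variants actually encountered are not documented.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")

# Name variants mapped to one preferred form (only this example is from the text).
PUBLISHER_ALIASES = {"public library of science": "PLOS"}


def normalize_date(value):
    """Return the date as DD/MM/YYYY, or the input unchanged if unparseable."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue
    return value


def normalize_publisher(name):
    """Collapse whitespace and replace known variants with the preferred name."""
    cleaned = " ".join(name.split())
    return PUBLISHER_ALIASES.get(cleaned.lower(), cleaned)


def fill_issn(rows):
    """Copy a known ISSN to other rows for the same journal (edits rows in place)."""
    known = {}
    for row in rows:
        if row.get("ISSN"):
            known.setdefault(row["Journal"], row["ISSN"])
    for row in rows:
        if not row.get("ISSN") and row["Journal"] in known:
            row["ISSN"] = known[row["Journal"]]
    return rows


# Illustrative rows: the second entry is missing its ISSN.
rows = [
    {"Journal": "PLOS Genetics", "Publisher": "Public Library of Science", "ISSN": "1553-7390"},
    {"Journal": "PLOS Genetics", "Publisher": "PLOS", "ISSN": ""},
]
for row in rows:
    row["Publisher"] = normalize_publisher(row["Publisher"])
fill_issn(rows)
```

Dates that match none of the assumed formats are returned unchanged, which mirrors the "where known" qualification. A date written month-first would be misread by this sketch, so ambiguous inputs would still need manual checking.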
(3) Dataset description
Format names and versions
2015-01-20 to 2015-03-24.
Dataset creators
Stuart Lawson, Research Analyst, Jisc Collections – Designed the data collection and template, collected data, aggregated and cleaned data, authored the paper.
Bangor University, University of Bath, University of Birmingham, University of Bristol, University of Cambridge, Cranfield University, Durham University, University of Glasgow, Imperial College London, Lancaster University, University of Leicester, University of Liverpool, Loughborough University, LSHTM, Newcastle University, Plymouth University, University of Portsmouth, Queen Mary University of London, Royal Holloway University of London, University of Salford, University of Sheffield, University of Sussex, Swansea University, UCL, University of Warwick.
(4) Reuse potential
This data could be reused by combining it with APC expenditure data from other sources. The author is not aware of any comparable data being available from outside the EU. While some data is available from German institutions, the UK currently has the largest quantity available, with over 40 UK higher education institutions, along with the Wellcome Trust, having now published at least some level of detail about their APC expenditure during 2013 and/or 2014. If this dataset is combined with others there may well be duplicate entries, so these would need to be identified and removed. Over time, the dataset can be used as a benchmark against which to evaluate APC expenditure in future years as that data becomes available.
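One way to handle duplicate entries when combining datasets is sketched below. The approach is an assumption, not a method described in the source: it treats two rows as duplicates when they share a DOI, and otherwise falls back to a composite of institution, journal, and article title. The column names ‘DOI’, ‘Journal’, and ‘Article title’ are hypothetical.

```python
def _clean(value):
    """Lower-case and trim a field so trivial variations do not hide a match."""
    return (value or "").strip().lower()


def dedupe_key(row):
    """Build a comparison key: prefer the DOI, else a composite of fields."""
    doi = _clean(row.get("DOI"))
    if doi:
        return ("doi", doi)
    return (
        "composite",
        _clean(row.get("Institution")),
        _clean(row.get("Journal")),
        _clean(row.get("Article title")),
    )


def combine(*datasets):
    """Concatenate datasets, keeping only the first row seen for each key."""
    seen = set()
    combined = []
    for dataset in datasets:
        for row in dataset:
            key = dedupe_key(row)
            if key in seen:
                continue
            seen.add(key)
            combined.append(row)
    return combined


# Illustrative rows with a placeholder DOI that differs only in case and spacing.
first = [{"DOI": "10.1000/example.1", "Institution": "Institution A"}]
second = [
    {"DOI": "10.1000/EXAMPLE.1 ", "Institution": "Institution A"},
    {"DOI": "", "Institution": "Institution B", "Journal": "Journal One", "Article title": "A title"},
]
merged = combine(first, second)
```

Matching on DOI alone would also merge cases where an APC was legitimately split between two payers, so a real workflow would probably flag such rows for review rather than drop them silently.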
Information about the level of expenditure on journal subscriptions with some publishers is also available for the UK, so the two sources can be combined to show the total subscription and APC expenditure that some institutions have with some publishers. The level of APC expenditure compared to subscription expenditure is growing, reaching up to 30% in some cases, so this is an important area of continuing research if research funders and institutions are to monitor where their funds are going. Jisc Collections will repeat the data collection exercise each year for at least the next three years.
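The comparison between APC and subscription expenditure can be made concrete with a small calculation. The sketch below assumes the percentage is read as APC spend divided by subscription spend with the same publisher, which is one plausible interpretation; the function names and the figures are hypothetical and are not drawn from the dataset.

```python
def apc_share(apc_spend, subscription_spend):
    """APC spend as a percentage of subscription spend with one publisher."""
    if subscription_spend <= 0:
        raise ValueError("subscription spend must be positive")
    return 100.0 * apc_spend / subscription_spend


def total_spend(apc_spend, subscription_spend):
    """Combined expenditure with one publisher."""
    return apc_spend + subscription_spend


# Hypothetical figures in GBP for one institution and one publisher.
apc, subscriptions = 30_000, 100_000
share = apc_share(apc, subscriptions)
total = total_spend(apc, subscriptions)
```

With these illustrative figures the APC spend equals 30% of the subscription spend, and the combined expenditure with the publisher is 130,000.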
It is likely that extensive validation work on the dataset, such as checking that the information it contains is correct, would lead to a number of alterations and corrections. Many of the fields could only be validated by the institution or research funder, but the bibliographic information could be made more accurate by using data from other sources. Due to time constraints, data was only normalized to be internally consistent and was not verified against primary sources. Therefore, cross-checking it with sources such as CrossRef would improve the accuracy of the bibliographic fields.
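A cross-check of this kind might look like the sketch below. It compares one dataset row with a work record shaped like the "message" object returned by the Crossref REST API for a DOI, using that record's "container-title" and "ISSN" lists. Retrieving the record is left out so that the example runs offline; the dataset column names and the function name are assumptions.

```python
def _norm(value):
    """Case-insensitive, whitespace-insensitive form used for comparison."""
    return " ".join((value or "").split()).lower()


def crossref_mismatches(row, message):
    """Return {field: (dataset value, Crossref value)} where the two disagree.

    `message` is a Crossref-style work record. Fields that are empty on
    either side are skipped rather than reported.
    """
    mismatches = {}

    titles = message.get("container-title") or []
    journal = titles[0] if titles else ""
    if journal and row.get("Journal") and _norm(row["Journal"]) != _norm(journal):
        mismatches["Journal"] = (row["Journal"], journal)

    issns = message.get("ISSN") or []
    if issns and row.get("ISSN") and row["ISSN"] not in issns:
        mismatches["ISSN"] = (row["ISSN"], issns)

    return mismatches


# Illustrative record shaped like a Crossref work "message".
record = {"container-title": ["PLoS Genetics"], "ISSN": ["1553-7390", "1553-7404"]}

consistent_row = {"Journal": "PLOS Genetics", "ISSN": "1553-7390"}
suspect_row = {"Journal": "PLOS Genetics", "ISSN": "0000-0000"}

ok = crossref_mismatches(consistent_row, record)
flagged = crossref_mismatches(suspect_row, record)
```

Publisher names are deliberately not compared here, because the dataset uses a single preferred form per publisher that may differ from the registered name, so that field would need an alias table before automated checking.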
Further analysis could reveal information about the extent of payments made to particular publishers and the average APC price paid to different publishers. It would also be possible to highlight relationships between individual research funders and publishers, by seeing which publishers receive money from any given funder.
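The analyses suggested above amount to grouping payments by publisher and by funder. The following sketch shows one way to do this with the Python standard library; the column names ‘Publisher’, ‘Funder’, and ‘APC paid’ are assumptions, and the rows are placeholders rather than values from the dataset.

```python
from collections import defaultdict
from statistics import mean


def spend_by_publisher(rows):
    """Number of payments, total, and mean APC for each publisher."""
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["Publisher"]].append(float(row["APC paid"]))
    return {
        publisher: {"count": len(paid), "total": sum(paid), "mean": mean(paid)}
        for publisher, paid in amounts.items()
    }


def spend_by_funder_and_publisher(rows):
    """Total APC spend for each (funder, publisher) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Funder"], row["Publisher"])] += float(row["APC paid"])
    return dict(totals)


# Placeholder rows for illustration only.
payments = [
    {"Publisher": "Publisher X", "Funder": "Funder 1", "APC paid": "1000"},
    {"Publisher": "Publisher X", "Funder": "Funder 2", "APC paid": "2000"},
    {"Publisher": "Publisher Y", "Funder": "Funder 1", "APC paid": "500"},
]
by_publisher = spend_by_publisher(payments)
by_pair = spend_by_funder_and_publisher(payments)
```

The second function gives a simple view of funder and publisher relationships: each key pairs a funder with a publisher, and the value is the total paid from that funder to that publisher.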
SL was employed by Jisc Collections to carry out the data acquisition for this study.