Why should libraries publish AI-ready collections?

Before the big blow up

About 18 months ago, Jisc established a task and finish group to answer a question: in light of emerging AI, should Jisc create research training datasets for arts and humanities researchers?

It brought together a group of academics from digital humanities and information science and invited librarians to ensure a balanced conversation.

Then ChatGPT happened – the big AI blow up.

Our group suggested we might better focus on getting people to question why they should use AI in the first place. We undertook investigations to identify the best way to address the issues of AI use in research. Many challenges were highlighted by the ChatGPT phenomenon – especially in relation to students’ use of these technologies – but what about arts and humanities researchers?

By focusing on the needs of researchers, we decided that we shouldn’t produce our own datasets (although we could in the future) but should support libraries that want to publish their own collections as datasets to enable research.

AI is here to stay, so libraries need to respond. How better to respond than by doing what they do best: managing their own data and collections? This way, libraries can give better access to more machine-ready collections as a way of controlling AI outputs.

What is happening to the arts and humanities?

Arts and humanities research is at a juncture: over the last 30 years some researchers have adopted numerical and statistical methods, including AI methods, especially machine learning. These methods are becoming central to research as the world we inhabit becomes increasingly digital.

In digital humanities, statistical and numerical methods are core, but the questions being answered are still humanities questions; humanities and the digital become intertwined. Even those who do not think of themselves as undertaking digital humanities research are affected by a world increasingly focused on data. For researchers to gain the most from these technologies they need to develop skills and methods, and equally importantly, they must be able to access data to undertake research.

If libraries are to provide this data, they need to appreciate the potential uses of the collections they are providing – what kind of research they will be useful for – especially if they want to format a collection as a dataset. To aid this, Jisc wants to support more conversations between those providing and those consuming collections.

Collections as Data

The task and finish group’s research explored the outcomes of the US-originated Collections as Data concept. We found that this concept, and its attendant Vancouver Statement on Collections as Data, have not really taken hold in the UK yet. Our interviews with senior university librarians have confirmed this. A recent series of podcasts with academics and librarians has also pointed to the critical importance of making more machine-ready collections available and highlighted the library’s role in providing them.

Join our webinar

On 22 November 2023, Ines Byrne of the National Library of Scotland and Jodie Double of the University of Leeds will join me for an interactive webinar. We want to encourage the library community to be bold in its strategic digitisation choices, to support research and researchers with collections, and to feel more confident about the impending introduction of AI into the community.

We will explore why the National Library of Scotland built its Data Foundry, including processes, challenges, and opportunities, and hope to give universities a boost to get collections out there to meet the needs of both researchers and machines. We will also explore the risks and potential benefits of university libraries releasing machine-ready collections.

We hope you can join us to debate some of these issues, or simply to learn more about making your collections available. Equally, we would like you to come along if you are interested in using collections in your own research. You can register here.

You can read more about our research into these issues in this series of blog posts, starting with the report, Is AI for Me? You can also listen to the series of podcasts on the same theme, told from a researcher perspective.

