Research Impact Metrics: Citation Analysis

Text/Data Mining for Citation Indexes

As described throughout this guide, in many cases you can retrieve the information you need by querying the user interface of a citation index. To investigate some research questions, however, you may want to retrieve all (or a particular subset) of the data in the citation index and perform your own analysis. While not all indexes support this use case, many of the most important ones do.

Many indexes support this through an application programming interface (API) that lets users query the database and retrieve data programmatically. In other cases, the organization that produces the index can supply customized datasets upon request from researchers. In still other cases, the University of Michigan Library has obtained the data from the index for use by researchers affiliated with our university.

This page outlines how you can go about retrieving content for text/data mining purposes from the major citation indexes that we subscribe to. 

If you have questions about this process, or need support exploring text and data mining, you may contact the library's digital scholarship support team (library-ds@umich.edu).

 

Major Citation Indexes and API Access

Dimensions Plus (Digital Science) brings together various research-related data sources in a venue that is consistent and accessible to the community. Data include grant information and also information on several publication types, including Books, Journal Articles, Conference Proceedings, and Patents. Dimensions Plus provides the community with a data discovery engine that offers both context and perspective.

  • Dimensions makes the Dimensions Metrics API publicly available to anyone who wishes to use it.

  • To request access to the Dimensions Metrics API, complete this form.

  • Documentation about how to access and use the Dimensions Metrics API 

Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.

  • Google Scholar does not currently offer an API for automated extraction of data. Some scholars have scraped data from Google Scholar for research purposes, but it is a tedious and onerous task.

PsycInfo (APA) is the premier resource for surveying the literature of psychology and adjunct fields. It is produced by the APA and covers 1887 to the present.

  • PsycInfo provides custom data sets upon request to users at subscribing institutions. 
  • Read more about this service and how to make a request

PubMed Central, or PMC (NLM), is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Library of Medicine.

  • PMC makes its APIs freely available for anyone who wishes to use them.
  • Read about the tools available and how to access them
  • Find and access the API that suits your needs
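As an illustration of what programmatic access can look like, the following is a minimal sketch that searches PMC through the NCBI E-utilities `esearch` endpoint using only the Python standard library. It is one possible approach under stated assumptions, not the only route into PMC. The function names (`build_esearch_url`, `parse_esearch`, `search_pmc`) and the example search term are hypothetical, chosen for this sketch; consult the NCBI documentation for current usage guidelines and rate limits before running larger jobs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint; db=pmc restricts the search to PubMed Central.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an esearch URL that asks for up to `retmax` PMC record IDs as JSON."""
    params = {"db": "pmc", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch(payload):
    """Return (total hit count, list of record IDs) from an esearch JSON string."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_pmc(term, retmax=20):
    """Run the search over the network and return the parsed result."""
    with urlopen(build_esearch_url(term, retmax), timeout=30) as response:
        return parse_esearch(response.read().decode("utf-8"))
```

Calling `search_pmc("citation analysis", retmax=5)` would send one request and return the total number of matching records together with up to five PMC record IDs, which can then be passed to other E-utilities calls to fetch record details.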

Scopus (Elsevier) is an international multi-disciplinary indexing & abstracting database for scientific, medical, technical, and social sciences.

  • Scopus provides several different APIs for use by researchers at subscribing institutions
  • Learn more about the Scopus APIs and request an API key to access them
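Once you have an API key, a query against the Scopus Search API might be sketched as follows. This is a hedged illustration rather than Elsevier's recommended client: the helper names (`build_scopus_request`, `summarize_results`, `search_scopus`) are hypothetical, the response field names (`search-results`, `opensearch:totalResults`, `dc:title`, `prism:doi`, `citedby-count`) are assumed from the JSON the Search API typically returns, and access may additionally depend on your institution's network or token settings.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Scopus Search API endpoint; requires an API key from a subscribing institution.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_scopus_request(query, api_key, count=25):
    """Build a request for up to `count` records, passing the key in a header."""
    url = f"{SCOPUS_SEARCH_URL}?{urlencode({'query': query, 'count': count})}"
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    return Request(url, headers=headers)


def summarize_results(payload):
    """Return (total hits, list of title/DOI/citation-count dicts).

    Field names are assumptions about the response layout; missing
    fields come back as None rather than raising an error.
    """
    results = json.loads(payload)["search-results"]
    records = [
        {
            "title": entry.get("dc:title"),
            "doi": entry.get("prism:doi"),
            "cited_by": entry.get("citedby-count"),
        }
        for entry in results.get("entry", [])
    ]
    return int(results["opensearch:totalResults"]), records


def search_scopus(query, api_key, count=25):
    """Send the request over the network and return the summarized result."""
    with urlopen(build_scopus_request(query, api_key, count), timeout=30) as resp:
        return summarize_results(resp.read().decode("utf-8"))
```

For example, `search_scopus("TITLE-ABS-KEY(bibliometrics)", "YOUR_API_KEY")` would search titles, abstracts, and keywords for the term, where `YOUR_API_KEY` is a placeholder for the key you requested.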

Web of Science (Clarivate) provides a Core Collection of multidisciplinary indexes which permit searching for articles that cite a known author or