How does the automatic document details extraction work?

The automatic extraction of document details (authors, title, journal, etc.) from a research paper works in several steps:
  1. The contents of the PDF are analyzed and Mendeley tries to 'guess' which text constitutes the authors, title and other metadata. The accuracy of this step will depend on factors such as the complexity of the article's layout.
  2. Mendeley looks for identifiers such as DOIs and arXiv IDs in the paper.
  3. Mendeley sends the extracted metadata and any identifiers found to Mendeley Web, which in turn queries various online sources, such as arXiv, PubMed and CrossRef, for more accurate data. If better-quality metadata can be found online, it is used; otherwise, the document details extracted from the contents of the PDF are used.
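
As a rough illustration of steps 2 and 3, here is a minimal Python sketch of how identifiers might be spotted in a paper's text and how the fallback between online and PDF-derived details could work. This is not Mendeley's actual code: the regular expressions, the function names find_identifiers and choose_details, and the dictionary layout are all assumptions made for the example.

```python
import re

# Assumed pattern for the common DOI shape: "10.", a 4-9 digit
# registrant code, "/", then a suffix without spaces or quotes.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Assumed pattern for new-style arXiv IDs, e.g. "arXiv:1501.00001v2".
ARXIV_RE = re.compile(r"arXiv:(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)


def find_identifiers(text):
    """Return the first DOI and arXiv ID found in the text, if any."""
    ids = {}
    doi = DOI_RE.search(text)
    if doi:
        # Strip punctuation that often trails a DOI in running text.
        ids["doi"] = doi.group(0).rstrip(".,;)")
    arxiv = ARXIV_RE.search(text)
    if arxiv:
        ids["arxiv"] = arxiv.group(1)
    return ids


def choose_details(extracted, online):
    """Use the online record when one was found, else keep the PDF guess."""
    return online if online else extracted


page_text = "See doi:10.1000/xyz123. Also arXiv:1501.00001v2 [cs.DL]"
found = find_identifiers(page_text)
```

In this sketch, a lookup that finds nothing online would pass None as the second argument to choose_details, so the details guessed from the PDF are kept unchanged.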

The extraction process is imperfect, but we are working to improve the quality of the automatic extraction and the comprehensiveness of the data available on Mendeley Web.

