What is the Lens?

The Lens is an open global cyberinfrastructure to make the innovation system more efficient and fair, more transparent and inclusive.

The Lens is building an open platform for Innovation Cartography. Specifically, the Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow document collections, aggregations, and analyses to be shared, annotated, and embedded to forge open mapping of the world of knowledge-directed innovation. Ultimately, this will restore the role of the patent system as a teaching resource to inspire and inform entrepreneurs, citizens and policy makers.

Within the next two years, we expect to host over 95% of the world's patent information and link to most of the scholarly literature, creating open public innovation portfolios of individuals and institutions. Using all open source components, we are working to create open schemas by which patent documents can be used to teach and communicate, rather than confuse and intimidate.

Underlying data and analytics will be available to the public through APIs. By creating and freely sharing APIs and by building modular, standardized specifications, we envision growing public use of innovation cartography, decreasing the fear, uncertainty and doubt that hinder investment and enthusiasm.

Patent datasets

  • The European Patent Office’s DocDB bibliographic data from 1907 – present: 81+ Million documents from nearly 100 jurisdictions.
  • USPTO Applications from 2001 – present with full text and images.
  • USPTO Grants from 1976 – present with full text and images.
  • USPTO Assignments (14+ Million).
  • European Patent Office (EP) Grants from 1980 – present with full text and images.
  • WIPO PCT Applications from 1978 – present with full text and images.
  • Australian Patent Full Text from IP Australia.

Scholarly datasets

  • Scholarly records from PubMed (27,860,556)
  • Scholarly records from Crossref (17,889,848)
  • Scholarly records from PubMed Central (4,155,640)

Here is a brief list of the metadata available in the scholarly records:
- citation identifiers
- title
- publication date
- publication type
- authors (first and last name, order, affiliation)
- start and end pages, volume, issue
- journal
- abstract
- references (string with identifiers if available)
- funding/grant information
- keywords (PubMed only)
- mesh_term (PubMed only)
- chemicals (PubMed only)
- clinical_trial data (PubMed only)
- citing patents
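
The metadata fields listed above could be modeled as a simple record type. The following Python sketch is illustrative only: the class names, field names and types are assumptions chosen to mirror the list, and do not represent the actual Lens schema or API. The identifier and patent values in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Author:
    """Hypothetical author entry (first and last name, order, affiliation)."""
    first_name: str
    last_name: str
    order: int                          # position in the author list
    affiliation: Optional[str] = None


@dataclass
class ScholarlyRecord:
    """Hypothetical container mirroring the metadata list above."""
    citation_ids: Dict[str, str]        # assumed shape: identifier type -> value
    title: str
    publication_date: Optional[str] = None
    publication_type: Optional[str] = None
    authors: List[Author] = field(default_factory=list)
    start_page: Optional[str] = None
    end_page: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    journal: Optional[str] = None
    abstract: Optional[str] = None
    references: List[str] = field(default_factory=list)   # reference strings
    funding: List[str] = field(default_factory=list)      # funding/grant information
    citing_patents: List[str] = field(default_factory=list)
    # The fields below are listed as PubMed only, so they default to empty.
    keywords: List[str] = field(default_factory=list)
    mesh_terms: List[str] = field(default_factory=list)
    chemicals: List[str] = field(default_factory=list)
    clinical_trials: List[str] = field(default_factory=list)

    def is_cited_by_patents(self) -> bool:
        """Return True when at least one citing patent is recorded."""
        return len(self.citing_patents) > 0


# Placeholder values, not real identifiers.
example = ScholarlyRecord(
    citation_ids={"pmid": "0000000"},
    title="An example title",
    authors=[Author("Ada", "Example", order=1)],
    citing_patents=["EXAMPLE-PATENT-1"],
)
```

Giving the PubMed-only fields empty defaults lets records from Crossref or PubMed Central share the same type without special cases.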


All of the software that comprises the Lens application itself is Open Source and free to use. The following is a brief overview of the main technologies and frameworks that make up the Lens:

  • Lens servers run within the Amazon EC2 cloud-computing platform
  • Images are stored and served using Amazon S3
  • Apache HTTP Servers are used for proxies and load balancing
  • The UI and data services are served by Apache Tomcat
  • Apache Lucene is used for indexing and searching text