We mined the DocDB master database, which contains the global patent literature, for citations of scholarly work, and extracted and cleaned more than 30 million non-patent literature (NPL) citation strings. These were then sent to the Crossref works API and the PubMed Hydra system for matching with DOIs and PubMed IDs (PMIDs), respectively.
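As a rough illustration of the Crossref lookup step, the sketch below builds a request URL for one cleaned NPL string. It assumes the public Crossref REST API `works` endpoint and its `query.bibliographic` parameter. The function name, the `rows` default, and the sample citation are hypothetical, and this is not necessarily how the requests were issued in the pipeline described here. The PubMed Hydra side is not sketched.

```python
from urllib.parse import urlencode

# Public Crossref REST API endpoint for works.
CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(npl_string, rows=1):
    """Build a Crossref works query URL for a free-text citation string.

    Hypothetical helper: it only constructs the URL and sends no request.
    """
    params = {"query.bibliographic": npl_string, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


# Made-up citation string, for illustration only.
url = crossref_query_url("Smith J, Doe A. Gene editing in rice plants. Plant J. 2015")
```

Each item in the JSON response carries a `score` field, which is presumably the API-provided match score that a quality assurance step can threshold on.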
To reduce the number of falsely matched PMIDs and DOIs, each identifier had to pass a multi-stage quality assurance pipeline. For an identifier to be accepted, its metadata had to match the NPL string on the essential bibliographic fields: author surnames, publication year, title (fuzzy matched), and journal. As an additional assessment factor, the DOI-to-PMID linkage data provided by PubMed was consulted. PMIDs and DOIs that were resolved independently by the two APIs were given a higher confidence score when they were cross-referenced in the linkage data. If no linkage information was available, a pairwise metadata similarity score was calculated from the essential bibliographic fields. Any remaining citation identifiers with an API-provided match score greater than 100 for Crossref or 0.8 for PubMed Hydra were still accepted.
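The acceptance logic can be sketched roughly as follows. This is one possible reading of the stages, not the actual implementation. The two score thresholds come from the text. The `Candidate` type, the token-overlap stand-in for fuzzy title matching, the 0.8 title cutoff, and all sample values are assumptions made for illustration. The pairwise similarity step used when linkage data is missing is left out.

```python
import re
from dataclasses import dataclass

CROSSREF_MIN_SCORE = 100.0  # threshold quoted in the text
HYDRA_MIN_SCORE = 0.8       # threshold quoted in the text
TITLE_MIN_OVERLAP = 0.8     # assumed cutoff, not from the text


@dataclass
class Candidate:
    """Hypothetical record for one identifier returned by an API."""
    source: str       # "crossref" or "hydra"
    identifier: str
    api_score: float
    surnames: tuple
    year: int
    title: str
    journal: str


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def metadata_matches(cand, npl):
    """Check surnames, year, title (token overlap), and journal against the NPL string."""
    npl_tokens = tokens(npl)
    title_tokens = tokens(cand.title)
    overlap = len(title_tokens & npl_tokens) / len(title_tokens) if title_tokens else 0.0
    return (
        any(s.lower() in npl_tokens for s in cand.surnames)
        and str(cand.year) in npl_tokens
        and overlap >= TITLE_MIN_OVERLAP
        and tokens(cand.journal) <= npl_tokens
    )


def assess(cand, npl, linkage_confirms=None):
    """Return (accepted, reason); linkage_confirms is None when no linkage data exists."""
    if metadata_matches(cand, npl):
        return True, "metadata+linkage" if linkage_confirms else "metadata"
    threshold = CROSSREF_MIN_SCORE if cand.source == "crossref" else HYDRA_MIN_SCORE
    if cand.api_score > threshold:
        return True, "api_score"
    return False, "rejected"
```

A candidate whose metadata agrees with the citation string is accepted regardless of its API score, while one that fails the metadata check is kept only if its score clears the source-specific threshold.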
Finally, all accepted identifiers and their corresponding NPL strings were clustered by transitive closure. Any cluster that contained citation identifiers with conflicting bibliographic metadata (in the volume, issue, journal, or pages fields) was split into separate clusters. The final clustering was then used to form unique citation entities for the subsequent analyses.
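One common way to compute such a transitive closure is a union-find (disjoint-set) structure over NPL strings and identifiers. The sketch below assumes that approach, which the text does not confirm. The function names and the rule that any difference in the four fields counts as a conflict are illustrative simplifications.

```python
from collections import defaultdict

CONFLICT_FIELDS = ("volume", "issue", "journal", "pages")


def cluster(pairs):
    """Group NPL strings and identifiers connected through any chain of matches."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for npl, ident in pairs:
        root_a, root_b = find(("npl", npl)), find(("id", ident))
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def split_on_conflict(identifiers, metadata):
    """Split identifiers into sub-clusters that agree on all conflict fields."""
    buckets = defaultdict(set)
    for ident in identifiers:
        key = tuple(metadata[ident].get(field) for field in CONFLICT_FIELDS)
        buckets[key].add(ident)
    return list(buckets.values())
```

For example, if one NPL string matched both a DOI and a PMID, and a second NPL string matched the same PMID, all four items end up in one cluster. That cluster is later split only if the DOI and PMID records disagree on volume, issue, journal, or pages.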
With this stringent NPL processing protocol in place, we released PatCite to enable users to find out whether the science they do, enable, or fund is leading to real outcomes for society through new inventions and products. The tool is open, free, and secure. You can track, filter, sort, and link scholarly articles cited in patents, examine the citing patents, and discover new partners and collaborators, in order to build and visualise a holistic innovation landscape that assists your decision-making.
On the landing page, you are provided with two options:
- Explore the cited scholarly work found in patent literature. (How? Upload a text file of article PMIDs or DOIs.) Here is an example text file whose contents you can copy and paste into the box; label your search and submit it to explore the app: RAJlist20170728
- Explore the patents that cite scholarly articles. Here you have a couple of options: you can enter patent publication numbers or Lens IDs, paste a text file containing these, or link to PatCite from your work area using your own patent collections. Example: try this published, public collection from Richard Jefferson (https://www.lens.org/lens/collection/22967). After viewing it in your work area, you can link to PatCite as in the screenshot below.
The dynamic maps show which research articles, which scientists or researchers, and potentially which institutions have influence over a subset of economic activity. On the PatCite landing page, we provide different scenarios showing how the tool can be used by individual researchers or inventors, university departments, institutions, or even funding organizations.