Software stack

COSMOS uses state-of-the-art AI technology to enable automated, customizable knowledge extraction across diverse scientific domains. Its software pipeline takes model code or user queries as input, parses them into entities, and then uses a suite of microservices to locate and extract related information from text, tables, and figures. Its output includes granular, document-level question answering and aggregation of data and information across large collections of documents.


  • PDF segmentation and element extraction
  • Equation and model extraction
  • Open-domain search, retrieval, and question answering
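
The flow described above (query in, entities parsed, per-modality services consulted, results aggregated per document) can be sketched roughly as follows. This is only an illustrative toy, not the COSMOS API: every name here (Element, Hit, parse_entities, keyword_service, run_pipeline) is hypothetical, the "services" are in-process functions rather than real microservices, and simple keyword matching stands in for the actual AI models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical and exist only for illustration;
# none of them are taken from the COSMOS code base.


@dataclass(frozen=True)
class Element:
    """One extracted piece of a document."""
    doc_id: str
    kind: str      # "text", "table", or "figure"
    content: str


@dataclass(frozen=True)
class Hit:
    """A match between a query entity and a document element."""
    doc_id: str
    kind: str
    entity: str
    content: str


STOPWORDS = {"the", "of", "in", "and", "a", "an", "for", "what", "is"}


def parse_entities(query: str) -> List[str]:
    """Naive stand-in for entity parsing: lowercase, strip punctuation,
    drop stopwords, and keep the first occurrence of each token."""
    entities: List[str] = []
    for raw in query.split():
        token = raw.strip(".,?!;:").lower()
        if token and token not in STOPWORDS and token not in entities:
            entities.append(token)
    return entities


Service = Callable[[List[str], List[Element]], List[Hit]]


def keyword_service(kind: str) -> Service:
    """Build a toy service that substring-matches entities against
    elements of a single kind."""
    def service(entities: List[str], elements: List[Element]) -> List[Hit]:
        return [
            Hit(el.doc_id, el.kind, ent, el.content)
            for el in elements if el.kind == kind
            for ent in entities if ent in el.content.lower()
        ]
    return service


# One toy service per modality mentioned in the text.
SERVICES: Dict[str, Service] = {
    kind: keyword_service(kind) for kind in ("text", "table", "figure")
}


def run_pipeline(query: str, elements: List[Element]) -> Dict[str, List[Hit]]:
    """Parse the query, consult every service, and group hits by document."""
    entities = parse_entities(query)
    by_doc: Dict[str, List[Hit]] = {}
    for service in SERVICES.values():
        for hit in service(entities, elements):
            by_doc.setdefault(hit.doc_id, []).append(hit)
    return by_doc


if __name__ == "__main__":
    corpus = [
        Element("d1", "text", "Permafrost thaw releases carbon."),
        Element("d1", "table", "Site | carbon flux"),
        Element("d2", "figure", "Map of sea ice extent"),
    ]
    for doc_id, hits in run_pipeline("What is the carbon flux in permafrost?", corpus).items():
        print(doc_id, [(h.kind, h.entity) for h in hits])
```

Keeping each modality behind the same call signature is one way to mirror a service-oriented design: a new extractor can be registered in SERVICES without touching the aggregation step.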


COSMOS addresses the significant challenges posed by ingesting complex source documents. The pipeline tolerates heterogeneous document formats and handles noise, uncertainty, and conflicting information. The system is designed to reason about disparate representations of information in a unified manner and to provide a summary overlay that scientists and decision makers can use. It does so by employing novel AI models and adopting a user-focused, service-oriented architecture. Deployment over a large and diverse collection of publications makes UW-COSMOS applicable to many different domains of inquiry.

License and Use Information

Software produced under the ASKE DARPA program is released under the Apache License, Version 2.0, and distributed on GitHub.