The Computational Biosciences Group develops innovative software and methods for FAIR data management, data analysis, and machine learning for experimental data and mechanistic models, addressing our nation’s energy, environment, and health needs.
Hierarchical Data Modeling Framework (HDMF) is a Python package for working with hierarchical data and creating extensible data standards [Source] [Contact] [Cite]. Additional tools in the HDMF ecosystem include the following:
- HDMF Common Schema is a collection of reusable data schemas for creating scientific data standards. [Source] [Contact] [Cite]
- HDMF ML Schema is a format schema for common machine learning workflows and outputs. [Source] [Contact]
- HDMF-Zarr implements a Zarr backend for HDMF. [Source] [Contact] [Cite]
- HDMF DocUtils is a library for generating documentation from HDMF data schema. [Source] [Contact]
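At its core, HDMF organizes data as a hierarchy of groups, datasets, and attributes addressable by path, which schema extensions then specialize into typed containers. The pattern can be illustrated with a toy sketch in plain Python (this is not HDMF's actual API — HDMF provides typed containers such as `DynamicTable` in `hdmf.common`; the schema tag below is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    data: list

@dataclass
class Group:
    name: str
    attributes: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)

    def add(self, child):
        """Attach a child group or dataset under this group."""
        self.children[child.name] = child
        return child

    def resolve(self, path):
        """Resolve a '/'-separated path to a nested child."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node

# Build a small hierarchy tagged with a (hypothetical) schema version.
root = Group("root", attributes={"schema": "toy-standard-0.1"})
session = root.add(Group("session"))
session.add(Dataset("timestamps", [0.0, 0.1, 0.2]))
```

In HDMF itself, such hierarchies are defined declaratively in schema files, validated against them, and serialized to backends such as HDF5 or Zarr.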
Linked Data Modeling Language (LinkML) is a flexible data modeling language and software framework for working with and validating data in a variety of formats (JSON, RDF, TSV) and for compiling LinkML schemas to other frameworks [Source] [Contact]. Additional tools in the LinkML ecosystem include the following:
- linkml-model defines the metamodel schema and specification for the LinkML modeling language. [Source]
- linkml-runtime is a Python library providing runtime support for LinkML data models. [Source]
- schema-automator is a toolkit that assists with the generation and enhancement of LinkML schemas. [Source]
- schemasheets is a framework for managing schemas using spreadsheets and compiling them to LinkML. [Source]
- linkml-project-cookiecutter is a Cookiecutter template for projects using LinkML. [Source]
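A LinkML schema is plain YAML: named classes with attributes, each attribute optionally constrained by a range, identifier, or required flag. A minimal sketch (the schema URI and class below are hypothetical examples, not part of any published schema):

```yaml
id: https://example.org/person-schema   # hypothetical schema URI
name: person_schema
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    attributes:
      id:
        identifier: true
      full_name:
        required: true
      age:
        range: integer
```

From a schema like this, the LinkML toolchain can validate JSON/TSV/RDF data and generate artifacts such as JSON Schema, SQL DDL, or Python dataclasses.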
Domain Data Standards and Portals
Biolink Model provides a high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.), their properties and relationships, and enumerates ways in which they can be associated. [Source] [Contact]
- KGX is a toolkit and file format for working with and exchanging data in Knowledge Graphs (KGs) that conform to or are aligned with the Biolink Model. [Source] [Contact]
Neurodata Without Borders (NWB) is an R&D 100 award-winning, leading data standard for neurophysiology supported by the NIH BRAIN Initiative [Source] [Contact] [Cite]. NWB provides neuroscientists with a common standard to share, archive, use, and build common analysis tools for neurophysiology data. NWB is supported by many neurophysiology tools, and a growing number of neurophysiology datasets are available in NWB via the DANDI data archive. The NWB ecosystem also includes a broad range of core software for working with NWB data, among others:
- PyNWB is the reference Python API for working with NWB files. [Source]
- MatNWB is the reference Matlab API for working with NWB files. [Source]
- NWBInspector is a tool for inspecting NWB files for compliance with best practices. [Source]
- NWBWidgets is a library of widgets for visualizing NWB data in Jupyter notebooks. [Source]
- NDX Template is a Cookiecutter template for creating Neurodata Extensions (NDX) for NWB. [Source]
- Staged Extensions is a repository for submitting Neurodata Extensions (NDX) to the NDX Catalog. [Source]
BASTet is a novel framework for shareable and reproducible data analysis that supports standardized data and analysis interfaces, integrated data storage, data provenance, workflow management, and a broad set of integrated tools. [Source] [Contact] [Cite]
ClearMap is a toolbox for the analysis and registration of volumetric images of organs and organisms obtained via tissue clearing, immunolabeling, and light sheet microscopy (iDISCO) [Source] [Contact] [Cite]. ClearMap's toolbox includes the following components:
- Wobbly-Stitcher for non-rigid stitching of terabyte-scale (TB) datasets.
- TubeMap for extracting vasculature and other tubular networks from terabyte-scale data.
- CellMap for extracting neuronal activity markers and cell shapes.
Dynamic Components Analysis (DCA) is an unsupervised dimensionality reduction algorithm that finds low-dimensional subspaces with high dynamical complexity (Predictive Information). [Source] [Contact] [Cite]
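For the simplest Gaussian case, the quantity DCA maximizes has a closed form: for a scalar projection with one-step past and future windows, predictive information reduces to −½ log(1 − ρ²), where ρ is the lag-1 autocorrelation. A minimal sketch under that simplification (plain Python, not the DCA package's API, which optimizes over multi-step windows and multi-dimensional projections):

```python
import math
import random

def predictive_information_1d(x):
    """Gaussian predictive information between one-step past and future
    of a scalar series: PI = -1/2 * log(1 - rho^2), where rho is the
    lag-1 autocorrelation (a window-size-1 simplification of DCA's objective).
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov1 = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / n
    rho = cov1 / var
    return -0.5 * math.log(1.0 - rho ** 2)

random.seed(0)
# One channel with slow AR(1) dynamics, one with white noise.
slow = [0.0]
for _ in range(4999):
    slow.append(0.95 * slow[-1] + random.gauss(0.0, 1.0))
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]

# A projection onto the dynamical channel carries far more predictive
# information than one onto the noise channel -- the structure DCA seeks.
pi_slow = predictive_information_1d(slow)
pi_noise = predictive_information_1d(noise)
```

DCA generalizes this idea by searching over projections of a multivariate time series for the subspace whose past and future share the most information.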
Supervised Dynamic Components Analysis (sDCA) is an extension of DCA that finds low-dimensional subspaces between a source and a target time series (e.g., interacting brain regions, brain-to-behavior) with the highest Predictive Information. [Source] [Contact]
Compressed Predictive Information Coding (CPIC) is a generalization of DCA to non-Gaussian, non-linear mappings. It compresses both the past and the future of the time series based on a Predictive Information bottleneck. It uses Bayesian variational inference and deep learning. [Source] [Contact]
Orthogonal Stochastic Linear Mixing Model (OSLMM) is an unsupervised learning algorithm for time series data that imposes an orthogonality constraint on the latent mixing terms. In practice, this results in more interpretable latent spaces. [Source] [Contact]
pyUoI is a Python package implementing several statistical-machine learning algorithms in the Union of Intersections framework, which infers models with accurate feature selection (low false positives and low false negatives) and estimation (low bias and low variance). [Source] [Contact] [Cite]
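The selection logic at the heart of Union of Intersections can be sketched with sets: intersecting selected features across bootstrap resamples at each regularization strength keeps only stable features (low false positives), and taking the union across strengths recovers weaker features that appear only under lighter regularization (low false negatives). A toy sketch with hypothetical bootstrap selections (not pyUoI's API, which wraps this logic around lasso-style estimators):

```python
def uoi_support(selections_per_reg):
    """Union of Intersections over selected-feature sets.

    selections_per_reg: for each regularization strength, a list of
    feature-index sets selected on different bootstrap resamples.
    """
    # Intersection across bootstraps: keep only stably selected features.
    stable = [set.intersection(*boots) for boots in selections_per_reg]
    # Union across regularization strengths: recover weaker true features.
    return set.union(*stable)

# Hypothetical selections: features 0 and 1 are strong, 2 is weak but
# stable, while 3 and 4 are spurious (selected only on some bootstraps).
strong_reg = [{0, 1}, {0, 1, 3}, {0, 1}]
weak_reg = [{0, 1, 2}, {0, 1, 2, 4}, {0, 1, 2}]

support = uoi_support([strong_reg, weak_reg])  # {0, 1, 2}
```

In pyUoI, model estimation on the selected support then uses a separate bagging step, decoupling feature selection from parameter estimation.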
MotifDetector targets understanding motifs and their role in communication between interacting sub-processes (e.g., two interacting animals, two interacting brain regions), for example via MCMC inference of infinite Hidden Markov Models. [Source] [Contact]