Kadi4Mat — CUCH

Kadi4Mat
Welcome to the website of Kadi4Mat, the generic and open source virtual research environment.
Demo instance
Documentation
About
Kadi4Mat, or Kadi for short, is a generic and open source virtual research environment. Originally
developed in the context of materials science, Kadi4Mat can be used for the management of any type
of research data within different research disciplines and use cases. Its goal is to combine the
ability to manage and exchange data, the repository component, with the possibility to analyse,
visualise and transform said data, the electronic lab notebook (ELN) component. The focus of the
repository component is on warm data, i.e. unpublished data that is yet to be analysed further,
while the focus of the ELN component is on the automated and documented execution of heterogeneous
workflows via an application programming interface (API). In this way, a customizable framework is
created that facilitates good research data management (RDM) practices and collaboration between
researchers.
Further information:
Kadi4Mat at RDM@KIT
Kadi4Mat in the Helmholtz Research Software Directory
Kadi4Mat in the ELN Finder
Use of Kadi4Mat at the Division MFM (TU Darmstadt)
The Cluster of Excellence for Post-Lithium Storage (POLiS) employs Kadi4Mat as an integral RDM
platform. In that context, guides were created that give an overview of RDM, describe the usage of
Kadi4Mat for FAIR RDM, sketch the theory and methods of RDM, and answer frequently asked questions.
Finally, a series of tutorial videos was produced, which shows how to use the basic functionality of
Kadi4Mat for building up a flexible and customisable research data management solution. Furthermore,
it shows how to publish data via Zenodo, a public, open source research data repository.
To video series
This video was produced in the context of the
FestBatt
project and shows an overview of the cluster-wide research data management as well as corresponding
data science applications.
ML workflow for virtual characterization of solid electrolyte interphases
A reproducible machine learning workflow is a systematic process that enables the
efficient development and implementation of machine learning models, ensuring consistent
outcomes for all users. Key components include effective version control, dependency
management, data handling, and thorough documentation. This approach enhances
collaboration and scalability across various environments. In the Kadi ecosystem, tools
like KadiStudio facilitate data organization, processing, and analysis, while RDM-assisted
workflows create informative knowledge graphs that track data provenance and relationships
between records. We demonstrate this reproducible workflow in a study focused on the
virtual characterization of solid electrolyte interphases (SEI), which are crucial for
battery performance and safety. The complexity of capturing the evolving SEI's
physicochemical properties through traditional modeling approaches has led to the need for
a data-driven virtual material design approach. Here, we present a machine learning
paradigm for virtual material analysis and synthesis that learns how to represent,
characterize, and generate SEI configurations with physical and stochastic attributes
derived from Kinetic Monte Carlo simulation. A Variational AutoEncoder with a property
regressor (prVAE) learns descriptive data-driven properties to represent the SEI
configuration with respect to target physical properties. The features of the well-known
SEI configuration are investigated at the bottleneck of the VAE so one can see how the
observable SEI characteristics, like thickness, porosity, density, and volume fraction,
affect the learned data-driven characteristics.
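As a rough illustration of the kind of objective a VAE with an attached property regressor could
optimize, the following Python sketch combines a reconstruction term, a Kullback-Leibler term and
a property-regression term for a single sample. The exact loss used in the publications is not
given on this page, so the function names, the weights beta and gamma, and all numbers are
assumptions for illustration only.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence between a diagonal Gaussian N(mu, exp(log_var)) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def vae_regressor_loss(x, x_rec, mu, log_var, props_true, props_pred,
                       beta=1.0, gamma=1.0):
    """Hypothetical combined loss: reconstruction + beta * KL + gamma * property error."""
    return (mse(x, x_rec)
            + beta * kl_to_standard_normal(mu, log_var)
            + gamma * mse(props_true, props_pred))


# Toy values (made up): a 2-voxel "configuration", a 2-dimensional latent code
# and one target property such as a normalized thickness.
loss = vae_regressor_loss(
    x=[0.2, 0.8], x_rec=[0.25, 0.7],
    mu=[0.1, -0.2], log_var=[0.0, 0.0],
    props_true=[0.5], props_pred=[0.4],
)
```

In such a setup, gamma controls how strongly the latent space is organized with respect to the
target properties, while beta trades reconstruction quality against regularity of the latent space.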
More information can be found in our publications:
DOI: 10.1002/aenm.202301985
and
DOI: 10.11588/heibooks.1288.c18061
ML assisted design of experiments for solid state electrolyte
Machine Learning (ML) has become a powerful tool in experimental design, especially in the
optimization of complex material synthesis processes. One popular approach is Bayesian
optimization, which is used to efficiently explore and optimize experimental parameters.
Bayesian optimization is particularly useful in scenarios where experiments are costly or
time-consuming, as it strategically selects the most promising conditions to test next
based on prior data. By continuously updating its model with new experimental results,
Bayesian optimization helps identify optimal conditions with fewer trials, accelerating
the discovery of materials with desired properties.
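The selection step described above can be sketched with the expected improvement acquisition
function, one common choice in Bayesian optimization. The sketch assumes that a surrogate model
has already predicted a mean and a standard deviation for each candidate condition; the candidate
values, the field names and the objective scale are hypothetical and not taken from the study.

```python
import math


def expected_improvement(mean, std, best_so_far, xi=0.0):
    """Expected improvement over best_so_far for a maximization problem."""
    improvement = mean - best_so_far - xi
    if std <= 0.0:
        # Without predictive uncertainty, only a certain gain counts.
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + std * pdf


def select_next(candidates, best_so_far):
    """Return the candidate with the highest expected improvement."""
    return max(
        candidates,
        key=lambda c: expected_improvement(c["mean"], c["std"], best_so_far),
    )


# Hypothetical surrogate predictions of a normalized objective
# for three sintering temperatures.
candidates = [
    {"sintering_temp_c": 1100, "mean": 0.80, "std": 0.05},
    {"sintering_temp_c": 1150, "mean": 0.95, "std": 0.02},
    {"sintering_temp_c": 1200, "mean": 0.85, "std": 0.30},
]
next_experiment = select_next(candidates, best_so_far=1.0)
```

With these made-up numbers, the most uncertain candidate is selected even though its predicted
mean is not the highest, which illustrates how the acquisition function balances exploitation of
good predictions against exploration of poorly known conditions.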
Batteries with solid electrolytes are safer, offer higher energy density, and last longer,
making them a promising alternative to traditional liquid electrolyte batteries. To
optimize the performance of such a battery, we need to systematically study how different
experimental factors affect its design. In this study, we used ML to guide the design of experimental
conditions for synthesizing one solid-state electrolyte material and used Kadi4Mat to
efficiently manage the research data during the experiments. We focused on factors like
variations in precursor concentrations, sintering temperature, and holding time. We
compared different ML models built from previous lab data to find the best one for
designing new experiments. This approach led to the discovery of a sample with
competitive performance after just a few iterations. To understand why such samples have
high ionic conductivity, we analyzed their phase compositions and crystal structures using
X-ray diffraction and examined their microstructures with scanning electron microscopy.
Our results show that applying machine learning to design experimental conditions can help
researchers create desired materials more efficiently, reducing the number of experiments
needed.
More information can be found in our publication:
DOI: 10.3389/fmats.2022.821817
LLM-based chatbot
The rapid advancement of battery technologies has generated a vast amount of research data
and scientific literature. Manually analyzing this wealth of information is a daunting and
time-consuming task, especially when trying to keep up with the latest findings. Natural
Language Processing (NLP) techniques, particularly those powered by Large Language Models
(LLMs), offer a powerful solution to this challenge. LLMs can process and understand large
amounts of text, allowing researchers to efficiently extract key insights from complex
battery research data and publications. These models can summarize long articles, identify
relevant studies, and even extract specific data points or trends, helping researchers
quickly gather and synthesize the information they need to drive innovation.
By applying NLP techniques to battery research, scientists can automate the extraction of
valuable information from diverse sources, including academic papers, patents, and
technical reports. NLP can help identify correlations between different research findings,
identify gaps in the literature, and uncover emerging trends. This not only speeds up the
research process, but also provides a more comprehensive understanding of the field. As
battery technologies continue to evolve, the integration of NLP tools into the research
workflow will become increasingly important, enabling researchers to navigate the
ever-growing body of knowledge and make more informed decisions in the development of
next-generation energy storage solutions.
More information can be found in our LLM-based chatbot demo:
Generating digital twins of microstructures using ML to accelerate battery simulation studies
Deep Learning (DL) is revolutionizing the field of materials science by accelerating
simulations and enabling the generation of realistic microstructures. By using DL models,
researchers can significantly speed up the process of simulating complex material
behavior, reducing the time and computational resources required. In addition, generative
Artificial Intelligence (AI) models, such as those used to create realistic images, can
now be applied to generate microstructures that closely resemble real materials. These
advances make it possible to more efficiently explore material properties, optimize
designs, and predict performance without relying solely on traditional experimental
methods.
Porous structures are commonly found in battery materials. A thorough study of the
properties of these structures is important to analyze their corresponding influence on
the battery. A promising approach is to generate digital twins of real structures,
validate them with real experiments, and then use data-driven methods such as artificial
neural networks to understand the relationship between the structure, the manufacturing
process, and the material properties. In this study, we use a Variational AutoEncoder
(VAE) neural network to capture and simplify the 3D structure of porous materials. This
allows us to model the relationship between the structure and its properties, and also to
predict how changes in the manufacturing process might affect the material. Our method
provides a reliable way to learn about these structures without the need for labeled data,
enabling a deeper understanding of porous materials.
More information can be found in our publication:
DOI: 10.1016/j.actamat.2023.118922
Enhancing analysis of spectral data with ML models
In materials science, the characterization of complex materials often generates large and
complex data sets that can be difficult to analyze using traditional methods. Machine
Learning (ML) has emerged as a powerful tool for efficiently handling and interpreting
such data. Using advanced algorithms, ML can identify patterns, make predictions, and
extract meaningful insights from large and complex datasets that would otherwise require
extensive manual analysis. This approach not only speeds up the research process, but also
increases the accuracy and depth of analysis, allowing researchers to uncover subtle
relationships and characteristics within the data. In this context, ML offers significant
potential to transform materials research by simplifying the interpretation of complicated
characterization data and opening up new avenues for discovery and innovation.
A technique called Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) can be used
to analyze surfaces of materials, but the data it generates is often complex and
time-consuming to interpret manually. In this study, ML methods, particularly logistic
regression, are employed to classify different material components and identify key ions
from different compounds. These models are then used to determine the compositions of both
mixtures and real samples from their ToF-SIMS data. The ML approach successfully
identifies the characteristic ions and accurately predicts the compositions of new
samples. The study also discusses the practical benefits and limitations of this method,
highlighting its potential to simplify the analysis of other materials.
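To make the idea concrete, the following Python sketch trains a small logistic regression by
gradient descent on toy spectra and reads the most characteristic ion off the learned weights.
It is not the pipeline from the publication: the ion names, the intensities, the labels and the
training settings are all invented for illustration.

```python
import math

ION_NAMES = ["ion_A", "ion_B", "ion_C"]  # hypothetical ion labels


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_logistic_regression(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Toy normalized intensities per spectrum: class 1 is rich in ion_A,
# class 0 is rich in ion_B, and ion_C carries no information.
spectra = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.5], [0.1, 0.9, 0.5], [0.2, 0.8, 0.5]]
compounds = [1, 1, 0, 0]

weights, bias = train_logistic_regression(spectra, compounds)
predictions = [predict(weights, bias, s) for s in spectra]
# The ion with the largest positive weight is the most characteristic of class 1.
key_ion = ION_NAMES[max(range(len(weights)), key=lambda j: weights[j])]
new_sample_class = predict(weights, bias, [0.7, 0.3, 0.5])
```

Because the model is linear, each weight can be read as the contribution of one ion to the
classification, which is what makes this kind of model attractive for identifying key ions.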
More information can be found in our publication:
DOI: 10.1021/acsami.3c09643
The use of RDM infrastructure helps to structure and link data in research projects in
consideration of the FAIR data principles. However, the development of artificial
intelligence (AI) models as part of a project usually occurs independently of the RDM
software in use. Unless the AI model developer documents the methods used and uploads them with
the results to the RDM database, the re-usability of the data by other researchers is
hindered and the FAIR principles are violated (see Figure 1a). This use-case presents the
implementation of the FAIR data principles in the development process of an AI model
using Kadi4Mat in combination with KadiAI. The aim was to record the connections between
raw data, model architecture, hyperparameters, and results to enable future comparative
and transfer studies. The suggested new AI project development process is shown in
Figure 1b. For the automatic linking of components the KadiAI ontology shown in Figure 2
was developed and applied.
Figure 1
Figure 2
The use case involves the evaluation of dynamic contrast-enhanced magnetic resonance
imaging (DCE-MRI) data from 75 subjects from the COSYCONET study, which contains around
600 patients with chronic obstructive pulmonary disease (COPD) of different disease
severity. For the successful analysis of this data, breathing motion, as shown in the figure
below, has to be identified and removed. Since manual breathing detection is cumbersome, an
AI model for automatic breathing detection was developed by training and testing three
different AI hypermodels.
The automatic synchronization of the AI project with Kadi4Mat results in records with
metadata and files for each component (see Figure 3 and Figure 4). Due to the usage of
the KadiAI ontology, the links between the individual records lead to the automatic
generation of a knowledge graph for the whole AI project (see Figure 5). This use-case
shows the successful implementation of a concept that automates RDM for AI applications,
minimizing the overhead for the individual researcher and recording rich metadata and
links between entities toward FAIR data.
Figure 3
Figure 4
Figure 5
Researchers use different ELNs and similar research data management systems. To ensure
interoperability, a common file format is required, for which RO-Crates are suitable.
RO-Crate is a container format for packaging research data and metadata. It consists of a
ZIP archive, used to bundle all the data, and a JSON-LD (JSON for Linked Data) file to
describe the archive's structure and contents. Kadi4Mat facilitates the export and import of
RO-Crates, enabling interoperability between different systems.
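The container structure described above can be sketched in a few lines of Python using only the
standard library. The sketch follows the general layout of RO-Crate 1.1 (a ro-crate-metadata.json
file with a metadata descriptor and a root dataset) but does not aim for full conformance and does
not reproduce the crates that Kadi4Mat itself exports; the file names, the dataset name and the
date are made up.

```python
import io
import json
import zipfile


def build_ro_crate(files, name, date_published):
    """Bundle files (path -> bytes) and a JSON-LD description into a ZIP archive."""
    graph = [
        {   # Metadata descriptor: points to the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: lists the contained files.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "datePublished": date_published,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    graph += [{"@id": path, "@type": "File"} for path in files]
    metadata = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical example content.
crate_bytes = build_ro_crate(
    {"data/measurement.csv": b"time,voltage\n0,3.7\n"},
    name="Example record",
    date_published="2024-01-01",
)

# Read the crate back to inspect its structure.
with zipfile.ZipFile(io.BytesIO(crate_bytes)) as archive:
    crate_names = archive.namelist()
    crate_metadata = json.loads(archive.read("ro-crate-metadata.json"))
```

A system importing such a crate only needs to read ro-crate-metadata.json to learn which files the
archive contains and how they are described.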
The Resource Description Framework (RDF) is a standard for describing metadata in a
graph-like format and serves as the foundation for many ontology modeling languages. It
describes the connections between entities as machine-readable triples (subject, predicate
and object) as shown in the figure below. Kadi4Mat facilitates the export of metadata in
RDF, further improving metadata interoperability.
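As a small illustration of the triple model, the following Python sketch represents two
statements as (subject, predicate, object) tuples and serializes them in the N-Triples syntax.
The record IRIs under example.org are hypothetical, and the vocabulary terms (Dublin Core title,
PROV wasDerivedFrom) were chosen for illustration; they are not necessarily the terms used in
Kadi4Mat's own RDF export.

```python
def iri(value):
    """Format an IRI term for N-Triples."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize already formatted (subject, predicate, object) terms, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


RECORD_1 = "https://example.org/records/1"  # hypothetical record IRIs
RECORD_2 = "https://example.org/records/2"

triples = [
    (iri(RECORD_1), iri("http://purl.org/dc/terms/title"), literal("Sample A")),
    (iri(RECORD_2), iri("http://www.w3.org/ns/prov#wasDerivedFrom"), iri(RECORD_1)),
]
ntriples = to_ntriples(triples)
```

The first triple attaches a literal value to a record, while the second one links two records,
which is the kind of statement from which a graph of entities emerges.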
The Shapes Constraint Language (SHACL) is a W3C standard for describing and validating RDF
graphs. Kadi4Mat templates support the export of SHACL, facilitating the creation of
shareable and reusable metadata schemas.
The following image shows an ontology that was created using the Protégé software in the
context of battery research. While Kadi4Mat does not yet support the use of ontologies out of
the box, its HTTP API facilitates use-cases such as instantiating ontologies. This can be
achieved by running a script to process the corresponding, serialised ontology.
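The following Python sketch outlines what the planning part of such a script might look like. It
assumes the serialised ontology has already been parsed into (subject, predicate, object) triples
and derives one record per typed individual and one link per relation between individuals. The
example namespace, the individuals, and the field names of the resulting dictionaries are
hypothetical; the actual calls to Kadi4Mat's HTTP API are deliberately left out, since their
details should be taken from the official documentation.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/battery#"  # hypothetical ontology namespace


def local_name(iri):
    """Return the part of an IRI after the last '#' or '/'."""
    return iri.rstrip("/#").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def plan_instantiation(triples):
    """Derive record and link descriptions from ontology triples."""
    records = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            # One record per typed individual; the class becomes the record type.
            records[s] = {
                "identifier": local_name(s).lower(),
                "title": local_name(s),
                "type": local_name(o).lower(),
            }
    links = [
        {
            "from": records[s]["identifier"],
            "name": local_name(p),
            "to": records[o]["identifier"],
        }
        for s, p, o in triples
        if p != RDF_TYPE and s in records and o in records
    ]
    return list(records.values()), links


triples = [
    (EX + "Cell_1", RDF_TYPE, EX + "Cell"),
    (EX + "Electrolyte_1", RDF_TYPE, EX + "SolidElectrolyte"),
    (EX + "Cell_1", EX + "hasElectrolyte", EX + "Electrolyte_1"),
]
records, links = plan_instantiation(triples)
```

A real script would then iterate over the planned records and links and create them through the
API, ideally grouping the records in a single collection.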
The following image shows such an instantiated ontology, showing the created records, as
grouped within a single collection, and their links. These kinds of graphs can be generated
and visualised automatically within the graphical user interface of Kadi4Mat.
Kadi4Mat is being developed at the Karlsruhe Institute of Technology (KIT) as part of several
research projects, including:
FestBatt, the Cluster of Competence for Solid-state Batteries, funded by the German Federal
Ministry of Education and Research (BMBF) under grant number 03XP0435D.
POLiS, the Post Lithium Storage Cluster of Excellence, funded by the German Research Foundation
(DFG) under project number 390874152.
AQua, the Battery Competence Cluster Analytics/Quality Assurance, funded by the German Federal
Ministry of Education and Research (BMBF) under grant number 03XP0315B.
MoMaF, the Science Data Center for Molecular Materials Research, funded by the Ministry of
Science, Research and the Arts Baden-Württemberg (MWK-BW) under grant number 34-7547.222.
NFDI4ING, the National Research Data Infrastructure for Engineering Sciences, funded by the German
Research Foundation (DFG) under project number 442146713.
Ecosystem
KadiAPY
Use Kadi4Mat's HTTP API more easily with the KadiAPY library. It offers high-level interaction with
the API in Python as well as a Command Line Interface. The library supports both Linux and Windows.
Documentation
Source code
KadiAI
Integrate and implement your Artificial Intelligence (AI) and Machine Learning (ML) algorithms with
KadiAI. Leverage interactive dashboards to design, train, and tune data-driven models or enhance
your custom AI scripts with next-level research data management.
KadiStudio
Design and execute your scientific workflows with KadiStudio, a flexible workflow editor. Use a
wide range of existing or customised tools to create reproducible research.
Source code
KadiFS
Access and edit your data directly with the filesystem integration KadiFS based on FUSE. Connect
your computers and devices to directly interface with the Kadi4Mat ecosystem.
Source code
Citing
Main publication:
Brandt, N., Griem, L., Herrmann, C., Schoof, E., Tosato, G., Zhao, Y., Zschumme, P. and Selzer, M., 2021.
Kadi4Mat: A Research Data Infrastructure for Materials Science. Data Science Journal, 20(1), p.8. DOI:
Managing FAIR Tribological Data Using Kadi4Mat (2022)
Generating FAIR research data in experimental tribology (2022)
Structured Data Storage for Data-Driven Process Optimisation in Bioprinting (2022)
KadiStudio: FAIR Modelling of Scientific Research Processes (2022)
KadiStudio use-case workflow: Automation of data-processing for in situ micropillar compression
tests (2023)
Strukturiertes Management von Forschungsdaten in den Ingenieurwissenschaften (2024)
Instances
While everyone can deploy an instance of Kadi4Mat themselves, listed below are all instances of
Kadi4Mat currently hosted at the KIT:
Demo
Public demo instance of Kadi4Mat, with experimental features enabled.
Note that this instance is automatically reset at the beginning of each month.
KIT+
Kadi4Mat instance for use at KIT and for projects/cooperations, including FestBatt, AQua, the
CRC 1574, and more.
POLiS
Kadi4Mat instance for use in the Post Lithium Storage Cluster of Excellence (POLiS).
Contact
Karlsruhe Institute of Technology (KIT)
Institute of Nanotechnology (INT)
Michael Selzer
Kadi4Mat at the IAM/INT
Feedback may also be sent to feedback-kadi4mat@lists.kit.edu