We support the development of open-source software in research. We make several of our research tools and datasets publicly available, so that the scientific community can reproduce our results and further advance our work.
This project studies image-scaling attacks, a new form of attack that allows an adversary to manipulate images so that their content changes during downscaling. Image-scaling attacks are a considerable threat, as scaling is omnipresent in computer vision. Moreover, these attacks are agnostic to the learning model and training data, affecting any learning-based system operating on images.
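To illustrate why downscaling can be exploited, the following toy sketch assumes nearest-neighbour scaling with an integer factor, which reads only one pixel per block. The function names and the tiny grayscale images are hypothetical and are not taken from the project's code. This is a simplification for intuition, not the project's attack implementation.

```python
def downscale_nearest(image, factor):
    """Nearest-neighbour downscaling: keep every `factor`-th pixel along each axis."""
    return [row[::factor] for row in image[::factor]]


def embed_target(source, target, factor):
    """Overwrite only those source pixels that the downscaler will sample."""
    attacked = [row[:] for row in source]  # copy, leave the source untouched
    for i, target_row in enumerate(target):
        for j, value in enumerate(target_row):
            attacked[i * factor][j * factor] = value
    return attacked


factor = 4
source = [[255] * 8 for _ in range(8)]  # 8x8 all-white source image (assumed)
target = [[0, 0], [0, 0]]               # 2x2 all-black target image (assumed)

attacked = embed_target(source, target, factor)

# Count how many pixels differ between the source and the manipulated image.
changed = sum(
    a != s
    for attacked_row, source_row in zip(attacked, source)
    for a, s in zip(attacked_row, source_row)
)

result = downscale_nearest(attacked, factor)
print(changed, "of", 8 * 8, "pixels changed")
print("downscaled image equals target:", result == target)
```

In this sketch only 4 of 64 pixels are modified, so the full-size image stays almost entirely white, yet the downscaled output is exactly the black target. Scaling algorithms that average over all pixels of a block would not be fooled this easily.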
In this project, we study techniques of explainable machine learning in security applications. We find that the explanations generated by these techniques can differ significantly depending on the security task and learning model. At the same time, it is unclear how explanations can be compared in order to decide whether one method is “better” than another. As a result, we devise novel criteria for comparing and evaluating explanation methods in computer security.
In this project, we attack methods for authorship attribution of source code using adversarial learning. We exploit the fact that these methods rest on machine learning and can thus be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead the attribution but appear plausible to a developer. Our attack and the datasets are publicly available.
In this research project, we explore similarities between machine learning and digital watermarking under attack. As part of the project, we have developed a unified view of attacks in both domains and created a framework for modeling evasion and poisoning attacks. The code and datasets of our case studies are publicly available.
Joern is a tool for robust analysis of C/C++ code. It generates abstract syntax trees, control flow graphs and searchable indexes of code constructs. It has been specifically designed to meet the needs of code auditors, who often find themselves in a situation where constructing a working build environment is not feasible. Joern enables one to write quick-and-dirty but language-aware static analysis tools.
Pulsar is a network fuzzer with automatic protocol learning and simulation capabilities. The tool allows one to model a protocol through machine learning techniques. The learned models can be used to simulate communication between Pulsar and a real client or server. In combination with a series of fuzzing primitives, this makes it possible to test the implementation of an unknown protocol for errors in deeper states of its state machine.
The Drebin dataset consists of roughly 5,000 malicious Android applications that have been collected as part of the Mobile Sandbox project between 2010 and 2012. The dataset can be used to experiment with Android malware and compare different detection approaches.
Adagio is a collection of Python modules for analyzing and detecting Android malware. These modules make it possible to extract labeled call graphs from Android APKs or DEX files and to apply an explicit feature map that captures their structural relationships. Additional modules provide classes for designing binary or multiclass classification experiments and applying machine learning for detection of malicious structure.
Malheur is a tool for the automatic analysis of program behavior recorded from malicious software (malware). It has been designed to support the regular analysis of malicious software and the development of detection and defense measures. Malheur allows for identifying novel classes of malware with similar behavior and assigning unknown malware to discovered classes using machine learning.
Harry is a small tool for comparing strings and measuring their similarity. The tool supports several common distance and kernel functions for strings, such as the Levenshtein (edit) distance, the Jaro-Winkler distance and the compression distance. Harry is implemented using OpenMP, so that its run-time performance scales linearly with the number of available CPU cores.
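As an illustration of one of the distances named above, here is a minimal Python sketch of the Levenshtein (edit) distance using the standard dynamic-programming recurrence with unit costs. It is not Harry's implementation, and the function name `levenshtein` is chosen for this example only. It also does not show the OpenMP parallelism.

```python
def levenshtein(a, b):
    """Edit distance between a and b with unit costs for insert, delete, substitute."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
print(distance)  # 3
```

Keeping only two rows of the table keeps memory proportional to the length of the second string, which matters when many long strings are compared pairwise.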
Sally is a small tool for mapping a set of strings to a set of vectors. This mapping is referred to as embedding and allows for applying techniques of machine learning and data mining to the analysis of string data. Sally can be applied to several types of string data, such as text documents, DNA sequences or log files, and it can handle common input formats such as directories, archives and text files.
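The idea of embedding strings in a vector space can be sketched in a few lines of Python. The sketch below assumes character n-grams as features and raw counts as vector entries. Both are assumptions made for illustration, and the names `ngrams` and `embed` are hypothetical and do not reflect Sally's interface or options.

```python
from collections import Counter


def ngrams(text, n):
    """All overlapping substrings of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def embed(strings, n=2):
    """Map each string to a vector of n-gram counts over a shared vocabulary."""
    counts = [Counter(ngrams(s, n)) for s in strings]
    vocabulary = sorted(set().union(*counts))  # one dimension per distinct n-gram
    vectors = [[c[gram] for gram in vocabulary] for c in counts]
    return vocabulary, vectors


vocabulary, vectors = embed(["abab", "abc"], n=2)
print(vocabulary)  # ['ab', 'ba', 'bc']
print(vectors)     # [[2, 1, 0], [1, 0, 1]]
```

Once strings are represented this way, standard vector-based learning methods can be applied to them directly.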
Salad is an efficient and flexible implementation of the well-known anomaly detection method Anagram. The method uses n-grams (substrings of length n) maintained in a Bloom filter for efficiently detecting anomalies in large sets of string data. Salad extends the original method by supporting n-grams of bytes and words as well as training with two classes.
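The combination of n-grams and a Bloom filter described above can be sketched as follows. This is a simplified illustration and not Salad's code: the class and function names, the filter size, the use of SHA-256 as hash function and the scoring by the fraction of unseen n-grams are all assumptions of this example. It covers byte n-grams and training on a single class only.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions (no false negatives)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def byte_ngrams(data, n):
    """All overlapping byte substrings of length n."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train(samples, n=3):
    """Store the n-grams of all normal samples in a Bloom filter."""
    bloom = BloomFilter()
    for sample in samples:
        for gram in byte_ngrams(sample, n):
            bloom.add(gram)
    return bloom


def anomaly_score(bloom, data, n=3):
    """Fraction of n-grams in data that were not seen during training."""
    grams = byte_ngrams(data, n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bloom)
    return unseen / len(grams)


bloom = train([b"GET /index.html HTTP/1.1"])
normal_score = anomaly_score(bloom, b"GET /index.html HTTP/1.1")
odd_score = anomaly_score(bloom, b"\x90\x90\x90\x90")
print(normal_score, odd_score)
```

Data seen during training always scores 0.0, because a Bloom filter has no false negatives. Unseen data can occasionally score lower than it should, since the filter may report false positives. This is the price of the compact representation.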
Jailbreaks remove vital security mechanisms, which are necessary to ensure a trusted environment in which sensitive data, such as login credentials and transaction numbers (TANs), can be protected. We find that all but one of the banking apps, available in the iOS App Store, can be fully compromised by trivial means without reverse-engineering, manipulating the app, or other sophisticated attacks.
We have conducted a thorough security analysis of so-called HomePlug devices by Devolo, which are used to establish network communication over power lines. We have identified multiple security issues and find that hundreds of vulnerable devices are openly connected to the Internet across Europe. Of these, 87% run outdated firmware, showing the deficiency of manual updates in comparison to automatic ones.