Machine learning algorithms are becoming more and more important in everyday life. Applications in search engines, driver assistance systems, consumer electronics, and so on use them heavily and would not be as powerful without them. Neural Networks (NNs), for example, are state-of-the-art classification approaches and dominate the field. However, they are difficult to interpret and not fully understood. For instance, the existence of adversarial examples that are imperceptible to humans contradicts the general belief that convolutional NNs classify objects in images mainly by breaking them down into increasingly complex object shapes. In this thesis, we study prototype-based classification algorithms with the goal of improving the classification capabilities of such algorithms while simultaneously preserving robustness and interpretability properties. Moreover, we investigate how properties of prototype-based classification algorithms can be transferred to NNs in order to increase their interpretability. First, we derive the concept of set-prototypes and apply it in a Learning Vector Quantization (LVQ) framework—a well-understood classification algorithm. We examine the mathematical properties and show that the derived method is provably robust against adversarial attacks. Furthermore, the method consistently outperforms other LVQ approaches while still being interpretable. Second, we relax the class-specific prototype concept to that of components and apply it in LVQ- and NN-based classifiers. This framework provides promising interpretation techniques for NNs. For example, we use them to explain how an adversarial attack is fooling an NN. We evaluate the methods on both toy and real-world datasets, including Indian Pine, MNIST, CIFAR-10, GTSRB, and ImageNet.
Das Dokument ist frei verfügbar