Abstract

We study the performance of pool-based Batch Mode Deep Active Learning (BMDAL) methods for regression on tabular data, focusing on methods that do not require modifying the network architecture or the training procedure. Our contributions are threefold. First, we present a framework for constructing BMDAL methods out of kernels, kernel transformations, and selection methods, and show that many of the most popular BMDAL methods fit into this framework. Second, we propose new components, leading to a new BMDAL method. Third, we introduce an open-source benchmark with 15 large tabular data sets, which we use to compare different BMDAL methods. Our benchmark results show that a combination of our novel components yields new state-of-the-art results in terms of RMSE and is computationally efficient. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and that can be used to reproduce our results.
