Simplified Specification of Data Requirements for Demand-Actuated Big Data Refinement

Journal of Data Intelligence, 3 (3): 366-400 (August 2022)
DOI: 10.26421/JDI3.3-5


Data have become one of the most valuable resources in modern society. Due to increasing digitalization and the growing prevalence of the Internet of Things, it is possible to capture data on virtually any aspect of today's life. Similar to physical resources, data have to be refined before they can become a profitable asset. However, such data preparation entails completely novel challenges: for instance, data are not consumed when being processed, so the volume of available data that needs to be managed increases steadily. Furthermore, data preparation has to be tailored to the intended use case in order to achieve an optimal outcome. This, however, requires the knowledge of domain experts. Since such experts are typically not IT experts, they need tools that enable them to specify the data requirements of their use cases in a user-friendly manner. The goal of this data preparation is to provide any emerging use case with demand-actuated data. With this in mind, we designed a tailorable data preparation zone for Data Lakes called BARENTS. It provides a simplified method for domain experts to specify how data must be pre-processed for their use cases, and these data preparation steps are then applied automatically. The data requirements are specified by means of an ontology-based method that is comprehensible to non-IT experts. Data preparation and provisioning are realized in a resource-efficient manner by implementing BARENTS as a dedicated zone for Data Lakes. This way, BARENTS can be embedded seamlessly into established Big Data infrastructures. This article is an extended and revised version of the conference paper "Demand-Driven Data Provisioning in Data Lakes: BARENTS - A Tailorable Data Preparation Zone" by Stach et al. Compared to the original conference paper, the article at hand takes a more detailed look at related work.
The emphasis of this extended and revised version, however, is on strategies to improve the performance of BARENTS and enhance its functionality. To this end, we discuss implementation details of our prototype in depth and introduce a novel recommender system in BARENTS that assists users in specifying data preparation steps.


