About the Big Data Analytics research direction
The Big Data research subprogramme focuses on efficient big data analysis. Its main aims are to identify the limits of existing data analysis techniques and to guide the selection of the most efficient technique for each specific problem setup. Besides benchmarks and comparative studies of existing techniques, new methods may be proposed to optimize the utilization of the infrastructure.
In the Big Data area, CERIT-SC aims to identify and validate the effectiveness and limitations of existing data analysis techniques when applied to different datasets under various setups (e.g., setups determined by the research question asked of a specific dataset). Particular attention is paid to extremely large datasets, which often require very simple techniques that balance analysis feasibility (e.g., response time, amount of resources used) against the achievable information value. Example domains include large networks of interconnected sensors (present, for instance, in the Internet of Things), cybersecurity assurance and (cyber)crime detection (which deal with vast amounts of heterogeneous data), and various bioinformatics data portals and analyses (e.g., genome DNA/RNA sequencing and analysis).
Research topics
- Examination of existing data analysis algorithms and techniques
- Comparison of existing data analysis tools
- Architecture of data analysis infrastructure (tool composition)
- Solutions to domain-specific data analysis problems
- Domains of smart grids, bioinformatics, cybercrime, and others
Application domains
Big data analysis applies to essentially every domain in which big data arises. We specialise in two domains, chosen for their strong links to our partners and projects. The first is Internet of Things (IoT) systems, smart energy grids in particular. The second is data describing biological problems and samples.
Tools
At present, we work mostly with publicly available tools, which we configure to best match the problem being addressed. These are mainly Elasticsearch and Hadoop.
Results
[1] Bangui, H., Ge, M., Buhnova, B., Rakrak, S., Raghay, S., & Pitner, T. (2017). Multi-Criteria Decision Analysis Methods in the Mobile Cloud Offloading Paradigm. Journal of Sensor and Actuator Networks, 6(4), 25.
This work is supported by the project OP RD&E CERIT Scientific Cloud CZ.02.1.01/0.0/0.0/16_013/0001802