Ziawasch Abedjan

Data cleaning is an important step when integrating large amounts of data, because it makes the data usable for an application or an analysis. There are several cleaning procedures, each of which fixes a specific category of errors or data quality problems. Usually, several procedures have to be applied one after another to achieve the desired data quality. Selecting these procedures and determining their order is a lengthy and laborious manual process.
The aim of this project is to propose data cleaning procedures for new datasets by drawing on earlier cleaning procedures that were successfully applied to similarly structured and similarly dirty data.
Our approach builds on existing techniques from data profiling and effort estimation. The challenges are to test these techniques with regard to the sensible creation of dataset profiles, and to find out which dataset profiles can be used to describe and compare the data quality of a dataset.
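To make the idea of profile-based recommendation concrete, here is a minimal sketch, assuming a dataset is a list of rows of string cells and that a profile is a small vector of quality features. The three features (missing ratio, distinct ratio, numeric ratio), the `CleaningRecord` type, the step names such as `impute_missing`, and the nearest-neighbour rule are all hypothetical illustrations, not the project's actual method.

```python
import math
from dataclasses import dataclass


def _is_number(cell: str) -> bool:
    """Return True if the cell parses as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def profile(rows: list[list[str]]) -> tuple[float, float, float]:
    """Hypothetical dataset profile: (missing ratio, distinct ratio, numeric ratio)."""
    cells = [c for row in rows for c in row]
    if not cells:
        return (0.0, 0.0, 0.0)
    n = len(cells)
    missing = sum(1 for c in cells if c.strip() == "") / n
    distinct = len(set(cells)) / n
    numeric = sum(1 for c in cells if _is_number(c)) / n
    return (missing, distinct, numeric)


@dataclass
class CleaningRecord:
    """A previously cleaned dataset: its profile and the ordered steps that worked."""
    profile: tuple[float, float, float]
    pipeline: list[str]  # hypothetical step names, in execution order


def recommend(rows: list[list[str]], history: list[CleaningRecord]) -> list[str]:
    """Return the pipeline of the past dataset whose profile is closest (Euclidean)."""
    if not history:
        raise ValueError("no previous cleaning records to compare against")
    target = profile(rows)
    best = min(history, key=lambda rec: math.dist(rec.profile, target))
    return best.pipeline


# Usage with made-up history: a sparse dataset matches the imputation pipeline.
history = [
    CleaningRecord((0.30, 0.20, 0.00), ["impute_missing", "deduplicate"]),
    CleaningRecord((0.00, 0.90, 0.80), ["fix_outliers", "normalize_units"]),
]
new_data = [["a", ""], ["", "b"], ["a", ""]]
print(recommend(new_data, history))
```

Weighting every feature equally in the distance is the simplest possible choice; deciding which profile features actually predict a successful cleaning sequence is precisely the open question described above.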