Transformer-based models (e.g., BERT, RoBERTa, XLNet) and large language models (e.g., GPT-3 and T0pp) show strong potential to learn knowledge from multi-modal data such as text and tables. This learned knowledge, if used appropriately, can significantly reduce the human cost of laborious data preparation tasks. This project aims to solve multiple data preparation tasks with these models, with two main goals: improved machine accuracy and reduced human cost. We will pursue augmented intelligence with language models on three main data preparation tasks: (1) data cleaning, (2) entity resolution, and (3) querying and managing data lakes.
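
To make the idea concrete, here is a minimal sketch of how one of the listed tasks, entity resolution, can be reduced to a language-model query: two records are serialized into text and the model is asked whether they refer to the same real-world entity. This is an illustrative assumption about the general approach, not the project's committed design; in particular, `query_language_model` is a hypothetical placeholder for whatever model API is ultimately used.

```python
# Illustrative sketch only: entity resolution framed as a yes/no question
# to a language model. `query_language_model` is a hypothetical stand-in
# for a real LLM API call or a locally hosted model.

def serialize(record: dict) -> str:
    """Flatten a record into 'attribute: value' text a language model can read."""
    return "; ".join(f"{k}: {v}" for k, v in record.items())

def query_language_model(prompt: str) -> str:
    """Hypothetical placeholder; replace with an actual model invocation."""
    raise NotImplementedError

def same_entity(a: dict, b: dict) -> bool:
    """Ask the model whether two serialized records describe the same entity."""
    prompt = (
        "Do the following two records refer to the same real-world entity? "
        "Answer yes or no.\n"
        f"Record A: {serialize(a)}\n"
        f"Record B: {serialize(b)}"
    )
    return query_language_model(prompt).strip().lower().startswith("yes")

# Example of the kind of pair such a matcher would compare:
# same_entity({"name": "Apple Inc.", "hq": "Cupertino"},
#             {"name": "Apple", "hq": "Cupertino, CA"})
```

The same serialize-and-prompt pattern extends naturally to the other two tasks, e.g., asking the model to flag or repair an erroneous cell value for data cleaning, or to match a natural-language question against table metadata when querying a data lake.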