Data Risks in AI

Data is key to the training, testing and cross-validation of machine learning systems. However, the improper collection and use of data present significant risks for the unwary. These risks may arise from the nature of the data itself and from the sources from which it is derived. For example, if a dataset contains personal data, data protection regulations must be complied with. It is equally important to identify the source of every component of a large dataset, because the legal risk attached to the data depends on how it was obtained.

Licensed data includes data obtained under a license from a third-party dataset, and data obtained from customers under licenses granted in the customer terms and conditions. Such data is typically low risk. Unlicensed data includes data obtained by mining or scraping publicly available material, and data collected from customers and third parties without a license under the customer terms and conditions. These forms of data may carry more legal risk.

For example, scraped data may include copyright works, and using them without the consent of the rightsholder may constitute copyright infringement. Similarly, website terms and conditions may prohibit the use of the website’s content for commercial purposes.
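The distinction between licensed and unlicensed data can be recorded per data source. The following is a minimal Python sketch of such a register, not a definitive implementation: the names `Origin`, `DataSource` and `review_flags` are hypothetical, and the flags are only prompts for legal review based on the categories described above.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """How a data source was obtained (categories taken from the text above)."""
    THIRD_PARTY_LICENSE = "licensed third-party dataset"
    CUSTOMER_LICENSE = "customer data licensed under terms and conditions"
    SCRAPED = "mined or scraped from publicly available material"
    COLLECTED_UNLICENSED = "collected from customers or third parties without a license"


# Origins treated as licensed, and therefore typically lower risk.
LICENSED = {Origin.THIRD_PARTY_LICENSE, Origin.CUSTOMER_LICENSE}


@dataclass(frozen=True)
class DataSource:
    """One entry in a hypothetical data source register."""
    name: str
    origin: Origin
    contains_personal_data: bool = False


def review_flags(source: DataSource) -> list[str]:
    """Return the review prompts that apply to a data source."""
    flags = []
    if source.origin not in LICENSED:
        flags.append("unlicensed: check copyright and website terms")
    if source.contains_personal_data:
        flags.append("personal data: check data protection compliance")
    return flags


if __name__ == "__main__":
    # Illustrative entries only; the source names are made up.
    sources = [
        DataSource("vendor-corpus", Origin.THIRD_PARTY_LICENSE),
        DataSource("forum-crawl", Origin.SCRAPED, contains_personal_data=True),
    ]
    for source in sources:
        print(source.name, review_flags(source))
```

A register of this kind only records what is already known about each source. Whether a given use is actually permitted remains a question for legal review.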

Beyond the above legal risks, providers of high-risk AI systems will soon be required to comply with significant data governance obligations under Article 10 of the EU AI Act. These will extend beyond GDPR obligations and will address topics such as the intended use, composition, pre-processing, labelling, and quality of the data (including biases and gaps).

Keeping track of compliance with these extensive requirements will be extremely challenging for AI providers in today’s fast-paced, competitive and data-hungry AI world.

Progressio.AI are the technology partners to help you close this gap.

Navigate Data Protection Laws
Safeguard Intellectual Property Rights
Comply with AI Regulations Efficiently