
FROM OUR BLOG
"Garbage In, Garbage Out": AI's Data Bias Problem
Apr 10, 2025



The adage “garbage in, garbage out” rings especially true when training AI. Biases in training data are known to degrade models, leading to inaccurate or unfair outputs. This in turn erodes trust, amplifies existing biases, and ultimately limits real-world adoption.
What are the Primary Types of AI Bias?
Cognitive bias: Human judgment errors that can seep into and influence AI systems.
Implicit bias: Subconscious stereotypes or prejudices affecting data collection or quality.
Measurement bias: Inaccuracies introduced by flawed methods of collecting or labeling data.
Selection bias: Skew that arises when training data is not representative of the broader population or intended use case (see the sketch after this list).
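Selection bias in particular lends itself to a simple sanity check: compare the demographic make-up of a training set against a reference population. Below is a minimal sketch of that check; the group labels, reference shares, and tolerance threshold are all hypothetical, chosen only for illustration.

```python
# A minimal sketch of a selection-bias check: compare the demographic
# make-up of a training set against a reference population. The group
# labels and reference shares below are hypothetical.
from collections import Counter

def group_shares(groups):
    """Return each group's share of the dataset."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def selection_bias_report(train_groups, reference_shares, tolerance=0.05):
    """Flag groups whose share of the training data deviates from the
    reference population by more than `tolerance`."""
    observed = group_shares(train_groups)
    report = {}
    for group, expected in reference_shares.items():
        actual = observed.get(group, 0.0)
        report[group] = {
            "expected": expected,
            "actual": actual,
            "flagged": abs(actual - expected) > tolerance,
        }
    return report

# Hypothetical example: women make up 50% of the target population
# but only 30% of the training data, so they are flagged.
train_groups = ["male"] * 70 + ["female"] * 30
print(selection_bias_report(train_groups, {"male": 0.5, "female": 0.5}))
```

In practice, the reference shares would come from census or domain-specific prevalence data, and the check would run across many attributes at once.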
Real-World Examples of AI Bias
Even the most well-designed models can fail when built on flawed foundations. Here are two recent cases demonstrating the effects of AI bias:
Gender Bias in Liver Disease Screening Tools
Four AI models designed to detect liver disease appeared to be over 70% accurate at assessing patients. But when researchers recreated the study and broke the results down by sex, they found the models missed 44% of cases among women compared with just 23% among men. The two models with the highest overall accuracy also had the widest gender gap, revealing how surface-level performance can mask deeper bias.
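The lesson generalizes: a single aggregate accuracy figure can hide large gaps between groups, so error rates should be reported per group. Below is a minimal sketch of that kind of disaggregated evaluation; the labels, predictions, and group assignments are hypothetical toy data, not the study's actual results.

```python
# A minimal sketch of disaggregated evaluation: compute the miss rate
# (false negatives among true cases) separately for each group. All
# data below is hypothetical.

def miss_rate(y_true, y_pred):
    """Fraction of true positive cases the model failed to detect."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return 0.0
    return sum(1 for t, p in positives if p == 0) / len(positives)

def miss_rate_by_group(y_true, y_pred, groups):
    """Compute the miss rate separately for each demographic group."""
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        rates[g] = miss_rate([y_true[i] for i in idx],
                             [y_pred[i] for i in idx])
    return rates

# Hypothetical toy data: overall accuracy is 8/12, but the model
# misses 1 of 4 cases among men versus 3 of 4 among women.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
groups = ["m", "m", "m", "m", "f", "f", "f", "f", "m", "m", "f", "f"]
print(miss_rate_by_group(y_true, y_pred, groups))  # {'m': 0.25, 'f': 0.75}
```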
Deepfake Detection and Error Disparities
Even deepfake detection technology, designed to safeguard against digital manipulation, shows disparities in performance. Researchers at the University at Buffalo demonstrated that deepfake detection algorithms had higher error rates when analyzing videos of darker-skinned individuals than of lighter-skinned ones (up to a 10.7% difference). This discrepancy raises concerns about the technology’s reliability and its potential to disproportionately misidentify or overlook harmful content targeting minority communities.
The Aris Solution: Mitigating Biases with High-Quality, Representative Datasets
AI is, at its core, a reflection of the data it is trained on. The massive datasets used to train models reflect the biases, inequalities, and imperfections present in human society. If companies fail to train models on fresh, representative datasets, these biases will persist as AI systems continue to be implemented in our everyday lives.
This is where Aris comes in. Aris provides AI companies with human-generated, high-quality, multimodal datasets that support the mitigation of algorithmic bias. Aris collects a wide range of demographically and geographically representative data, empowering AI companies to create models that reflect the complexity of the world they serve.
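Representative data collection often comes down to deliberate sampling. As an illustration of one common technique, the sketch below rebalances a skewed dataset by stratified downsampling so that each group is equally represented. The records and group labels are hypothetical, and real pipelines typically balance across many attributes at once and may collect new data rather than discard existing samples.

```python
# A minimal sketch of stratified downsampling: every group is reduced
# to the size of the smallest group. Records and group labels are
# hypothetical.
import random
from collections import defaultdict

def balance_by_group(records, group_key, seed=0):
    """Downsample every group to the size of the smallest group."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for record in records:
        buckets[record[group_key]].append(record)
    target = min(len(bucket) for bucket in buckets.values())
    balanced = []
    for bucket in buckets.values():
        balanced.extend(rng.sample(bucket, target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical skewed dataset: 80 samples from region A, 20 from region B.
records = [{"region": "A", "id": i} for i in range(80)] + \
          [{"region": "B", "id": i} for i in range(20)]
balanced = balance_by_group(records, "region")
print(len(balanced))  # 40: 20 from each region
```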
At Aris, we’ve helped leading AI companies reduce bias by supplying balanced and representative datasets. Explore our case studies to see how Aris is setting a new standard for AI accountability and responsibility.
Stay connected to us.
Stay in the loop with the latest in AI, data, and Aris.