Feb 6 / Naz

How do we get spammy ideas into the AI?

(One) Garbage In. (Millions of) Garbage Out.
Linda Gottfredson and Richard Lynn were never household names in our "real world"; as controversial thinkers they did not affect most of us directly, and their influence was felt mainly within scientific racist networks. Yet their questionable ideas are now reaching us, our children, and our lives as we engage with AI technologies, particularly GenAI systems.
"Academic freedom is essential, yet it demands responsible use, especially in areas such as race and intelligence research.
Linda Gottfredson's studies, partially funded by the Pioneer Fund, which is known for supporting scientific racism, underscore the need for rigorous ethical and scientific scrutiny. These factors necessitate a critical assessment of the motivations and potential societal impacts of such research. Understanding the implications of this work is crucial, especially concerning how it might influence social policies and the pursuit of racial equality."
To comprehend how controversial or "problematic thoughts" might become part of the data used to train AI systems, it's important to explore the mechanics of AI training and the nature of the data used.
AI systems, particularly those based on machine learning and natural language processing, are trained using extensive datasets. These datasets typically consist of a vast array of texts from the internet, including books, articles, websites, and other public domain materials. The objective is to expose the AI to a wide spectrum of human language, encompassing various styles, contexts, and opinions, enabling it to understand and generate human-like responses.
However, this broad exposure also means that AI systems can encounter biased, incorrect, or controversial content. When such content is part of the training data, the AI may learn to replicate similar patterns in its responses. It's crucial to note that the inclusion of such content doesn't imply endorsement by the AI or its developers; rather, it reflects the diversity and complexity of human thought and language.
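The point about replication can be made concrete with a toy model. The sketch below is not how modern LLMs are built (they use neural networks, not word counts), but a simple bigram model shows the same underlying dynamic: the model has no notion of which statements are reliable, it only mirrors the frequencies in its corpus, so a claim repeated often becomes the "likely" continuation.

```python
from collections import Counter, defaultdict

# A toy corpus: the model only sees token sequences, with no sense
# of which sentences are trustworthy and which are not.
corpus = [
    "the study found a strong effect",
    "the study was widely criticized",
    "the study found a strong effect",  # repeated claims carry more weight
]

# Build a bigram model: count which word follows which.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

# The most likely continuation simply mirrors corpus frequencies:
# after "study", "found" (seen twice) beats "was" (seen once).
print(follows["study"].most_common(1))  # [('found', 2)]
```

Scale this up from three sentences to billions of web pages and the lesson holds: whatever patterns dominate the training data, accurate or not, dominate the model's output.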
To mitigate the risks associated with harmful or misleading content, AI developers use several strategies:
Curating Training Data: Developers often curate the training data to exclude or minimize exposure to harmful, biased, or low-quality content, using sophisticated algorithms and manual review processes.
Algorithmic Adjustments: Refining the AI's algorithms helps it identify problematic content and avoid replicating it.
Ethical Guidelines and Policies: Setting ethical guidelines and policies for AI development and usage is critical. This includes guidelines on handling sensitive topics and the type of content that should be avoided.
Continuous Monitoring and Updates: AI systems are continually monitored and updated to ensure they align with ethical standards and societal norms. This includes retraining the AI with new, cleaner datasets and refining response generation mechanisms.
Transparency and Accountability: Ensuring transparency in AI training processes and maintaining accountability for the outcomes of AI interactions is vital.
Community Feedback: Input from users and the community can help identify areas where the AI might be underperforming or replicating undesirable content, allowing for targeted improvements.
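The first of these strategies, curating training data, can be sketched in a few lines. The filter below is purely illustrative: the blocklist terms, the `quality` field, and the 0.5 threshold are all made-up assumptions, and real pipelines rely on trained classifiers and human review rather than keyword lists. Still, it shows the basic shape of the decision: each candidate document is screened before it ever reaches the training set.

```python
# Placeholder blocked terms and quality threshold -- illustrative only.
BLOCKLIST = {"blocked_term_a", "blocked_term_b"}
MIN_QUALITY = 0.5

def passes_filter(doc: dict) -> bool:
    """Return True if a document is acceptable for the training set."""
    words = set(doc["text"].lower().split())
    if words & BLOCKLIST:               # drop documents with blocked terms
        return False
    return doc["quality"] >= MIN_QUALITY  # drop low-quality sources

documents = [
    {"text": "A careful survey of the literature", "quality": 0.9},
    {"text": "blocked_term_a repeated endlessly", "quality": 0.8},
    {"text": "low effort spam spam spam", "quality": 0.1},
]

curated = [d for d in documents if passes_filter(d)]
print(len(curated))  # 1 -- only the careful survey survives
```

In practice this screening happens at enormous scale, which is why the later strategies in the list (monitoring, feedback, retraining) matter: no upfront filter catches everything.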
For more detailed insights into these processes and the challenges involved, exploring "Artificial Intelligence: A Guide for Thinking Humans" by Melanie Mitchell and "Rebooting AI: Building Artificial Intelligence We Can Trust" by Gary Marcus and Ernest Davis might be beneficial. These books offer a comprehensive overview of how AI systems are trained and the complexities involved in ensuring they are ethical and unbiased.