The controversy arose from an investigation revealing that several major tech companies, including Apple, Nvidia, and Anthropic, had used “the Pile,” a dataset compiled by the non-profit organization EleutherAI. The dataset included subtitles from over 170,000 YouTube videos, harvested without the consent of the content creators, and its use to train various AI models sparked significant ethical concerns.
Apple’s Response
Apple has firmly stated that its Apple Intelligence AI was trained using neither the OpenELM model nor the controversial YouTube subtitles dataset. According to the company, the Pile, which includes the YouTube subtitles, was used only to train its open-source OpenELM models. These models were released in April and are intended solely for research purposes, aimed at advancing open-source large language model development.




