The policy update from July 1st, 2023, indicates that Google leverages this information to enhance services and to develop new features, products, and technologies for the benefit of users and the public. The company may also "use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities."
The updated policy specifies that Google uses "publicly available information" to train its AI products, but it does not clarify how the company intends to keep copyrighted material out of the data pool. Numerous publicly accessible websites have policies that prohibit data collection or web scraping for training large language models and other AI tools. The policy also leaves open how this data is processed so that it does not contribute to potential AI system failures.
Moreover, uncertainty over whether the fair use doctrine extends to this type of application has triggered lawsuits and prompted lawmakers in certain countries to propose stricter laws regulating how AI companies gather and use their training data.
Meanwhile, Gannett, the largest newspaper publisher in the US, is suing Google and its parent company, Alphabet, alleging that advancements in AI technology have helped the search giant monopolize the digital ad market. Products like Google's AI search beta have also been criticized as "plagiarism engines" that deprive websites of traffic. At the same time, Twitter and Reddit have taken significant measures to prevent other companies from freely harvesting their data, although these actions have negatively affected the user experience on their platforms.