
Google has announced Google-Extended, a control that gives web publishers more say over how their content is used and lets them prevent the company from using it to train its generative AI offerings.
Google-Extended is a standalone product token that web publishers can use in their robots.txt file to manage whether their sites help improve Bard and the Vertex AI generative APIs, including future generations of the models that power those products.
Google Search uses its crawler, Googlebot, to crawl and index web pages, but Google was also using those crawls to gather the data used to train its AI models.
Because Google is the search engine of choice for approximately 90% of users, many companies faced a difficult choice between being found by customers and making it easier for AI to emulate their content.
The dilemma is especially acute for organizations with a large content output, such as news publishers like The Washington Post and The New York Times.
Google-Extended separates AI training from Googlebot's search indexing, effectively allowing companies to opt in to search indexing while opting out of AI training.
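As an illustration, the robots.txt rules embedded below block the Google-Extended token across the whole site while leaving every other user agent, including Googlebot, free to crawl. This is a minimal sketch: it uses Python's standard-library `urllib.robotparser` as a stand-in for Google's own robots.txt handling, which may differ in edge cases, and the helper `build_parser` and the URL `https://example.com/news/article.html` are hypothetical. Google-Extended is a robots.txt control token rather than a separate crawler, so the check only shows how the rules read, not how Google actually fetches pages.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt out of AI training, stay in Search.
ROBOTS_TXT = """\
# Block the Google-Extended token (Bard / Vertex AI training) site-wide
User-agent: Google-Extended
Disallow: /

# All other user agents, including Googlebot for Search, may crawl everything
User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text held in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/news/article.html"  # hypothetical page
    for agent in ("Googlebot", "Google-Extended"):
        # can_fetch() applies the group matching this user-agent token
        print(agent, rp.can_fetch(agent, url))
```

Running the script should report that Googlebot may fetch the page and Google-Extended may not: the `Google-Extended` group matches only that token, so Search crawling falls through to the `*` group.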
OpenAI recently announced a similar opt-out feature.
The move comes as a growing number of content creators join class-action lawsuits seeking to stop their copyrighted material from being used for AI training.
Even before Google released Google-Extended, The New York Times had updated its terms of use to prohibit its articles from being used for AI training.
What remains unclear is how Google, OpenAI, and others will purge what their models have already learned from the content that got them to where they are today.