Good Search Borrows, Great Search … Steals?
Web crawling—the act of indexing information across the internet—has been around for decades. It has primarily been used by search engines like Google and nonprofits like Internet Archive and Common Crawl to catalog the contents of the open internet and make it searchable. Until recently, the practice of web crawling has rarely been seen as controversial, as websites depended on the process as a way for people to find their content. But now crawling tech has been subsumed by the great AI-ening of everything, and is being used by companies like Google and Perplexity AI to absorb whole articles that are fed into their summarizing machines.
This week on Gadget Lab, WIRED senior writer Kate Knibbs joins the show to talk about web crawling and the controversy over Common Crawl. Then we talk with Forbes’ chief content officer and editor Randall Lane about how Perplexity AI repurposed a Forbes article and presented it as its own story, without first asking permission or properly citing the source.
Show Notes:
Read Kate’s story about how publishers are going after Common Crawl over AI training data. Read Randall’s story about how Perplexity AI copied the work of two Forbes reporters.
Recommendations:
Randall recommends his new horse racing league, the National Thoroughbred League. Kate recommends the book Victim by Andrew Boryga. Lauren recommends the show Hacks on Max.
Randall Lane can be found on social media @RandallLane. Kate Knibbs is @Knibbs. Lauren Goode is @LaurenGoode. Michael Calore is @snackfight. Bling the main hotline at @GadgetLab. The show is produced by Boone Ashworth (@booneashworth). Our theme music is by Solar Keys.