
Behind the scenes of our Perplexity investigation

30-6-2024 < Attack the System 20 415 words
 
I was sitting in WIRED’s New York offices several weeks ago, catching up on emails and feeling that slightly unpleasant itch in the back of my brain that happens when we haven’t published a big story recently, when WIRED editor Tim Marchman knocked on my door. “Hey,” he said. “Do you happen to know how we can get in touch with Condé Nast’s engineering team?”

That was the catalyst for some of the more clever reporting we’ve published at WIRED in recent months: a deep dive into Perplexity, a popular, AI-infused search startup with a billion-dollar valuation and backers including Jeff Bezos’ family fund and Nvidia. Perplexity is also, as Marchman and senior writer Dhruv Mehrotra concluded, something of a “bullshit machine” that spits out inaccurate, sometimes fancifully fabricated information, including false but incriminating claims about specific people. The company, Marchman and Mehrotra found, appears to be surreptitiously scraping web content, including WIRED.com journalism, from an undisclosed IP address, despite its claims that it honors the do-not-scrape requests website operators communicate via robots.txt files.


The latter finding is where engineers for Condé Nast, the parent company of WIRED, came in. With their collaboration, Marchman and Mehrotra gained access to WIRED and Condé Nast server logs, and found that the IP address 44.221.181.252 had hit Condé Nast websites, including WIRED, at least 822 times in recent months. That same IP address was observed by web developer Robert Knight when he dug into Perplexity’s access to MacStories.net, a site on which he works. We further validated that the IP address was linked to Perplexity by creating a new website, monitoring its server logs, and prompting the Perplexity chatbot to summarize the site. Boom. 44.221.181.252.
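The core of that log check is simple to illustrate. The Python sketch below is not the tooling Condé Nast’s engineers used. It is a minimal example that assumes access logs in the widely used Combined Log Format, where the client IP is the first field on each line, and it tallies requests per IP. The function name `hits_by_ip` and the sample log lines are hypothetical; only the IP address comes from the reporting above.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes Combined Log Format: the client IP is the first whitespace-separated
# field, followed by identd and user fields, a [timestamp], and a "request".
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"')


def hits_by_ip(lines: Iterable[str]) -> Counter:
    """Count requests per client IP across access-log lines, skipping malformed ones."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; a real check would pass an open log file instead.
    sample = [
        '44.221.181.252 - - [10/Jun/2024:13:55:36 +0000] "GET /story/example/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '203.0.113.7 - - [10/Jun/2024:13:56:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "curl/8.0"',
        '44.221.181.252 - - [11/Jun/2024:09:12:44 +0000] "GET /story/another/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
    ]
    counts = hits_by_ip(sample)
    print(counts["44.221.181.252"])  # 2
```

A tally like this only shows how often an address visited. Tying that address to a specific company still takes outside evidence, such as the test-site experiment described above.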


What does that mean in practice? That Perplexity can produce summaries of WIRED journalism using language that skews uncomfortably close to our writers’ words and phrases. (Perplexity did this again when it plagiarized our own story about its methods. This week, the company appeared to stop scraping our sites without permission.) Had Perplexity adhered to digital best practices and honored robots.txt, as the company says it does, the search tool wouldn’t have been able to summarize and closely mimic our work.
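robots.txt is a plain-text file of voluntary rules, and checking it is the crawler’s job, not the website’s. As a minimal sketch of what honoring it looks like from the crawler’s side, Python’s standard-library `urllib.robotparser` module can answer whether a given agent may fetch a given URL. The robots.txt contents, the helper name `allowed`, and the agent names `ExampleAIBot` and `SomeOtherBot` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from the whole site,
# and every other agent may fetch anything outside /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/story/some-article/"
    print(allowed("ExampleAIBot", url))  # False
    print(allowed("SomeOtherBot", url))  # True
```

Nothing in the protocol enforces the answer. A crawler that skips this check, or fetches pages under a different user agent or IP address, is never stopped by the file itself, which is why the server-log evidence mattered.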


Perplexity’s public reckoning isn’t over yet. Following our initial reporting, WIRED broke news about Amazon looking into Perplexity’s activity, and how experts think the company may have exposed itself to legal claims. What happens from here? That’s anybody’s guess—but you’ll read about it first on WIRED.com.

