The Dark Side of AI Detectors: Why Accuracy Is Not Guaranteed

As more writers use generative artificial intelligence (GenAI) to create, some editors are taking steps to make sure they don’t. This often comes in the form of AI detectors. It’s understandable that editors want to guarantee accuracy and avoid machine-generated content that may lack credibility. Plus, they don’t want to lose the human touch that resonates with their readers.

Unfortunately, AI checkers, designed to identify AI-written content, are far from perfect. They often miss the mark, flagging genuine, human-generated content while letting some AI-written material slip through. Some universities, including Vanderbilt, Michigan State, and the University of Texas at Austin, have disabled AI detection software based on concerns over accuracy.

The question is: Are AI checkers even necessary? If content is fact-checked, properly attributed, and aligned with business goals, does it matter whether AI was used in the process?

AI Detectors Explained

AI detectors are tools designed to analyze written content and determine whether it was created by a human or an AI system. They examine patterns in language, sentence structure, word choice, and even tone to identify characteristics typical of machine-generated content.

These tools use algorithms to compare the content against a database of known AI-generated text, looking for similarities that suggest it wasn’t written by a person.

According to Surfer SEO, AI detectors use four techniques: classifiers, embeddings, perplexity, and “burstiness.”

Classifiers sort text into categories by recognizing patterns they’ve learned, while embeddings turn words into numbers to show how they are related in meaning. Perplexity measures how unpredictable the text is to a language model; higher perplexity, meaning less predictable text, usually suggests a human wrote it. Burstiness looks at how varied the sentences are, with human writing typically showing more variety.
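To make the last two ideas concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not how any particular detector works: burstiness is approximated as the variation in sentence length, and perplexity is computed against a toy word-frequency model rather than the large language models real tools rely on. The function names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Assumed proxy: standard deviation of sentence length divided by the mean.

    0.0 means every sentence has the same number of words.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text, reference):
    """Perplexity of `text` under a word-frequency model built from `reference`.

    Uses add-one smoothing so unseen words get a small, non-zero probability.
    """
    ref_words = reference.lower().split()
    counts = Counter(ref_words)
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    total = len(ref_words)
    words = text.lower().split()
    if not words:
        raise ValueError("text must contain at least one word")
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The storm rolled in over the hills before anyone noticed."
print(burstiness(uniform), burstiness(varied))

reference = "the cat sat on the mat"
print(unigram_perplexity("the cat sat", reference))
print(unigram_perplexity("quantum zebra sings", reference))
```

In this toy setup, the passage with one very short and one long sentence scores as more "bursty" than the passage with three equal-length sentences, and the phrase made of words the model has never seen has a higher perplexity than the familiar one. A real detector would combine signals like these with a trained classifier, which is where the limitations discussed next come in.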

Limitations of AI Detectors

AI detectors can be helpful tools, but they come with significant limitations that every business and writer should understand. “AI detectors don’t understand language as well as humans do,” says Petar Marinkovic of Surfer SEO. “They only rely on historical data from their training sets to make predictions as confidently as possible.”

False Positives/Negatives

One limitation of AI detectors is the risk of false positives and false negatives. False positives occur when AI checkers incorrectly flag human-written content as AI-generated; false negatives occur when AI-generated content passes as human-written. Dr. Anneke Schmidt, a content strategist and SEO consultant, conducted her own research on AI detectors and found that to avoid false positives, users might need to deliberately insert mistakes or formatting issues. For businesses and marketers, false positives can create unnecessary roadblocks, leading to time-consuming revisions or unjustified concerns over content quality.

Dr. Schmidt notes that AI detectors tend to flag more complex or academic content, while simpler styles are less likely to be mislabeled. This creates a dilemma for businesses that produce high-level, technical content—they shouldn’t have to simplify or reduce the quality of their work just to pass an AI detection test.
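For teams that want to measure these two error types on their own content, the arithmetic is simple. The sketch below is a hypothetical illustration: the function name and the sample labels are invented for the example and do not come from any study cited here. It assumes you already know which pieces were AI-written and have recorded what the detector said about each one.

```python
def error_rates(actual_is_ai, flagged_as_ai):
    """Return (false_positive_rate, false_negative_rate) for a detector.

    actual_is_ai:  list of bools, True if the piece really was AI-written.
    flagged_as_ai: list of bools, True if the detector flagged the piece.
    """
    if len(actual_is_ai) != len(flagged_as_ai):
        raise ValueError("both lists must have the same length")

    pairs = list(zip(actual_is_ai, flagged_as_ai))
    human_flags = [flag for actual, flag in pairs if not actual]
    ai_flags = [flag for actual, flag in pairs if actual]

    # Share of human-written pieces that were wrongly flagged as AI.
    fpr = sum(human_flags) / len(human_flags) if human_flags else 0.0
    # Share of AI-written pieces that slipped through unflagged.
    fnr = sum(not f for f in ai_flags) / len(ai_flags) if ai_flags else 0.0
    return fpr, fnr


# Invented sample: four human-written pieces followed by two AI-written ones.
actual = [False, False, False, False, True, True]
flagged = [True, False, False, False, False, True]
print(error_rates(actual, flagged))
```

Tracking both numbers matters because a tool can look strong on one and weak on the other: a detector that rarely flags anything will have few false positives but will miss most AI-written text.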

The limitations of AI detectors aren’t confined to text alone. They also extend to image detection. As AI-generated visuals become more sophisticated, distinguishing between real and synthetic graphics has grown increasingly difficult.

The New York Times conducted a test using over 100 synthetic and real images to evaluate the performance of AI image detection services. Their findings were clear: “The results show that the services are advancing rapidly, but at times fall short.”

Much like text detectors, image detectors rely on algorithms trained on historical data to spot signs of manipulation or AI generation. However, as AI technologies evolve, so do the techniques for creating highly realistic images. This makes it harder for these tools to keep up.

Accuracy Issues

The issue of AI detector accuracy isn’t new. A study published in the International Journal for Educational Integrity found that “AI detection tools were more accurate in identifying content generated by GPT 3.5 than GPT 4. However, when applied to human-written control responses, the tools exhibited inconsistencies, producing false positives and uncertain classifications.” In July 2023, OpenAI discontinued its AI Classifier tool, which had correctly identified only 26 percent of AI-written text.

As an experiment, we ran the Constitution of the United States through a popular AI detector, which rated the document 98.53 percent likely to be AI-generated. This could lead some to believe that the more often content appears “out there,” or the more often it is entered into a detector, the more likely it is to be flagged as machine-generated.

We also asked ChatGPT whether submitting different versions of the same content to AI detectors would raise red flags. The answer was a resounding yes; it cited pattern recognition, repeated submissions, and model-specific traits as contributing factors.

Bias in Detection

Further research by the European Network for Academic Integrity (ENAI) revealed another concerning issue with AI detectors: bias. After analyzing 12 publicly available tools, the researchers concluded that these tools are not only inaccurate but also tend to wrongly classify content as human-written rather than accurately detecting AI-generated text.

Another concern is the detectors’ inherent bias against non-native English speakers. Studies have shown that text written by non-native speakers is more likely to be labeled as machine-written. Grammar, spelling, and editing tools, which native and non-native English speakers alike often use, may also trigger AI detectors.

AI with Human Expertise

As the University of Kansas Center for Teaching Excellence says, “As tempting as it might be to use [AI] detector as a shortcut, you should not. The tool provides information, not an indictment.” Ultimately, AI detectors have a long way to go before they can be considered reliable. Businesses, writers, editors, and readers need to understand these limitations and carefully consider how much weight to place on AI detection tools. Instead of depending solely on AI detectors, a balanced approach combining human oversight with AI support will likely yield the best results. As Dr. Schmidt suggests, “Trust your instincts. That’s what sets us apart from the robots.”