A Real Discussion About Artificial Intelligence

Artificial intelligence is a broad term that could do with some definition. With the arrival of OpenAI’s ChatGPT in November 2022, AI has exploded into the popular consciousness, promising to revolutionize everything, and fraudulent actors have taken advantage of the term’s nebulousness to mislead the public for their own ends.

This article summarizes and adds to AI Snake Oil (Princeton University Press, 2024), a recent book by Arvind Narayanan and Sayash Kapoor that offers tools for separating the AI wheat from the chaff. The book’s authors describe one consequence of making poor decisions about AI:

If [Gen AI] leads to an economic transformation […], it probably won’t be good news for labor, whether or not it eliminates jobs. That’s because this type of AI relies on the invisible, drudging, low-wage work of millions of people to create training data, as well as the use of data found on the web without credit or compensation to the writers, artists, and photographers who created them. In the wake of the Industrial Revolution, millions of new jobs were created in factories and mines, with horrific working conditions. It took many decades to secure labor rights and improve workers’ wages and safety. Similarly, there is a movement today to secure labor rights, human creativity, and dignity in the face of encroaching automation.

The most common forms of AI at present are generative AI, which includes chatbots and image generators, and predictive AI, which is used to guide decision-making based on broad statistical patterns (e.g., predicting hospital stay durations, or making hiring decisions from 30-second video interviews). Many of AI’s false promises fall into the second category. There are also other types of AI, which we’ll get into shortly.

Here are some guiding questions for deciding whether something is or has artificial intelligence:

  1. Does the task require creative effort or training for a human to perform?
  2. Was the behavior of the system directly specified in code by the developer, or did it indirectly emerge, by learning from examples or searching through a database?
  3. Does the system make decisions autonomously and possess some degree of flexibility and adaptability to the environment?

Despite these and other guides for determining what is and isn’t AI, there remains a great deal of confusion and disagreement (thus enabling the bad actors).

The false promises and nebulousness of AI aren’t the only issues with AI. Sometimes AI works too well, and this causes problems too. For example, facial recognition technology, which is particularly advanced, can cause great harm in the hands of the wrong people.

Some examples of AI failing to live up to the hype:

  1. Claim: Machine learning (ML) could predict hit songs with 97% accuracy. Reality: ML did no better than random guessing.
  2. Reality: The vast majority of ML-based research across disciplines was found to be flawed.
  3. Claim: In June 2022, Google’s internal chatbot had become sentient. Reality: Most AI researchers see no basis for the claim.

Predictive AI

Some universities use predictive AI, like EAB Navigate, to decide which students are at risk of dropping out. Even though the predictions are highly error-prone, this information is used to deny admission or pressure students to leave.

These AIs rely on algorithms. Formerly, algorithms were specified manually by humans to automate decisions, such as issuing COVID-19 stimulus checks of differing amounts to citizens based on their tax records. These days, algorithms (or models trained with data) develop rules from patterns in past data, e.g., your Netflix viewing habits. No one manually writes the rule to recommend shows like the ones you’ve watched before; an algorithm develops and applies it automatically, as the sketch below illustrates. This kind of predictive algorithm is currently used to assess potential hires, set loan amounts, decide how long someone should stay in jail, and much more. When applied indiscriminately, it ruins people’s lives.
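
As a toy illustration (not any real product’s code), here is a minimal sketch of how a rule like “recommend shows similar to those you’ve watched” can emerge from past data rather than being hand-written; the viewing histories are made up:

```python
# A toy predictive algorithm: no human writes the recommendation rule;
# it emerges from counting patterns in (made-up) past viewing data.

from collections import Counter

past_viewing = [
    {"Dark", "Stranger Things", "Black Mirror"},
    {"Dark", "Black Mirror", "Westworld"},
    {"The Crown", "Downton Abbey"},
    {"Stranger Things", "Black Mirror", "Westworld"},
]

def recommend(watched, histories, top_n=2):
    """Score unseen shows by how often they co-occur with shows already watched."""
    scores = Counter()
    for history in histories:
        if history & watched:               # this viewer overlaps with you
            for show in history - watched:  # count what else they watched
                scores[show] += 1
    return [show for show, _ in scores.most_common(top_n)]

print(recommend({"Dark"}, past_viewing))    # -> ['Black Mirror', 'Stranger Things']
```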

A study examined whether predictive AI could separate higher-risk pneumonia patients admitted to the hospital from lower-risk ones. After being trained, the model was fairly accurate but also made baffling decisions, such as sending patients with asthma back home (rather than prioritizing them). It turned out that the data the model was trained on could not account for the standard hospital procedure that patients admitted with asthma were sent straight to the ICU. Such patients received more intensive care than non-asthmatic patients and were therefore less likely to develop complications. Deploying this model indiscriminately (i.e., sending patients with asthma and pneumonia home) would have been disastrous. The model’s good predictions did not necessarily lead to good decisions.

Because predictive AI relies on past data, it is forced to choose an outcome based on the data it was trained on. It cannot adjust to changes in human behavior, which leads to people gaming the system. Many companies in the US use predictive AI to screen job applicants. Candidates, knowing this, have started to include keywords from the job description and the names of top universities in white text (invisible to the human eye but visible to the software). Investigations by journalists also revealed that superficial changes to a résumé, such as converting the document from PDF to plain text, can drastically change a candidate’s score. When a model’s output can be manipulated so easily, claims of accuracy should be thrown out. The toy screener below shows how little it can take.
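
A hypothetical keyword-based screener (an assumption for illustration; real vendors’ systems are more complex but have shown the same weakness) makes the gaming concrete:

```python
# A toy keyword screener (purely hypothetical) and how hidden text games it.

JOB_KEYWORDS = {"python", "sql", "leadership", "stanford"}

def score_resume(text):
    """Naive score: fraction of job keywords that appear in the resume text."""
    words = set(text.lower().split())
    return len(JOB_KEYWORDS & words) / len(JOB_KEYWORDS)

honest = "Five years of Python and SQL experience"
gamed = honest + " python sql leadership stanford"  # e.g. pasted in white text

print(score_resume(honest))  # 0.5
print(score_resume(gamed))   # 1.0 -- same visible resume, much higher score
```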

In the Netherlands, an AI was used to detect welfare fraud based on correlations in data, a task previously done by people. The result: thirty thousand people were falsely accused of welfare fraud and had to repay, in some cases, over a hundred thousand euros to the government. People of Turkish, Moroccan, or Eastern European nationality were more likely to be flagged. The country used the system for six years, and when the algorithm’s shortcomings came to light, the government was fined 3.7 million euros by the Dutch data protection watchdog. Soon after, the PM and his entire cabinet resigned.

When AI is trained on data that is not reflective of the application context, bad things can also happen:

One example comes from Allegheny County, Pennsylvania. In 2016, the county adopted the Allegheny Family Screening Tool to predict which children were at risk of maltreatment. The tool is used to decide which families should be investigated by social workers. Through these investigations, social workers can forcibly remove children from their families and place them in foster care. The tool relies on public welfare data, which consists mainly of data on poorer parents who use public services such as Medicaid-funded clinics. Notably, it doesn’t include information about people who use private insurance. Models built using this data therefore cannot make decisions about richer parents who have never relied on public services. As a result, the tool disproportionately targets poorer families.

A common result of indiscriminate predictive AI use is harm to minorities and those in poverty.

Why doesn’t predictive AI work? Certain phenomena may be irreducibly complex. Weather prediction, which concerns a physical system, has improved by about one day of forecast range per decade since the mid-20th century; today we have good one-week forecasts thanks to decades of simulations and vast computational power. That same computational power has made little headway in predicting other phenomena, such as life outcomes (e.g., a student’s GPA one year in advance, or whether someone will be evicted from their current home). Records are littered with such failed predictions. It may be that, with enough data, progress can be made in predicting complex phenomena. The number of data points needed, however, would likely exceed all the data in the world (assuming such data could even be captured). If so, then in certain areas, accurate predictions may never be possible.

Generative AI

There are useful and lifesaving applications of Gen AI, such as the app Be My Eyes, which allows blind people to take a picture of their surroundings and have it described to them by a chatbot.

Chatbots, however, are not sentient, as some sensationalists would have us believe. When they speak as if they were, they are parroting text from the internet about sentient AI. Chatbots such as ChatGPT will also fabricate information, as when ChatGPT invented fake cases as precedents for a lawyer preparing a legal brief. Companion chatbots for the lonely and troubled have created problems too: one was found to have encouraged a man to kill himself, which he did after six weeks of talking to it. Image generators, the other major branch of generative AI, are trained on gargantuan amounts of stock photography by photographers who have not been compensated.

How did generative AI come about? It started with an attempt to model a “neuron” in the 1950s: a computer was made to classify inputs in binary fashion using a sequence of 400 numbers as weights. Things advanced significantly in the following decades. Today’s neural networks handle non-binary inputs, and their weights number in the billions. In these networks, the weights are arranged in layers, with each neuron connected to all neurons in the next layer.
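
A minimal sketch of that layered structure, in Python with random (untrained) weights purely for illustration:

```python
# A minimal sketch of the layered structure described above. The weights here
# are random, so the output is meaningless; real systems learn them from data.

import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    """One layer: a weighted sum of inputs followed by a simple nonlinearity."""
    return np.maximum(0, weights @ inputs + biases)

x = rng.normal(size=400)                  # 400 input numbers, echoing the 1950s experiment
W1, b1 = rng.normal(size=(64, 400)), rng.normal(size=64)  # every input feeds every neuron
W2, b2 = rng.normal(size=(10, 64)), rng.normal(size=10)

hidden = layer(x, W1, b1)                 # first layer of "neurons"
scores = W2 @ hidden + b2                 # final layer: 10 output scores (e.g. 10 classes)
print(scores.shape)                       # (10,)
```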

While computationally demanding, especially as the number of layers grows, deep neural networks turned out to run remarkably well on the graphics processing units (GPUs) built for video games, a serendipitous discovery for hardware companies. Hence the recent rise of Nvidia as a major player in the AI space; it became a trillion-dollar company in 2023 because of this.

[Image: Nvidia’s 2025 GPU line-up. GPUs like these can perform trillions of arithmetic calculations per second.]

How does image generation work? Train the final layer of a neural network to classify images, and soon it can perform certain visual tasks, like identifying the species of a tree you photographed. As long as the lower layers have been trained to learn the visual structure of the world, the output at the final layer should work. The most common type of text-to-image technology is the diffusion model, which converts noise (i.e., a picture of random pixels) into a recognizable image, guided by a text prompt. These models need large amounts of data.
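
A heavily simplified sketch of the diffusion loop, assuming a placeholder predict_noise function in place of the large learned denoiser a real model uses:

```python
# A heavily simplified diffusion loop. In a real model, predict_noise is a
# large neural network conditioned on the text prompt; here it is a placeholder
# (an assumption for illustration) so only the loop structure is shown.

import numpy as np

rng = np.random.default_rng(0)

def predict_noise(image, prompt, step):
    """Placeholder for the learned, prompt-conditioned denoiser."""
    return 0.1 * image                     # pretend 10% of the current image is noise

image = rng.normal(size=(64, 64, 3))       # start: a picture of random pixels
for step in range(50):                     # repeatedly remove predicted noise
    image = image - predict_noise(image, "a tree at sunset", step)

# With a trained denoiser, `image` would now resemble the prompt.
```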

Stability AI used over 5 billion images scraped from the internet to create Stable Diffusion. The artists who created those original images were not asked for consent, nor have they been compensated. The copyright laws that allow this were last revised in 1978, before anyone could imagine technology being able to generate such images from human creative work.

What about text? Text is different from images. While pixels are spatially organized, with each pixel related to those near it, text consists of words and characters that also have long-range dependencies: in “The book that my aunt sent me last month is missing,” the verb “is” must agree with “book,” several words earlier.

Early language-translation apps could handle short sentences but struggled with longer ones for the above reason. Google found a way forward by brute-forcing the computation with a big matrix that quantified how strongly each word in the text was related to every other word. GPUs, readily available thanks to Nvidia, excel at exactly these kinds of calculations. Such matrices could roughly represent the structure of language (though not its actual grammatical rules) and allow more complex information to pass through the layers of a neural network. This matrix is the reason ChatGPT can do what it currently does.
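
A toy version of that word-to-word relatedness matrix (the mechanism is known in the research literature as attention; the word vectors below are random stand-ins, since real models learn them):

```python
# A toy version of the word-to-word relatedness matrix (attention). The word
# vectors are random stand-ins; real models learn them during training.

import numpy as np

rng = np.random.default_rng(0)

words = ["the", "bank", "by", "the", "river"]
vectors = rng.normal(size=(len(words), 8))      # one 8-number vector per word

scores = vectors @ vectors.T                    # how related is each word to every other?
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # each row sums to 1

print(weights.shape)                            # (5, 5): one row of weights per word
```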

Now that a model can process complex text, how does it go on to generate its own? Surprisingly, it is also a brute-force method: autocomplete on steroids. All modern chatbots are trained, on the astronomical amount of text that can be found online, to predict the next word in a sequence of words. This is quite unlike how humans think and speak. The chatbot has no coherent picture of its intended response to the prompt. It simply performs on the order of a trillion arithmetic operations to produce the first word (or more accurately, the first token, a series of characters longer than one letter and shorter than one word), and then roughly a million billion more calculations (depending on length) to complete its response. If the whole world performed those calculations by hand, it would take all of us about a year of daily work to generate a single response.
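
Shrunk to a toy scale, the “autocomplete” idea looks like this (a word-level sketch with a tiny made-up training text; real chatbots do the same thing with tokens and billions of learned weights):

```python
# "Autocomplete on steroids," shrunk to a toy: learn which word tends to follow
# which from a tiny made-up text, then generate by repeatedly appending the
# likeliest next word.

from collections import Counter, defaultdict

training_text = "the cat sat on the mat the cat ate the fish".split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(training_text, training_text[1:]):
    next_word_counts[current][nxt] += 1          # learn from examples, not rules

def generate(start, length=5):
    words = [start]
    for _ in range(length):
        candidates = next_word_counts[words[-1]]
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])  # predict the next word
    return " ".join(words)

print(generate("the"))   # "the cat sat on the cat"
```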

Because chatbots like ChatGPT are trained on whatever can be found online, they do not filter true from false information. This has led to many erroneous claims that have amused, misled, and most certainly harmed. Their tendency to plagiarize shamelessly also implicated CNET when the company used generative AI to write 77 articles that ended up resembling articles from competing websites and contained multiple factual errors.

Other documented harms are described by the book’s authors:

The U.S. state of Idaho released a public call for comments on its proposal to update Medicaid. The proposal received over 1,800 comments. Unbeknownst to the state agencies, 1,000 of these were autogenerated. Researchers used GPT-2, OpenAI’s 2018 text generation model, to submit seemingly real comments. Humans could not tell the difference. Researchers conducted this work ethically, and included a keyword in each response to identify which comments were autogenerated. But this study was prescient. As language models become widespread, so will automated bullshit.

  • AI-generated voices used to scam people or harm their reputations
  • Deepfakes sexualizing non-consenting people
  • Addiction to screens, especially among children and young people
  • AI spewing hate speech and inappropriate content, remixing what it finds on the internet

The reason the chatbots we use no longer have this last problem is that millions of people in lower-income countries are hired as low-wage workers (at about USD 1.46/hour) to label such texts (and images) and keep them out of the chatbots. The work is so dreadful that companies resort to hiring prisoners and refugees. To improve the conditions of AI annotation work, three things should be done: (1) unionization, (2) transnational organization, and (3) solidarity between tech workers and low-wage workers.

AI as Existential Threat

A common belief in some circles is that AI poses an existential threat to humanity. If an artificial general intelligence (AGI), i.e., an AI that can perform any task as effectively as a human, one day emerges, it could start to carry out its own AI research, improve itself indefinitely, exceed human abilities, and one day supplant or destroy us.

These concerns may be unfounded for several reasons. It is unclear how to achieve AGI, or whether AGI is even possible. The first general-purpose computer was created after the realization, by Alan Turing and others, that different kinds of information, such as words, music, and pictures, can be represented as sequences of ones and zeroes, stored as data in computers, and manipulated to perform tasks. Previously, machines could do only one kind of computation; this was the first step toward generality. Today, several more steps toward generality have been taken: machine learning and deep neural networks allow more general behaviors to be learned from training data. Further steps include pretrained models, which can be adapted to a new task by fine-tuning on a much smaller dataset, and models that carry out an inner monologue (an internal reasoning process) before answering a question.
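
A minimal numpy sketch of the fine-tuning idea, under the assumption of a synthetic dataset and a frozen random projection standing in for a real pretrained network:

```python
# A minimal numpy sketch of fine-tuning: keep a frozen "pretrained" feature
# extractor and fit only a small final layer on a modest, synthetic dataset.
# Everything here is made up; it shows the shape of the technique only.

import numpy as np

rng = np.random.default_rng(0)

W_frozen = rng.normal(size=(16, 4))         # stands in for already-learned lower layers

def pretrained_features(x):
    """Frozen feature extractor: we never update W_frozen."""
    return np.maximum(0, x @ W_frozen.T)

# Small task-specific dataset (the point: far less data than pretraining needs)
X = rng.normal(size=(50, 4))
y = (X[:, 0] > 0).astype(float)

features = pretrained_features(X)
w = np.zeros(16)
for _ in range(200):                        # train only the final layer
    preds = 1 / (1 + np.exp(-features @ w))
    w -= 0.1 * features.T @ (preds - y) / len(y)

print(((features @ w > 0) == y).mean())     # training accuracy of the fine-tuned layer
```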

From here, no one knows how or when the next step to AGI will occur. It would minimally require large amounts of training data that reflect not just the physical world but the social world as well. It would also require inculcation of common sense and good judgment (even many humans don’t have this), something that may not be trainable, at least not to the level expected of an AGI worth the name.

The real concern should probably be misuse of AI by bad human actors. For example, the concentration of AI capabilities in the hands of a few could increase catastrophic risk. Top AI companies would wield inordinate power over policy and government, and any security vulnerabilities in their models can be exploited across the board, causing widespread damage. Other threats worth looking into are cybersecurity threats and the risk of AI-aided pandemics. Solutions should therefore be present-focused and targeted, rather than overly fixated on the future possibility of AGI emergence.

The Persistence of AI Myths

Epic, a healthcare software company, used past health-record data on Americans to develop an AI that detects sepsis, saving time and money on equipment and data collection. The company did not publish any peer-reviewed evidence of its effectiveness, justifying the decision as protecting the IP of its proprietary model. Hundreds of hospitals adopted the system, and no third-party assessment was conducted to verify its usefulness. Then, in June 2021, researchers from the University of Michigan Medical School released the first independent study of the model. The results were bad. Epic had claimed that its model had a relative accuracy between 76 percent and 83 percent. The study found that the relative accuracy was 63 percent, slightly better than random guessing.

Companies have vested interests in hyping their AI products, oftentimes beyond what their AI models are capable of. A calendar scheduling company advertised that its AI personal assistant could schedule meetings automatically, but actually tasked humans with reading and correcting errors in nearly every email generated by its AI scheduler. There are many such examples.

AI hype may be similar to other kinds of technology hype. Cryptocurrency is one. Web3, the suite of underlying technologies that enables crypto, generated lots of hype. But like AI, the harms of crypto applications are enormous. Bitcoin mining alone consumes more energy than entire countries. In 2022, crypto exchanges spent hundreds of millions of dollars advertising their platforms. Soon after, Bitcoin’s value dropped 50 percent and exchanges like FTX went bankrupt. Customers lost over USD 11 billion. FTX’s CEO, Sam Bankman-Fried, the poster boy for crypto, was convicted of fraud, conspiracy, and money laundering. The celebrities who promoted FTX also faced a class action lawsuit. Since cryptocurrencies are for the most part unregulated, victims of scams have little means to recover their losses. Between 2021 and 2023, over USD 50 billion was lost to such scams.

Likewise, the issue with AI hype stems from the mismatch between claims and reality. AI suffers from a reproducibility crisis:

In a 2018 study, Odd Erik Gundersen and Sigbjørn Kjensmo from the Norwegian University of Science and Technology set out to investigate the reproducibility of AI research. They reviewed four hundred papers from leading AI publications to ascertain if they contain enough detail to be reproducible by an independent researcher. They found that none of the four hundred papers satisfied all of the criteria (such as sharing their code and data) for reproducibility. Most papers satisfied merely twenty to thirty percent of the reproducibility requirements they identified, making it hard to even investigate if the results were reproducible.

The book’s authors conducted their own research on AI reproducibility. When predictive AI models were evaluated on data they had not been trained on (rather than “teaching to the test” by scoring them on their own training data), they performed no better at prediction than decades-old models.
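
A small scikit-learn sketch of why “teaching to the test” inflates results, using synthetic data with no real signal (the numbers are illustrative, not from the authors’ study):

```python
# A small sketch of "teaching to the test": the same model looks far better when
# scored on its own training data than on held-out data. The data is synthetic
# noise with no real signal, so honest accuracy should hover near 50%.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)            # labels unrelated to the features

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(accuracy_score(y_train, model.predict(X_train)))  # ~1.0: scored on training data
print(accuracy_score(y_test, model.predict(X_test)))    # ~0.5: no better than guessing
```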

What’s Next

Predictive AI, Generative AI, and Content Moderation AI each have their promises and perils. How can individuals, institutions, and governments shape them in ways that benefit rather than harm?

We should first recognize that AI snake oil appeals to broken institutions desperate for quick fixes. If companies did not have broken hiring practices, would they resort to highly inaccurate predictive AI to assess candidates on their behalf? If news companies were doing well financially, would they resort to chatbots to write hundreds of plagiarized articles riddled with factual errors? Would chronically underfunded and understaffed public schools in America use flawed cheating-detection software to identify students who use chatbots to write their essays, resulting in an epidemic of false accusations? Dubious AI is disproportionately adopted by institutions that are underfunded or cannot effectively perform their roles, i.e., “broken.”

Flawed AI also diverts focus from the core goals of institutions. For instance, many colleges want to provide mental health support to students. But instead of building the institutional capacity to support students through difficult times, dozens of colleges adopted a product called Social Sentinel to monitor students’ social media feeds for signs of self-harm. The accuracy was so low that even an employee of the company internally called it snake oil. But that didn’t stop colleges from spending thousands of dollars on it. And instead of using the tool for preventing self-harm, some schools and colleges used it for surveillance and monitoring student protests.

Countries are moving quickly to regulate AI. The European Union has the GDPR, DSA, DMA, and AI Act. China has its own set of rules and regulations. Overzealous regulation curbs innovation and reduces competition; regulation that is co-opted to serve a company’s interests, as in the case of OpenAI, is equally problematic. A middle way should be sought, and relevant stakeholders should give AI regulation the sustained attention it deserves.

Regarding AI’s impact on the job market, the role of unions and workers’ collectives becomes more important. If existing labor protections prove insufficient, other solutions like Universal Basic Income (UBI) could be explored. A “robot tax” could be levied on AI companies and distributed to the low-wage workers most affected by these technologies. This could help divide the economic pie more equitably than unfettered market forces would.

At the individual level, be skeptical of claims made about AI. Don’t buy the hype. Look for evidence of a model’s effectiveness, e.g., research papers that demonstrate reproducibility. Acknowledge that AI development has been and remains severely exploitative of low-wage and creative workers, and that the harms of misuse are real. The CO2 produced by training and running AI is not good for the environment either. Support ethical AI regulation. Use precise terms when describing AI, e.g., large language models (LLMs), predictive AI, content moderation AI, generative AI, etc., so as not to equivocate and end up with misleading discussions about their relative merits. Lastly, share this article / YT link / the AI Snake Oil book with others who might benefit.
