Gasp! The call is coming from inside the house! Large Language Models may be useful to reporters and editors, but they are not to be trusted by themselves.
Gulp! We welcome our robot overlords. (Illustration contributed by DALL-E.)
The world of journalism has been confronted by the sudden appearance of the so-called Large Language Model (LLM), a specific type of Artificial Intelligence platform, of which ChatGPT is the most famous example. Since news of the technology exploded last year, newsrooms everywhere have been racing to experiment with these platforms to see how they might affect the reporting of news.
LLMs are systems that scan and process a large amount of written material in a “training” cycle, during which they learn the statistical patterns of word combinations in context. (At the time of use, they can also adapt to examples supplied within a prompt, a capability known as in-context learning.) They can then configure the word combinations into a “format” (an essay, a pitch deck, a news article, a radio script, etc.) and a “style” (formal business, light and breezy, “as if written by Joan Didion,” etc.).
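As an illustration, a single prompt can specify both a format and a style at once. A hypothetical example, not one of our actual prompts:

```
Turn the attached meeting minutes into a short radio script,
written in a light and breezy style.
```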
Like other newsrooms, we’ve been experimenting with the technology and considering how it might be applied to our workflow.
Introducing ChatGPT
We’ve been spending some time using OpenAI’s ChatGPT, trying to understand how it works and determine how, or whether, to fit it into our workflow as we build and maintain the information stored in our internal database.
ChatGPT (the GPT stands for Generative Pre-trained Transformer) has been described in apocalyptic terms, but in its current fourth generation, it functions more like a glorified grammar-checking autocomplete, akin to what Google Docs is (annoyingly) doing as I type this text.
One can interact with it using a chat-style web interface or an Application Programming Interface (API), so we tried both.
It should be noted that ChatGPT is not connected to the Internet, so it can’t use a search engine to look up the answer to a submitted question. It can work only from its training data, a tiny subset of the information scraped from the Internet, and from any information submitted to it by a user.
Experiments With the Chat Interface
One begins by getting an account at OpenAI (there’s a free version). Once signed up and signed in, one sees a page with a chat text box at the bottom, into which one enters a “prompt,” or command.
Because it’s a so-called “chatbot,” one can use the chat interface to have an iterative “conversation” with ChatGPT, and it is this usage which has generated the most concern.
But we’re not interested in chatting. We’re interested in summarizing information.
In compiling our database of public agencies and officials, nonprofit community groups, and the rest of the data points that make up the civic infrastructure of every community, we write a lot of blurbs and summaries. So we did some blurb-writing experiments. First, we copied the text of an article to the clipboard, entered a prompt of “write a two-paragraph summary of the following article,” pasted in the article text, and submitted it to ChatGPT.
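In schematic form, the submission looked like this, with the bracketed line standing in for the pasted article text:

```
Write a two-paragraph summary of the following article:

[pasted article text]
```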
Within a couple of seconds, it responded with a good summary of the submitted article.
We also experimented with different summary formats by using different prompts.
And of course, we went there, asking it to mimic the voice of a famous author.
ChatGPT is good at producing summaries, in the requested formats, of information supplied with the prompt. It’s only fair at reproducing the voice or style of an author.
Garbage In, Garbage Out
We ran another set of experiments, prompting ChatGPT to summarize information contained within its internal data set, which was scraped from the Internet and was current as of September 2021.
For these tests, we asked ChatGPT to summarize information about three local community groups.
The results were much less impressive. In only two of the three tests did ChatGPT acknowledge that its data was limited; in the case of Actors’ Theatre, it spouted three paragraphs of generic bullshit without mentioning the annual “8 Tens @ 8” short play festival for which the troupe is locally renowned.
The output is only as good as the input. Or as computer scientists in the 1950s used to say: “Garbage in, garbage out.”
Experiments With the API
As noted, OpenAI also offers access to its Application Programming Interface (API), which allows two software programs to communicate. We set up a paid OpenAI account, which enabled us to write code that connects to the ChatGPT API over the Internet, submits a prompt, and receives the reply and saves it into our database.
Specific use cases of interest for us are automating the production of first-draft descriptions of nonprofits and community groups from the content of their “About Us” pages, and automating the compilation of meeting calendars from the myriad government calendars we collect in our database. A sketch of the first use case appears below.
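To make that first use case concrete, here is a minimal sketch of the kind of code involved, using the pre-v1 interface of the openai Python package. The model choice and the summarization prompt are assumptions for illustration, not our production setup:

```python
# A minimal sketch (not our production code): submit an "About Us" page
# to the ChatGPT API and return a first-draft description for review.
# Assumes the `openai` Python package (pre-v1 interface) and an
# OPENAI_API_KEY environment variable.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def draft_description(about_us_text: str) -> str:
    """Ask the model for a two-paragraph summary of an 'About Us' page."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # placeholder; pick a model to suit cost and quality
        messages=[
            {"role": "system",
             "content": "You write concise descriptions of community "
                        "organizations for a civic directory."},
            {"role": "user",
             "content": "Write a two-paragraph summary of the following "
                        "'About Us' page:\n\n" + about_us_text},
        ],
        temperature=0.2,  # keep the draft close to the source material
    )
    return response["choices"][0]["message"]["content"]
```

The returned draft then goes into our database, where an editor vets it before anything is published.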
Early results are promising.
It should be noted that the API costs money to use, with rates based on the number of “tokens” in both the prompt (the input) and the results (the output).
As its website explains, OpenAI offers multiple models, each with different capabilities and price points: “Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.”
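That rule of thumb makes back-of-the-envelope cost estimates easy. In the sketch below, the per-1,000-token rates are illustrative placeholders, not quoted prices; check OpenAI’s pricing page for current figures:

```python
# Rough cost estimate for one summarization call, using OpenAI's
# rule of thumb that 1,000 tokens is about 750 words. The rates are
# illustrative placeholders, not actual prices.
WORDS_PER_1K_TOKENS = 750

def estimate_cost(input_words: int, output_words: int,
                  rate_in_per_1k: float = 0.0015,
                  rate_out_per_1k: float = 0.002) -> float:
    tokens_in = input_words / WORDS_PER_1K_TOKENS * 1000
    tokens_out = output_words / WORDS_PER_1K_TOKENS * 1000
    return (tokens_in / 1000) * rate_in_per_1k + (tokens_out / 1000) * rate_out_per_1k

# A 1,500-word article summarized down to 150 words:
print(f"${estimate_cost(1500, 150):.4f}")  # -> $0.0034, a fraction of a cent
```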
Google Wants In
Google is working to incorporate its AI into a product that might be useful to journalism in much the same scenarios outlined in our experiments with ChatGPT. Google calls the product Genesis; a former-journalist friend, familiar with that book of the Bible, suggests it would be better named “Revelations” (see Apocalypse).
Conclusions
Stipulating that nothing in a legitimate news outlet should be published before it’s vetted by an editor or two, LLM-based platforms like ChatGPT and Google Genesis can be useful for some aspects of journalism, especially those that consist of summarizing information.
Examples include producing a summary of a city council meeting from the official minutes, or writing a one-paragraph summary of an article, either for publication at the head of the article or for a news digest on another site linking to the original.
We already do a lot of the kinds of summarizing at which LLMs reliably excel (when fed accurate information to summarize), so we’re continuing our experiments and internal discussions. Like most of the other newsrooms that have already played with the tech, we’ll almost certainly deploy it in our workflow and product, freeing our humans for the other tasks at which they excel.
So, Is AI Coming For Journalism?
Yes and no.
It can be a useful tool when utilized in the service of good journalism, as reported in a number of stories floating around in the journalism-about-journalism trade press over the last several months, and judging from our own experience.
But it’s also become a driver of the enshittification of online journalism as corporate front offices foist it on journalists and publications (see recent debacles at CNET and IO9) and as search engine results are diluted with AI-generated garbage.
These negative use cases erode trust in journalism as more people are exposed to garbage content, and they degrade the value of Internet search engines for discovering good journalism as SEO-optimized, AI-generated cruft stuffs the results.
We’re fighting that by establishing California Local as a curator and creator of good local journalism and building a statewide brand based on information quality, utility and trust. We’re leveraging our brand to facilitate discovery of, and connection to, the solid journalism of the local newsrooms participating in our media alliance. And we’re leveraging our decades of in-house technical and editorial expertise to experiment with new technologies and incorporate them into our workflow when they make our product better and when it makes sense.