In what was admittedly a small-scale study, Clerwall asked 46 undergraduate students in media and communications to read one of two NFL game recaps: one produced by generative software and posted on a fan site, the other written by an LA Times journalist. The latter was shortened to match the length of the former, which was also trimmed a bit. When we asked Clerwall if this could tarnish the overall quality of a piece of journalism and affect the results, he said: "You are right, had I cut them differently maybe the results would have been different. However, they were cut in a way that made sense (with the heading, intro and a few following paragraphs)."
The two groups read the articles as part of a web survey, then were asked to assess the article they had read for content quality and credibility. In total, the subjects had to rate how well each of 12 descriptors described the news: objective, trustworthy, accurate, boring, interesting, pleasant to read, clear, informative, well written, useable, descriptive and coherent.
Clerwall used the Mann-Whitney test to compare the two groups' ratings. It's a rank-based test for comparing two independent groups when the dependent variable is not normally distributed. "I tried to base the measurements/criteria on previous research on quality assessment, but of course there might be others that could have been included, and that might have a different result," Clerwall tells us. When we asked whether a test like this would surely miss all the incremental things that make one article superior to another, he said yes, maybe. "It is a small study and one suggestion is to continue the research with more texts of various types and length -- and with a larger sample."
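For readers curious about what a Mann-Whitney test actually does, here is a minimal Python sketch. It is not Clerwall's code or data: the function names are ours, the ratings are invented five-point scores used purely for illustration, and the p-value relies on the standard normal approximation with a tie correction (no continuity correction), which is only a rough guide for small groups.

```python
import math
from collections import Counter


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which the group_a rating beats the
    group_b rating; a tied pair counts as half a win for each side.
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


def two_sided_p(group_a, group_b):
    """Two-sided p-value via the normal approximation, corrected for ties."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    u_a, _ = mann_whitney_u(group_a, group_b)
    mean_u = n1 * n2 / 2
    # Each set of t tied values shrinks the variance of U.
    tie_term = sum(t**3 - t for t in Counter(list(group_a) + list(group_b)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # every rating is identical, so there is nothing to compare
    z = (u_a - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


# Invented "pleasant to read" ratings (1 = not at all, 5 = very much so).
software_ratings = [3, 2, 3, 4, 2, 3, 3, 2]
journalist_ratings = [4, 3, 4, 5, 3, 4, 4, 3]

print(mann_whitney_u(software_ratings, journalist_ratings))
print(two_sided_p(software_ratings, journalist_ratings))
```

Because the test only looks at which rating in each pair is higher, it does not assume the scores are evenly spaced or bell-shaped, which is why it suits survey scales like these.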
The current study has raised some interesting questions, and red flags, though.
The text penned by a real journalist scored highly on descriptors such as "well written", "clear" and "pleasant to read".
Software-generated text, on the other hand, scored highly for being descriptive, informative, more accurate, trustworthy, and objective. As Clerwall points out though, being descriptive could be either a good or bad thing, depending on the reader -- all readings will be subjective. But it's not looking good for journalism (or at least, anyone who writes brief game recaps). "Although the differences are small, the software-generated content can be said to score higher on descriptors, typically pertaining to the notion for credibility," says the paper. It did, however, also come out as more boring. So there's that, journalism.
Thankfully, Clerwall concludes in the article: "But are these differences significant? The short answer is, no they are not. Using a Mann-Whitney test for a non-parametrical independent samples, only the descriptor Pleasant to read showed a statistically significant difference between the two treatments." He adds: "I was surprised by the results, to a certain extent. Even if I had found out that the automated texts were actually quite readable, I thought they would 'score' worse than the one written by a reporter."
Finally, all the subjects were asked whether they thought the article had been penned by a person or a piece of software. Of the 27 people who read the software-generated game recap, ten thought a journalist wrote it. However, of the 18 who read the LA Times piece, ten thought a piece of software generated it.
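To put those counts in proportion, here is a small Python sketch. The function and variable names are ours, chosen for illustration; the figures are the ones reported above.

```python
def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


# Figures as reported above.
software_readers, guessed_journalist = 27, 10
times_readers, guessed_software = 18, 10

fooled_by_software = share(guessed_journalist, software_readers)
misjudged_journalist = share(guessed_software, times_readers)

print(f"Software recap taken for a journalist's work: {fooled_by_software:.0f}%")  # 37%
print(f"LA Times recap taken for software output: {misjudged_journalist:.0f}%")    # 56%
```

In other words, just over a third of the software group credited a human, while more than half of the LA Times group credited a machine.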
At the start of his paper, Clerwall paints a picture. It's of a road traffic accident, involving a car that is equipped with collision detection technology and GPS. It automatically sends the appropriate data to the authorities and a news gathering service gets access, producing a short article in seconds that is distributed and bought by papers. We are, we might assume, a long way off such a scenario. But Clerwall says "given the technologies available I don't think it's too farfetched". Even the related privacy issues, he believes, will most likely not be of consequence considering "a lot of people are willing to give up quite a lot of privacy for the sense of feeling secure".
It is possible that this kind of generative software will only ever be able to produce very generic news reports of the sort the public could get used to reading: game recaps, or other equally fact-based articles in a very specific context and genre. "It might be just good enough," Clerwall says. "I myself am quite into hockey, and for me a quick game recap with all the data/info I want in a rather readable way would suffice. However, perhaps I would not 'trust' the algorithms to be able to give me a more opinion based, insightful chronicle about my favourite team."
When we point out a game recap is perhaps the easiest piece of content to generate -- it's also one of the examples cited in the Wired article that inspired this experiment -- Clerwall responds: "You are right, they are quite formulaic and I think it is exactly because they are based on a lot of facts that they 'work' fairly well. However, when it comes to implication for the future of journalism we should keep in mind that the algorithms are getting better and better, and my guess is that we will see other types of journalistic texts in a not so distant future... Since they seem to rely on quite a large amount of data, maybe it is plausible that they could generate texts/reports on anything (if tweaked the right way) that is based on large sets of data."
For now, Clerwall believes there's enough fodder here, and enough questions raised about the future of journalism, to warrant deeper investigation. And while he won't be drawn into predicting that future too far into the distance, he did say this: "I am quite certain that in the not so distant future, more journalistic content (even more elaborate than the ones we see today) will most likely be produced by computers (or algorithms)".