The Digital Data Desk at The Canadian Press started with funding from the Google News Initiative to advance data journalism projects and help the media industry thrive in the digital age.
The team used automation and Natural Language Processing to create content from data by finding patterns and determining what news stories can be told about them.
Using public data sets from all levels of government, research institutes and academia, the team:
The Labour Force Survey is a monthly release of data about changes in the Canadian job market – from the unemployment rate to an estimate of the number of jobs lost/gained in a particular sector. Two tools were built to automatically generate stories based on the data from this release:
The birthday generator automatically updates the celebrity birthdays that are distributed to media and other clients via the CP wire. A database was created that automates the stories each day, saving the CP broadcast team the time and effort of updating the celebrity birthdays manually. The stories include the celebrity’s name, occupation, date of birth, and date of death (if applicable).
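As an illustration only, the daily birthday item could be assembled along the following lines. This is a minimal sketch, not CP's actual system: the `Celebrity` record, the `birthday_lines` function and the output wording are all hypothetical, and the only detail taken from the description above is the set of fields (name, occupation, date of birth, optional date of death).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Celebrity:
    """Hypothetical record holding the four fields the stories include."""
    name: str
    occupation: str
    born: date
    died: Optional[date] = None


def birthday_lines(people, today):
    """Return one line for each person whose birthday falls on `today`."""
    lines = []
    for person in people:
        # Match on month and day only; the birth year is irrelevant here.
        if (person.born.month, person.born.day) != (today.month, today.day):
            continue
        line = f"{person.name}, {person.occupation}, born {person.born.isoformat()}"
        # The date of death is appended only when one is recorded.
        if person.died is not None:
            line += f", died {person.died.isoformat()}"
        lines.append(line)
    return lines
```

Calling `birthday_lines(people, date.today())` once a day would produce the lines for that day's item; the names passed in would come from the database rather than being typed by hand.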
Through an Access to Information and Privacy request, CP obtained data on Canada Emergency Response Benefit (CERB) payments broken down by forward sortation area (FSA). Each FSA was matched with a Canadian city based on information available through Canada Post. Then, the FSA, city and CERB data were merged to see how many people in each city received CERB in each of the seven four-week periods it was available. The results were analyzed and featured in a CP article. A script was written to automatically create a story for each city based on the compiled data.
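The matching and merging steps described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the field names, the `recipients_by_city` and `city_story` functions, the sentence template and the recipient counts are invented, and the handful of sample rows stands in for the real data sets.

```python
from collections import defaultdict

# Hypothetical sample rows with made-up counts; not the data CP obtained.
fsa_to_city = {"K1A": "Ottawa", "K1B": "Ottawa", "M5V": "Toronto"}
cerb_rows = [
    {"fsa": "K1A", "period": 1, "recipients": 120},
    {"fsa": "K1B", "period": 1, "recipients": 80},
    {"fsa": "M5V", "period": 1, "recipients": 300},
    {"fsa": "K1A", "period": 2, "recipients": 100},
]


def recipients_by_city(rows, lookup):
    """Sum recipients per city and four-week period via the FSA lookup."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        city = lookup.get(row["fsa"])
        if city is None:
            continue  # assumption: rows whose FSA has no city match are skipped
        totals[city][row["period"]] += row["recipients"]
    return {city: dict(periods) for city, periods in totals.items()}


def city_story(city, periods):
    """Fill a one-sentence template with the city's peak period."""
    peak = max(periods, key=periods.get)
    return (
        f"In {city}, CERB recipients peaked in period {peak} "
        f"at {periods[peak]:,} people."
    )
```

Looping over the result of `recipients_by_city(cerb_rows, fsa_to_city)` and calling `city_story` for each entry yields one story per city, which is the general shape of the script described above.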
The Consumer Price Index is released monthly and indicates how consumer prices are changing; its year-over-year change is commonly referred to as the inflation rate. Two tools were built to automatically generate stories based on data from this release:
Every week, CP’s clients receive a list of notable events that happened on each day in history and in music history. A database was created from the thousands of events, making it easier to search and update them and to automate the stories based on new and updated entries. The user-friendly database is simple to navigate and saves journalists from having to maintain the entries manually.
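To make the idea of a searchable events database concrete, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the column names, the `events_for` function and the placeholder entries are all assumptions; the description above does not say which database or schema CP uses.

```python
import sqlite3

# In-memory database with a hypothetical schema and placeholder entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, month INTEGER, day INTEGER, "
    "year INTEGER, category TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO events (month, day, year, category, text) VALUES (?, ?, ?, ?, ?)",
    [
        (7, 1, 1950, "history", "a second placeholder event took place."),
        (7, 1, 1900, "history", "a first placeholder event took place."),
        (7, 1, 1960, "music", "a placeholder music event took place."),
        (7, 2, 1970, "history", "a placeholder event for another day took place."),
    ],
)


def events_for(connection, month, day, category):
    """Return the day's entries for one category, oldest first."""
    rows = connection.execute(
        "SELECT year, text FROM events "
        "WHERE month = ? AND day = ? AND category = ? ORDER BY year",
        (month, day, category),
    ).fetchall()
    return [f"In {year}, {text}" for year, text in rows]
```

Keeping the entries in a table like this means a correction is made once, in one row, and the next generated list picks it up automatically.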
The New Housing Price Index is released monthly and is used to track price changes in new homes built across the country. A tool was built that automatically creates three lists:
A tool was created that integrates with Slack and writes profiles on federal ridings based on 2016 census data from Statistics Canada and previous election results. The Election Slack Bot makes it easier for reporters to do research for election stories by:
The Twitter bot tweets about elections before and during election night in both English and French. Powered by an API from the election consortium, the bot tweets every two hours to count down to the election. When the election results start coming in, the bot tweets every 90 seconds with election results and links to election maps on client sites. If you tweet at the bot about election results, it will reply with CP’s election results map in use by one of our clients.
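The two posting rhythms described above (every two hours during the countdown, every 90 seconds once results arrive) amount to a simple scheduling rule. The sketch below shows only that rule; the constant and function names are hypothetical, and the calls to the consortium API and to Twitter are deliberately left out because their details are not given here.

```python
from datetime import datetime, timedelta

# Intervals taken from the description; the names are illustrative.
COUNTDOWN_INTERVAL = timedelta(hours=2)
RESULTS_INTERVAL = timedelta(seconds=90)


def next_tweet_time(now, results_started):
    """Return when the bot should post next, given the current phase."""
    interval = RESULTS_INTERVAL if results_started else COUNTDOWN_INTERVAL
    return now + interval
```

A scheduler loop would call `next_tweet_time` after each post and switch `results_started` to `True` once the first results are reported.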
Riding previews were created for distribution on the CP wire a few weeks before the BC election in 2020. A story was automatically generated for each of the 87 electoral districts, with information from the previous election such as the winner, the results for each candidate and the voter turnout.
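A riding preview of this kind can be thought of as a template filled from the previous election's results. The following is a minimal sketch under stated assumptions: the `riding_preview` function, its wording and its inputs are hypothetical, and turnout is computed here as total candidate votes divided by registered voters, which ignores rejected ballots and may differ from the official figure.

```python
def riding_preview(district, results, registered_voters):
    """Build a short preview from {candidate: votes} of the last election."""
    total_votes = sum(results.values())
    winner = max(results, key=results.get)
    # Assumption: turnout = votes cast for candidates / registered voters.
    turnout = 100 * total_votes / registered_voters

    lines = [
        f"{district}: {winner} won the previous election "
        f"with {results[winner]:,} votes."
    ]
    # List every candidate, highest vote count first.
    for name, votes in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        share = 100 * votes / total_votes
        lines.append(f"{name}: {votes:,} votes ({share:.1f}%)")
    lines.append(f"Voter turnout: {turnout:.1f}%")
    return "\n".join(lines)
```

Running the function once per electoral district, with each district's past results as input, gives one preview per riding.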
A tool was created to free up time for CP reporters and allow them to cover more in-depth election night stories. When an electoral district was called, a story with the results was generated automatically. The story was then published automatically through CP’s content management system (CMS), so nobody had to enter it manually. The stories were also updated at the end of the night with more current vote numbers.
Constantly evolving as more data becomes available, the tool currently generates stories in both French and English, with detailed regional breakdowns of several relevant metrics (from the number of new cases to rates of infection and the number of vaccines distributed), along with several accompanying charts for both print and online use.