We extracted data from 21,403 articles on the English Wikinews site. As of September 30, 2022, that is roughly 98% of all English Wikinews articles. The data has been placed in a JSON file that can be downloaded from this page.
The JSON file contains an array, and each item in the array holds the data for one article. Each item follows the structure shown below.
{
"title": "Example title",
"text": "Example article body\n\nParagraphs are separated by two newline characters.",
"date": "yyyy-mm-dd",
"categories": ["example_categories", "are all", "lower case"]
}
Extracting information from "unstructured data" doesn't always work perfectly. In some cases, data points are missing from the source altogether. However, for the vast majority of articles, data points were obtained for all fields. Below is a rough overview.
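For readers who want to check field coverage themselves, the following is a minimal sketch in Python. It assumes the file is a single JSON array of objects with the four keys shown above, and that a missing data point appears as an absent key, `null`, or an empty string or list; the source does not say how missing values are encoded, so that part is an assumption. The function names `load_articles`, `field_coverage`, and `paragraphs` are illustrative and not part of the dataset.

```python
import json
from collections import Counter

# The four fields shown in the example item above.
FIELDS = ("title", "text", "date", "categories")


def load_articles(path):
    """Read the JSON file and return the list of article dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def field_coverage(articles):
    """Return, per field, the fraction of articles with a non-empty value.

    Assumption: a missing data point is an absent key, null, or an
    empty string/list. Adjust the check if the file encodes it differently.
    """
    present = Counter()
    for article in articles:
        for field in FIELDS:
            if article.get(field):
                present[field] += 1
    total = len(articles)
    return {f: (present[f] / total if total else 0.0) for f in FIELDS}


def paragraphs(article):
    """Split an article body into paragraphs on two newline characters."""
    body = article.get("text") or ""
    return [p for p in body.split("\n\n") if p.strip()]
```

Calling `field_coverage(load_articles("wikinews.json"))` would return a dictionary mapping each field name to a value between 0 and 1, where `wikinews.json` is a placeholder for whatever name the downloaded file has.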
licensed: Creative Commons Attribution 2.5 License
source: Wikimedia Dumps
author: various
licensed: Attribution-Share Alike 3.0 Unported License
source: Wikimedia Commons
author: Odder