How I Built the News Archives API

The News Archives API is a simple application programming interface (API) that lets anyone retrieve news from the past.

Entire rooms of newspapers now fit in a laptop

What problem does this solve?

In my house, newspapers covering historic events are stuffed into my garage cabinets: papers from 9/11, the presidential elections of 2008, 2012, and 2016, the first COVID-19 quarantines, and much more. I keep them so that in, say, ten years, I can see what the news was on those important days. Maybe, hundreds of years from now, someone will even come across my former garage and find a primary source on the world-changing events of my time (unlikely). A digital archive does the same job with several advantages:

  • I can store all articles, instead of just a few.
  • You can look for articles containing a keyword in their title.
  • News can be logged for the public, instead of just me.
  • Anyone can embed this into their own application.

Architecture

Two components make up the system: a news “logger” that saves articles, and a server that serves the data to the public.

Logging articles

At a set time each day, the 24 top trending articles from a news source are saved to the cloud. Each article’s title, description, and link are stored.

  • A news source
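One way to trigger the logger at a fixed time each day is to compute how long to wait until the next run. This is only a sketch: the function name and the default hour are illustrative (the default mirrors the 10 am GMT run time mentioned later in this article, treating GMT as UTC), and the real logger may well rely on cron or another scheduler instead.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_run(now, run_hour=10):
    """Return the seconds from `now` until the next run_hour:00 UTC.

    `now` must be a timezone-aware datetime. Both the function and the
    default hour are assumptions made for illustration.
    """
    now_utc = now.astimezone(timezone.utc)
    target = now_utc.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    # If today's run time has already passed (or is right now), wait for tomorrow's.
    if target <= now_utc:
        target += timedelta(days=1)
    return (target - now_utc).total_seconds()
```

A long-running process could pass this value to time.sleep() before each logging pass.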

Server

When a GET request is made to the server, the matching news is returned.

Tools used

  • For a reliable world news source, I used BBC News.
  • For a document-oriented, low-latency database, I used Rockset.
  • For a simple web framework, I used Flask.
  • For a GitHub-compatible server hosting service, I used Heroku.
  • To log news 24/7, I used an AWS EC2 instance.
  • To host and open-source the code, I used GitHub.

Building the service

Logging articles

At 10 am GMT each day, news is obtained from BBC News’ RSS feed using my Python package bbc feeds and pushed to a Rockset collection. Each day has its own document with two fields: “_id”, which holds the articles’ date, and “articles”, which holds a list of JSON objects, each containing an article’s title, description, and link. One such article object looks like this:

{
  "title": "Covid: First round of US vaccinations to begin on Monday",
  "description": "The Pfizer/BioNTech vaccine was approved earlier this week and doses are being distributed this weekend.",
  "link": "https://www.bbc.co.uk/news/world-us-canada-55289726"
}
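To make the document shape concrete, here is a minimal sketch of turning an RSS feed into one day's document. It does not use the bbc feeds package or Rockset: it parses a small made-up feed with Python's standard library, and the function name, the sample feed, and the YYYY-MM-DD date format for "_id" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed standing in for the real BBC News feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First story</title><description>About one.</description><link>https://example.com/1</link></item>
<item><title>Second story</title><description>About two.</description><link>https://example.com/2</link></item>
</channel></rss>"""


def build_daily_document(rss_xml, date, max_articles=24):
    """Build a {"_id": date, "articles": [...]} document from RSS text."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        # Keep only the three fields the archive stores for each article.
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
        if len(articles) == max_articles:
            break
    return {"_id": date, "articles": articles}


document = build_daily_document(SAMPLE_RSS, "2020-12-14")
```

Storing the resulting dictionary is then a matter of handing it to whichever database client is in use.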

Client Requests

Getting a day’s data

curl "https://newsarchives.herokuapp.com/api/v2/day/<date>"

On the server, this runs the following query:

SELECT articles
FROM commons.NewsArchives
WHERE _id = :date
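The same request can be made from Python without any client library. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the endpoints in this section follow the /api/v2/<kind>/<value> pattern and return JSON. It makes no claim about the exact shape of the response.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://newsarchives.herokuapp.com/api/v2"


def endpoint_url(kind, value, **params):
    """Build an endpoint URL, e.g. endpoint_url("day", "2020-12-14")."""
    url = f"{BASE_URL}/{kind}/{quote(str(value), safe='')}"
    # Skip optional query parameters that were left unset.
    query = {name: val for name, val in params.items() if val is not None}
    return f"{url}?{urlencode(query)}" if query else url


def fetch_json(url, timeout=10):
    """Send a GET request and decode the body as JSON (assumed format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_json(endpoint_url("day", "2020-12-14")) would request a single day's data.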
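
Getting a month’s or year’s data

curl "https://newsarchives.herokuapp.com/api/v2/month/<month>"
curl "https://newsarchives.herokuapp.com/api/v2/year/<year>"

Both endpoints run the same query, with the month or year passed as :time and matched as a prefix of _id:

SELECT articles
FROM commons.NewsArchives
WHERE _id LIKE CONCAT(:time, '%')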
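The LIKE CONCAT(:time, '%') condition is a prefix match on the document's date. A rough Python equivalent over in-memory documents is sketched below; it assumes "_id" values are YYYY-MM-DD strings (the sample data is made up), and it matches SQL LIKE only as long as the prefix contains no % or _ wildcard characters.

```python
# Made-up documents in the archive's {"_id": ..., "articles": [...]} shape.
SAMPLE_DOCUMENTS = [
    {"_id": "2020-11-30", "articles": [{"title": "A"}]},
    {"_id": "2020-12-01", "articles": [{"title": "B"}]},
    {"_id": "2020-12-14", "articles": [{"title": "C"}]},
    {"_id": "2021-01-01", "articles": [{"title": "D"}]},
]


def articles_for_period(documents, time_prefix):
    """Return the article lists of every document whose date starts with time_prefix."""
    return [doc["articles"] for doc in documents if doc["_id"].startswith(time_prefix)]


december = articles_for_period(SAMPLE_DOCUMENTS, "2020-12")  # a month
year_2020 = articles_for_period(SAMPLE_DOCUMENTS, "2020")    # a year
```

Because a year is itself a prefix of every date in that year, one query can serve both endpoints.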

Searching titles by keyword

curl "https://newsarchives.herokuapp.com/api/v2/keyword/china?month=2020-11"
curl "https://newsarchives.herokuapp.com/api/v2/keyword/china?month=2020-11&limit=5"
curl "https://newsarchives.herokuapp.com/api/v2/keyword/china?limit=5"
curl "https://newsarchives.herokuapp.com/api/v2/keyword/china"

Without a month, the query is:

SELECT n._id, models.a.description, models.a.link, models.a.title
FROM commons.NewsArchives n,
UNNEST (n.articles AS a) AS models
WHERE LOWER(models.a.title) LIKE LOWER(CONCAT('%', :keyword, '%'))
ORDER BY _id DESC
LIMIT :limit

When a month is given, the WHERE clause also filters on the date:

WHERE LOWER(models.a.title) LIKE LOWER(CONCAT('%', :keyword, '%')) AND n._id LIKE CONCAT(:time, '%')
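For readers less familiar with UNNEST, the keyword query can be pictured as the Python below: flatten every day's article list, keep titles that contain the keyword regardless of case, sort newest first, and cut to the limit. This is an in-memory sketch with made-up data and a hypothetical function name, not the server's code, and plain substring matching only mirrors LIKE when the keyword has no % or _ characters.

```python
SAMPLE_ARCHIVE = [
    {"_id": "2020-11-02", "articles": [
        {"title": "China launches probe", "description": "d1", "link": "l1"},
        {"title": "Local election results", "description": "d2", "link": "l2"},
    ]},
    {"_id": "2020-12-14", "articles": [
        {"title": "Trade talks with CHINA resume", "description": "d3", "link": "l3"},
    ]},
]


def search_titles(documents, keyword, limit=None, time_prefix=""):
    """Return matching articles as flat rows, newest day first."""
    needle = keyword.lower()
    rows = []
    for doc in documents:
        # Optional date filter, like: n._id LIKE CONCAT(:time, '%')
        if not doc["_id"].startswith(time_prefix):
            continue
        # UNNEST: one output row per article, tagged with its day's _id.
        for article in doc["articles"]:
            if needle in article["title"].lower():
                rows.append({"_id": doc["_id"], **article})
    # ORDER BY _id DESC, then LIMIT.
    rows.sort(key=lambda row: row["_id"], reverse=True)
    return rows if limit is None else rows[:limit]


all_matches = search_titles(SAMPLE_ARCHIVE, "china")
november_only = search_titles(SAMPLE_ARCHIVE, "china", time_prefix="2020-11")
```

Sorting on the date string works here because YYYY-MM-DD values sort in the same order alphabetically and chronologically.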

Client libraries

Developers can use client libraries to access the API easily in their favorite languages. Here are the “official” client libraries:

Demo

A demo application can be found here. Here’s a demo of the demo:

from flask import render_template
import newsarchives

def day_search(day):
    # Render the articles logged on a single day
    articles = newsarchives.day(day)['data'][0]['articles']
    return render_template('demo_search.jinja', news=articles)

def keyword_search(keyword):
    # Render up to 20 articles whose titles contain the keyword
    results = newsarchives.keyword(keyword, limit=20)['data']
    return render_template('demo_search.jinja', news=results)

Contributing

The source code can be found here. Contributions and suggestions are greatly appreciated. Open pull requests here and suggestions/bugs here.

Conclusion

And that’s a wrap! Please feel free to try out the API yourself.

An 8th grader working on all sorts of misc. software
