Episode overview

Overview of the content of released episodes. For planned episodes, see issues labeled episode.

1. Metaflow basics - fetching auxiliary data from the web

This episode walks through the basics of using Metaflow idiomatically, i.e. in a way that suits DAP and its use-cases, by fetching the three auxiliary datasets needed:

  • Companies House lookups to get the SIC code and address for each company number.
  • National Statistics Postcode Lookup (NSPL), which allows us to identify the Local Authority District (LAD) a company belongs to by matching its postcode.
  • SIC taxonomy lookup, which maps SIC codes to their names.
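
How the three datasets fit together for a single company can be sketched in plain Python. This is only an illustration under stated assumptions, not the episode's actual flows or getters: the dictionaries, field names, codes, and the helpers `normalise_postcode` and `enrich_company` are all hypothetical, and in the episode itself the data would come from Metaflow flows rather than hard-coded literals.

```python
from typing import Optional

# Hypothetical Companies House lookup: company number -> SIC code and address postcode.
COMPANIES = {
    "00000001": {"sic_code": "11111", "postcode": "ab1 2cd"},
    "00000002": {"sic_code": "22222", "postcode": "ZZ9 9ZZ"},
}

# Hypothetical NSPL extract: normalised postcode -> LAD code (made-up values).
NSPL = {"AB12CD": "LAD-001"}

# Hypothetical SIC taxonomy: SIC code -> name (made-up values).
SIC_NAMES = {"11111": "Example activity A", "22222": "Example activity B"}


def normalise_postcode(postcode: str) -> str:
    """Upper-case a postcode and drop whitespace so that lookups match."""
    return "".join(postcode.split()).upper()


def enrich_company(company_number: str) -> dict:
    """Combine the three lookups into one record for a company number."""
    record = COMPANIES[company_number]
    lad: Optional[str] = NSPL.get(normalise_postcode(record["postcode"]))
    return {
        "company_number": company_number,
        "sic_code": record["sic_code"],
        "sic_name": SIC_NAMES.get(record["sic_code"]),
        "lad": lad,  # None when the postcode is absent from the NSPL extract
    }


if __name__ == "__main__":
    for number in COMPANIES:
        print(enrich_company(number))
```

Normalising the postcode before matching is a design choice made for this sketch; whether the real data needs it depends on how each source formats postcodes.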

In addition, this episode adds the first set of content to a Metaflow guide; later episodes add more advanced content.

Important notes

This episode is missing pieces that a data-science PR should have. The most obvious is the absence of tests, which are the subject of episode 3. Tests for the three pipelines of this episode will be added in the testing episode, to keep the content here focused on writing basic Metaflow flows and getters.

Besides the episode guide, there is neither documentation of how to run the flows nor version-controlled configuration for flow parameters. This will be addressed in episode 4.

Key files
