Guidance for creating pull requests
This section details how we collaborate using pull requests.
What is a Pull Request?
- A Pull Request (PR) is a commonly used feature in platforms like GitHub and Bitbucket. It is a way for developers to propose changes to a codebase and submit them for review before merging them into the dev/main codebase.
- It all starts with opening an issue on a repo. Then, you create a branch to fix it. You finally open a PR when your code is ready to be reviewed and before it’s merged to dev/main.
Your first pull request within a repo
Most data science projects at Nesta should follow Nesta’s data science Cookiecutter structure. Hence, your first PR within a project repo should be to set it up - you can follow instructions here on how to do it.
When to make a pull request
- Whenever a new feature, utility function, model or specific analysis is complete, or a specific issue in the code is fixed, you should open a PR and request a code review;
- PRs should be reasonably sized (not your whole project analysis) and they should tackle a small number of related issues, ideally one issue per PR; Small PRs help you and your reviewer:
- You’ll get a better code review if your PR is reasonably sized: If your PR is too big or there’s too much to review, your reviewer might not be able to do it well/ thoroughly;
- Not overwhelming your reviewer: we all have a lot of tasks to do and it can be overwhelming to be asked to review a “whole-project” PR. If you have a bigger PR, make sure to communicate that to your reviewer in advance.
Project worflow examples and when to PR
Example 1: Data collection, processing and analysis
Example 2: Creating a model within an existing project
Worflow: from issue to PR
Creating the PR
- You can create the PR (step 5 in previous diagram) once you’ve finished tackling the issue and your code is ready for review;
- Alternatively, you can create a Draft PR after pushing the first changes to the correct branch. Draft PRs increase transparency by making your “work in progress” visible to others, allowing for early feedback from your project team members and making it easier handing over of work in progress.
- If the project follows Nesta’s data science Cookiecutter, the PR summary/description should come up with a template text (examples in following slides). You should fill this out.
- If that’s not the case, make sure you identify/highlight:
- the issue/issues being resolved,
- the changes being made in the PR,
- what you need from your reviewer.
- Don’t forget to tag your reviewer(s) in the PR, after double checking their availability;
- In case of a Draft PR, make sure to tag it as “Ready to Review” once your code is ready to be reviewed;
Before you put your code out for review
- Read through the “Files Changed” tab of your PR from top-to-toe before submitting, making sure that it is clean, legible and well documented;
- Make sure there are no local dependencies (e.g. you’re not reading from a local file, instead of S3) or that there are clear instructions for how to create them (e.g. reading a secret API key), so that the code can be run by another person;
- If your PR code takes a long time to run on a large volume of data, add a test mode to scripts that runs on fewer data;
- Test your code before asking others to review it (make sure it runs and that the outputs make sense). Consider adding unit tests, as reviewing tested code is generally easier as you already have an idea of what inputs/outputs look like;
- What do you need from your reviewer? Make sure to write that down in your PR summary. Do you need them to...
- ...make sure the logic is correct in a particularly complicated script?
- ...double check if you’re following best practices?
- ...run specific scripts?
- ...test something?
- ...help with improving performance?
During and after the code review
- Address suggestions and make the necessary changes.
- However, even when a suggestion is good, if it’s out of the scope of the initial PR or if it’s going to delay merging the initial PR, it might be better to tackle it in a different PR. For example, it if requires creating a new feature in the dataset or experimenting with a different model/algorithm;
- Request a re-review if needed and do this as many times as needed!
- Don’t feel disheartened if you need to go through several iterations and some of your code needs to be re-written. Tip: Small PRs help speed up this process.
- At the end, you may want to re-run the code in 'production' mode. e.g. if you have a data pipeline and you've merged a new feature, this will then need to be run to update the dataset;
- Always test/run your code one last time before merging the PR!
Examples of PRs and PR summaries/descriptions
There is no need to over-complicate PR summaries/descriptions. Most of the time, they can be very simple and concise!