
Introduction to “Good” and “Bad” Python Code Practices

Now that you have a sense of Python environments and style guidelines, let’s talk about “good” and “bad” code. We use these terms for simplicity, not because Python code can be described that reductively. This section walks through some good practices, some things to avoid, and an example of two functions that do the same thing, one written badly and one written well.

Good things

Code that allows for collaboration, readability, maintainability, consistency and scalability makes for good Python code. Some examples of how to make your code “better” include:

  • When in doubt, opt for functional programming. Functional programming emphasises immutability and pure functions, which make code easier to reason about, test and debug, and well suited to parallel programming. Its emphasis on higher-order functions and function composition enables more flexible and reusable code, which improves modularity and maintainability.
  • Explain, explain, explain! If someone else needs to understand your code (and indeed if you want to remember what you did!), you should explain everything. This includes writing detailed docstrings that summarise what the function does, what parameters it takes, the parameter types (i.e. type hints), and what it returns. You should describe what the .py file is doing at the top of the script. You should write detailed README.md files describing what each .py file does and (if relevant) how to run it. Long story short, explain, explain, explain!
  • Unit tests for code that needs to be particularly robust. If your code needs to be particularly robust, e.g. you’re building a library that others will use or you will be running data pipelines regularly, you need to include tests using pytest. Bonus points if you also use GitHub Actions to run the unit tests every time you merge into dev.
  • Package versions in requirements. Python’s Achilles’ heel is library dependencies. To manage this, pin package versions in your requirements.txt file so a collaborator knows the library versions you used when developing a codebase. Also record somewhere the version of Python you used to run your code, as a mismatch here can also cause headaches.
  • Modularity and Code Organization. Write code that is modular and well-organised. Break down complex tasks into smaller functions with clear responsibilities. Use meaningful function and variable names that accurately describe their purpose. Organise code into logical modules.
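
As a rough illustration of several of the points above, the sketch below shows a small pure function with type hints and a docstring, followed by a pytest-style test. The names `sum_by_category` and `test_sum_by_category` are hypothetical, and the example uses plain Python rather than pandas only to keep it self-contained; it is one possible shape, not a prescribed one.

```python
from collections.abc import Iterable


def sum_by_category(rows: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Sum values per category without modifying the input.

    Args:
        rows: Pairs of (category, value).

    Returns:
        A new dictionary mapping each category to the total of its values.
    """
    totals: dict[str, float] = {}
    for category, value in rows:
        # Build up a fresh dictionary; the caller's data is never changed.
        totals[category] = totals.get(category, 0.0) + value
    return totals


def test_sum_by_category() -> None:
    """A pytest-style test: pytest collects functions named test_*."""
    rows = [("a", 1.0), ("b", 2.0), ("a", 3.0)]
    assert sum_by_category(rows) == {"a": 4.0, "b": 2.0}
    # A pure function leaves its input untouched.
    assert rows == [("a", 1.0), ("b", 2.0), ("a", 3.0)]
```

Because the function depends only on its arguments and returns a new object, the test needs no files, network access or setup, which is a large part of why pure functions are easy to test.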

Things to avoid

Although we don’t want to be too prescriptive as to what makes bad Python code, there are certainly things to avoid. Here is a non-exhaustive list of things that make collaboration, readability, maintainability, consistency and scalability much harder.

NOTE: We’ve ordered them from most to least important.

  • Unreadable or very slow code: If someone else cannot follow what you have written (e.g. no docstrings, unhelpful variable/function names etc.), collaboration is next to impossible. If your code is extremely slow, ask yourself why. If you can’t solve it, write a detailed PR pointing the reviewer to the part of your code that is taking too long. Often, relying on libraries like pandas for tasks that don’t need to be in a dataframe can slow things down.
  • “It works on my machine” mentality: If your code is not reproducible on another machine, it will be extremely difficult for someone else to contribute to the codebase. Some examples of “it works on my machine” bad practices include loading data using your local machine’s absolute paths, not updating requirements.txt for new libraries that are required to run a script and ignoring environment configurations.
  • The brevity vs. readability trade-off: If something can be done in 1 line of code instead of 10, opt for brevity. However, if something can be done in 1 line of code but is more challenging to understand (e.g. nested list comprehensions), opt for readability.
  • Two may keep a secret, if one of them is dead. Be thoughtful about how to deal with API keys and any other tokens or secrets that need to be used to run your Python script. Never push keys to GitHub.
  • Functions that do nothing but chain other functions. In effect, a function that acts as a pipeline. If you have an if __name__ == "__main__": section in a Python script, lay out each step of your processing there with relevant comments, rather than defining a “main” or “run” function that does nothing but call a list of other functions. Your pipeline will be far more readable this way.
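
A minimal sketch of how a script might avoid three of the pitfalls above: absolute local paths, hard-coded secrets, and a pipeline hidden inside a “run” function. The environment variable name `EXAMPLE_API_KEY` and the helpers `build_data_path` and `read_api_key` are hypothetical, and the `data/` folder layout is an assumption made for illustration.

```python
import os
from pathlib import Path

# Hypothetical variable name; use whatever your project agrees on.
API_KEY_ENV_VAR = "EXAMPLE_API_KEY"


def build_data_path(base_dir: Path, file_name: str) -> Path:
    """Build the path to a data file relative to a base directory."""
    return base_dir / "data" / file_name


def read_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key stored in an environment variable.

    Raises:
        RuntimeError: If the variable is unset or empty.
    """
    api_key = os.environ.get(env_var, "")
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key


if __name__ == "__main__":
    # Step 1: locate the input relative to the working directory
    # (assumed here to be the project root), not an absolute local path.
    input_path = build_data_path(Path.cwd(), "data.csv")

    # Step 2: read the secret from the environment, never from the source code.
    if os.environ.get(API_KEY_ENV_VAR):
        api_key = read_api_key()
        # Step 3: load the data, call the API and merge the results here.
        print(f"Ready to process {input_path}")
    else:
        print(f"{API_KEY_ENV_VAR} is not set; nothing to do.")
```

Each step sits directly under the `if __name__ == "__main__":` guard with a comment, so a reader can follow the pipeline from top to bottom without jumping between wrapper functions.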

An example

To illustrate the points in Good things and Things to avoid, here is an example of the same function written badly vs. written well:

import requests
import pandas as pd

def funct():
    data = pd.read_csv('your/local/path/to/data.csv')
    data = data.groupby('category')['value'].sum().reset_index()

    api_key = "12345678"
    response = requests.get(f"https://api.example.com?key={api_key}")
    api_data = pd.DataFrame(response.json())

    merged_data = pd.merge(data, api_data, on='category')

    return merged_data

vs


import requests
import pandas as pd

def process_data(file_path: str, api_key: str) -> pd.DataFrame:
    """
    Process the data from a CSV file and merge it with API data.

    Args:
        file_path: The path to the CSV file.
        api_key: The API key for accessing data from the API.

    Returns:
        A pandas DataFrame containing the merged data.
    """
    # Load data from the provided file path
    data = pd.read_csv(file_path)

    # Process the data
    processed_data = data.groupby('category')['value'].sum().reset_index()

    # Make an API call using the provided API key
    api_url = f"https://api.example.com?key={api_key}"
    response = requests.get(api_url)

    api_data = pd.DataFrame(response.json())
    return pd.merge(processed_data, api_data, on='category')

Which function represents “bad” Python? Why is the first function badly written? How can you improve the “good” function?