Data Safety and Ethics
Data safety and ethics are an important part of how we use Python.
Note: This section is still being developed.
Guidelines for interacting with open source resources
Please consider the following before interacting with any open source resources, such as datasets, packages or models:
- Has it been created by a known/reputable source?
- Can you find information about the source by searching online?
- Is it frequently downloaded or starred, and do organisations or individuals you recognise or trust also use it?
- If your data or project is particularly sensitive, check whether the model sends any outbound network traffic, or consider blocking outbound traffic altogether. Where the risk is high and there is a clear rationale for using the dataset or model in question, the Data Engineering team can help block outbound traffic (also known as airgapping).
- Are any of the files you are downloading .exe files? Running a .exe file launches a program on your machine, which is risky.
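For the outbound-traffic check above, one rough way to see whether a model tries to reach the network while it loads is to make socket connections fail inside your own Python process. This is a minimal sketch under stated assumptions, not a Data Engineering tool: `block_outbound_connections` and `OutboundConnectionBlocked` are hypothetical names, and the guard only covers code that goes through Python's `socket` module in the current process, so it does not replace network-level blocking.

```python
import socket
from contextlib import contextmanager


class OutboundConnectionBlocked(RuntimeError):
    """Raised when code tries to open a connection while the guard is active."""


@contextmanager
def block_outbound_connections():
    """Make socket connections fail in this Python process until the block exits.

    Only code that goes through Python's socket module in the current process
    is affected (in every thread, and including local connections).
    Subprocesses and native code that open sockets themselves are not, so this
    is a diagnostic aid, not airgapping.
    """
    original_connect = socket.socket.connect
    original_connect_ex = socket.socket.connect_ex

    def refuse(self, address, *args, **kwargs):
        # Fail loudly and report where the code was trying to connect.
        raise OutboundConnectionBlocked(f"blocked connection to {address!r}")

    socket.socket.connect = refuse
    socket.socket.connect_ex = refuse
    try:
        yield
    finally:
        # Put the real methods back even if the wrapped code raised.
        socket.socket.connect = original_connect
        socket.socket.connect_ex = original_connect_ex
```

You would wrap only the loading step, for example `with block_outbound_connections(): model = load_model(path)`, where `load_model` is a placeholder for whatever loader you use. An `OutboundConnectionBlocked` error then suggests the model tried to connect somewhere. Raising a `RuntimeError` rather than an `OSError` is a deliberate choice, so that libraries which catch ordinary network errors are less likely to swallow it silently.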
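For the .exe check, a short script can list suspicious files in a downloaded folder before you open anything. This is a minimal sketch assuming the download sits in a local folder; `find_executables` is a hypothetical helper. Besides the .exe extension, it also flags files that begin with the bytes `MZ`, the signature Windows executables and DLLs start with, since a renamed executable would otherwise slip through.

```python
from pathlib import Path


def find_executables(folder):
    """Return files under `folder` that look like Windows executables.

    A file is flagged if its name ends in .exe (in any letter case) or if its
    first two bytes are b"MZ", the signature Windows .exe and .dll files start
    with. The byte check catches renamed executables but can also flag an
    ordinary file that happens to begin with "MZ".
    """
    flagged = []
    # Walk every file in the folder and its subfolders, in a stable order.
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() == ".exe":
            flagged.append(path)
            continue
        # Only the first two bytes are needed for the signature check.
        with path.open("rb") as handle:
            if handle.read(2) == b"MZ":
                flagged.append(path)
    return flagged
```

Calling `find_executables("downloads/some_model")` (a placeholder path) returns a list of paths to inspect before going further. An empty list does not prove a download is safe; it only covers this one check.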
In general, installing through conda, or even from PyPI, is more secure than installing straight from a GitHub repository.
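If you want to check an existing environment for packages that were installed straight from a repository rather than from an index, pip keeps a record: under PEP 610 it writes a `direct_url.json` file into a package's metadata when installing from a URL or version-control repository, and writes none for ordinary installs by name from an index such as PyPI. The sketch below reads those files with the standard library's `importlib.metadata`. `packages_installed_from_urls` is a hypothetical helper, and it only sees packages installed by a tool that supports PEP 610, such as recent versions of pip.

```python
import json
from importlib import metadata


def packages_installed_from_urls():
    """Map package name -> source URL for packages installed from a remote URL.

    Relies on the direct_url.json file (PEP 610) that pip records for direct
    URL and version-control installs. Local-directory installs, which are
    recorded with file:// URLs, are skipped.
    """
    found = {}
    for dist in metadata.distributions():
        raw = dist.read_text("direct_url.json")
        if raw is None:
            continue  # Installed from an index, or by a tool without PEP 610.
        try:
            info = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Ignore a malformed record rather than crash.
        if not isinstance(info, dict):
            continue
        url = info.get("url", "")
        if not url or url.startswith("file:"):
            continue
        found[dist.metadata.get("Name", "<unknown>")] = url
    return found
```

Printing `sorted(packages_installed_from_urls().items())` gives a quick list of packages that may deserve the extra scrutiny described above; an empty result simply means pip recorded no remote URL installs in that environment.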
If you are still unsure, speak to a colleague or a member of the Data Engineering team before doing anything.
Further reading: Using Node Package Manager