Setting up a Scientific Python Project from Scratch
This guide is intended as a resource for students and researchers who want to set up a scientific Python project from scratch. It will cover the basic steps to create a project structure that follows good coding practices, including version control, managing Python versions and virtual environments. It is by no means a comprehensive guide, but rather a starting point for your own projects.
It is mainly tested on Windows 10 and Linux. I aim to keep it updated with state-of-the-art tools and practices, but it might not always be up to date. If you find any issues or have suggestions for improvements, please feel free to contact me.
Installations
- Install git
- Install VSCode
- Install pyenv
Accounts
- GitHub (connect your machine to GitHub via SSH Connecting to GitHub with SSH - GitHub Docs)
Workflow
-
Global Python setup
We will differentiate between the global Python setup, used to install basic tools like
cookiecutter
andpoetry
, but not for the project itself, and the project setup, which will be done in a virtual environment.# get a list of all available Python versions pyenv install --list # choose the version you want to install, e.g. 3.11.4. I would suggest using the latest stable version pyenv install 3.11.4 # set the global Python version pyenv global 3.11.4
Now, we can install
poetry
to manage our virtual environments and dependencies, andcookiecutter
to create pre-defined project structures.# install poetry pip install poetry # configure poetry to create virtual environments within the project folder poetry config virtualenvs.in-project true # install cookiecutter pip install cookiecutter
-
Project setup
Now we can create a new project using
cookiecutter
. I like to use lightweight templates, such as the one I created. This will create a project structure that follows good coding practices.cookiecutter gh:RDoerfel/simple-cookiecutter
Go through the initialization steps and create the project skeleton. cd into the project directory after its creation. You might as well use the
gh:patrickmineault/true-neutral-cookiecutter
template, or for larger projects,gh:CBS-HPC/replication_package
. -
Local Python setup
Once you have created the project structure with cookiecutter, we are ready to set up our local Python environment. First, install the Python version you want to use for the project.
# Again, have a look at all available Python versions and choose the one you need pyenv install --list # install the version you want to use, e.g. 3.11.4 pyenv install 3.11.4 # set the local Python version for the project pyenv local 3.11.4
Now we can use
poetry
to manage our virtual environment and dependencies. Poetry will create a virtual environment within the project folder, which is a good practice to keep the project isolated from other projects and the global Python environment.# tell poetry to use the local Python version poetry env use $(pyenv which python) # initialize the poetry project poetry init # install the project poetry install # install packages poetry add numpy # this is how to install packages within the project
-
Version Control
Initialize version control and connect it to GitHub. We first need to create a new repository on GitHub. Once you have done that, we can initialize git in the project directory and connect it to the remote repository.
git init git remote add origin <your_github_repo>
Open the
.gitignore
file. This is a file that excludes other files or directories from version control. We use this for data and environments, etc..venv/ poetry.lock data/
Once all the unwanted files and directories are excluded, stage everything, commit, and push as initial commit.
git add --all git commit -m "initialize project" git push origin main