Collaborating on an existing DSO/DVC Project#

If a DSO project is already set up and you want to collaborate or extend it, follow these guidelines. For general information about DSO or DVC, please check out the DSO getting-started page and the DVC documentation.

Requirements#

  • The original project was initialized with dso init and contains all essential files.

  • A DVC remote storage is set up where all DVC-controlled data is stored.

  • A Git repository exists and can be cloned.

How-To#

Clone the Git Repository#

First, clone the project's existing Git repository.

git clone <git_repository>

Pull the Data from the DVC Remote Storage#

After cloning, no DVC-controlled input or output data is available locally. You therefore need to pull the data associated with the repository from the DVC remote storage:

# Compiles all params.in.yaml into params.yaml files and pulls the DVC-controlled data
dso pull

Make Changes to the DSO Project#

After cloning the source code from the Git repository and pulling the corresponding data from the DVC remote storage, everything is set up to make changes to and extend the DSO project. Please follow the instructions on how to set up folders, stages, and configuration files on the DSO getting-started page.
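Once changes are made, collaborators only see them if both the code (via Git) and the DVC-controlled data (via the DVC remote storage) are shared. The following is a minimal sketch of that round trip, not a prescribed DSO workflow. The helper names `run` and `publish_changes` are hypothetical, and it assumes that the plain `dvc` command is available alongside `dso` and that DVC-tracked outputs are excluded from Git by the project's `.gitignore` files.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: reproduce the stages, then share code and data.
# $1 is the commit message.
publish_changes() {
    message="$1"
    run dso repro                   # re-run changed stages, updating dvc.lock
    run git add -A                  # stage code, config and lock files (assumes data is gitignored)
    run git commit -m "$message"    # record the changes in Git
    run dvc push                    # upload DVC-controlled data to the remote storage
    run git push                    # publish the commit
}
```

Running `DRY_RUN=1 publish_changes "Add new stage"` prints the commands without executing them, which is a cheap way to check the order before touching the remote.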

Fixing merge conflicts in dvc.lock files#

When merging branches, conflicts in dvc.lock files are to be expected. There is usually no point in resolving them by hand, because dso repro has to be run after a successful merge anyway and regenerates the affected lock files. The following bash one-liner removes all conflicting dvc.lock files. You can then conclude the merge and run dso repro.

# remove all conflicting lock files
git diff --name-only --diff-filter=U | grep 'dvc\.lock$' | xargs git rm

# check if all conflicts have been resolved
git status

# conclude merge
git commit

# rerun merged analysis stages
dso repro
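The removal step can also be written more defensively. The sketch below is one possible variant, not part of DSO. It does nothing when no lock file is in conflict, handles paths that contain spaces, and matches only files named exactly `dvc.lock`. The function names `filter_lockfiles` and `remove_conflicting_lockfiles` are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read paths on stdin and keep only those whose
# file name is exactly dvc.lock. Succeeds even if nothing matches.
filter_lockfiles() {
    grep -E '(^|/)dvc\.lock$' || true
}

# Hypothetical helper: remove every dvc.lock that Git reports as unmerged.
# Must be run inside the repository while a merge is in progress.
remove_conflicting_lockfiles() {
    git diff --name-only --diff-filter=U | filter_lockfiles |
        while IFS= read -r lockfile; do
            git rm -- "$lockfile"
        done
}
```

Calling `remove_conflicting_lockfiles` replaces the removal step only. The remaining steps (`git status`, `git commit`, `dso repro`) stay the same.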