50 Shades of Git: Remotes and Authentication

Introduction

Git is a software development tool that almost all engineers use in their work. This source control tool enables us to make changes to a project code base collaboratively. However, Git can be a headache at times. When running in a CI environment, it sometimes does not behave the way it does locally. Moreover, we sometimes follow best practices without knowing much about how they work. This gap, together with the limited debugging capabilities on CI, makes it even harder to resolve issues.

In this blog post, we are going to fill a bit of that gap. To be more specific, we are looking into how different ways of configuring a remote may affect the way Git authenticates with the server.

Background

A Git server refers to the server on which a repo is hosted. It can be Github (github.com), Gitlab (gitlab.com), Bitbucket (bitbucket.org), or a self-hosted server (ex. gitlab.company.com).

A remote in Git refers to a repo (hosted on a Git server) on which team members collaborate, ex. https://github.com/trinhngocthuyen/cocoapods-ezplugin.

A fetch action retrieves changes (to branches or tags) from a remote. A push action transfers your local changes to a remote. These actions are done by the git fetch and git push commands respectively.

A typical workflow would be:

Conflicts may arise in steps (1) or (3). Engineers have to resolve them and sometimes try again.

Depending on preference, some may use git pull in their workflow in step (1). Under the hood, git pull is just git fetch followed up with git merge.
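To make the fetch-then-merge relationship concrete, here is a minimal sketch that runs entirely on local throwaway directories. The names server.git, alice, and bob, the branch name main, and the demo identity are assumptions made for illustration. It only covers the fast-forward case, not a merge with conflicts.

```shell
# Throwaway sandbox: a bare "server" repo plus two working copies (all hypothetical).
work="$(mktemp -d)"
git init --quiet --bare "$work/server.git"
git -C "$work/server.git" symbolic-ref HEAD refs/heads/main

# Alice creates the first commit and pushes it.
git init --quiet "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
git -C "$work/alice" remote add origin "$work/server.git"
commit() {
  # Fixed identity so the demo does not depend on any global Git config.
  git -C "$work/alice" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "$1"
}
commit "first"
git -C "$work/alice" push --quiet origin main

# Bob clones the repo, then Alice pushes a second commit.
git clone --quiet "$work/server.git" "$work/bob"
commit "second"
git -C "$work/alice" push --quiet origin main

# Bob catches up in two explicit steps instead of running "git pull".
git -C "$work/bob" fetch --quiet origin
git -C "$work/bob" merge --quiet --ff-only origin/main
git -C "$work/bob" log --oneline
```

After the last two commands, Bob's branch points at the same commit as Alice's, which is the state a plain git pull would also produce in this scenario.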

Configuring a Remote

A remote is denoted by a URL. This URL contains information about the transport protocol (SSH, HTTP/HTTPS, FTP…). Below are some valid examples:

ssh://git@github.com/trinhngocthuyen/cocoapods-ezplugin.git
git@github.com:trinhngocthuyen/cocoapods-ezplugin.git
https://github.com/trinhngocthuyen/cocoapods-ezplugin.git

To view the remotes of a repo, run git remote -v.

$ git remote -v
origin  https://github.com/trinhngocthuyen/ezactions.git (fetch)
origin 	https://github.com/trinhngocthuyen/ezactions.git (push)

We can configure different URLs for the push action. This is done by running git remote set-url --push.

$ git remote set-url --push origin https://github.com/trinhngocthuyen/foo.git
$ git remote -v
origin  https://github.com/trinhngocthuyen/ezactions.git (fetch)
origin  https://github.com/trinhngocthuyen/foo.git (push)

Alternatively, we can use git config remote.<remote_name>.pushurl to alter the push URL.

$ git config remote.origin.pushurl https://github.com/trinhngocthuyen/bar.git
$ git remote -v
origin  https://github.com/trinhngocthuyen/ezactions.git (fetch)
origin  https://github.com/trinhngocthuyen/bar.git (push)
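The following sketch reads the two URLs back individually, which is handy in scripts where parsing git remote -v output is awkward. It works in a temporary repo (an assumption of this demo) and never contacts the server. It also shows that removing the pushurl entry makes pushes fall back to the fetch URL.

```shell
# Scratch repo so the demo does not touch any real project.
repo="$(mktemp -d)"
git init --quiet "$repo"
git -C "$repo" remote add origin https://github.com/trinhngocthuyen/ezactions.git
git -C "$repo" remote set-url --push origin https://github.com/trinhngocthuyen/foo.git

# Read the fetch URL and the push URL separately.
fetch_url="$(git -C "$repo" remote get-url origin)"
push_url="$(git -C "$repo" remote get-url --push origin)"
echo "fetch: $fetch_url"
echo "push:  $push_url"

# Dropping the pushurl entry restores the default: push to the fetch URL.
git -C "$repo" config --unset remote.origin.pushurl
git -C "$repo" remote get-url --push origin
```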

We can configure more than one remote per repo. This is usually the case for open-source projects where each engineer forks the repo. They push changes to their own fork but still want to keep it up to date with the main repo. Multiple remotes are also useful when you work with mirrors (for example, one public repo on Github/Gitlab, and one private repo on your company server). However, we shall not dive into details for that topic.

A fetch/push/clone is associated with a remote. Prior to this action, Git authenticates with the server (ex. Github) and then performs further steps if applicable. Therefore, the credentials used for authentication are closely tied to the remote configuration. Those credentials could be an SSH key, a username/password pair, or an access token. In the following section, we’ll look into how such credentials play a role in the authentication.

Remotes and Credentials for Authentication

Authentication with a Git server works the same way whether we clone, fetch from, or push to a remote. For convenience, we take the fetch action as a typical example. If you take a closer look at how Gitlab CI/CD or Github Actions implements its checkout, you should see an order like this:

SSH

Using SSH to connect with a Git server is a common practice. A remote used with SSH looks like this:

git@github.com:trinhngocthuyen/cocoapods-ezplugin.git

When fetching from such a remote, Git opens an SSH connection to the server under the hood. This is where authentication comes in. As you know, it requires a pair of public and private keys. The public key is added to the server (ex. Github). The private one is kept by the user and used for authentication. This key, in OpenSSH, is known as the “identity key”, and its location is specified by the IdentityFile option. By default, the following files are used:

~/.ssh/id_rsa
~/.ssh/id_ecdsa
~/.ssh/id_ed25519
...

If you have them configured, you can test the connection by running: ssh -T git@<server>

$ ssh -T git@github.com
Hi trinhngocthuyen! You've successfully authenticated, but GitHub does not provide shell access.
$ ssh -T git@gitlab.com
Welcome to GitLab, @trinhngocthuyen!

Using different keys for different servers

Some engineers choose to use different keys for Github, Gitlab, and their company server.

~/.ssh/id_rsa_github
~/.ssh/id_rsa_gitlab
~/.ssh/id_rsa_company

To add a key to the authentication agent, use ssh-add:

$ ssh-add ~/.ssh/id_rsa_github
$ ssh -T git@github.com
Hi trinhngocthuyen! You've successfully authenticated, but GitHub does not provide shell access.

Manually loading keys like this has two downsides:

A better approach is to use the SSH config (located in ~/.ssh/config). This way, you can configure which key is used for which server.

Host github.com
	IdentityFile ~/.ssh/id_rsa_github
Host gitlab.com
	IdentityFile ~/.ssh/id_rsa_gitlab
Host gitlab.company.com
	IdentityFile ~/.ssh/id_rsa_company

Using different keys for different repos

This is usually the case for CI. When running on CI, you should be mindful of what you write outside of the project directory. On self-hosted runners, files written outside of that directory might persist across jobs. This issue happens a lot with Shell (macOS) runners.

Two main drawbacks when such files are not properly cleaned up are:

Therefore, a best practice is to stick to the project directory or any directory that is guaranteed to be cleaned up by the CI/CD infra.

In this case, we can instruct Git to use a specific key via the core.sshCommand config (see: reference):

$ git config core.sshCommand "ssh -o IdentitiesOnly=yes -i <path/to/key> -F /dev/null"
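As a minimal sketch of the repo-scoped setup, the snippet below writes the setting into a scratch repo and reads it back. The key location .ci/deploy_key is a hypothetical path chosen for illustration. In a real job it would point to wherever the CI places the key inside the project directory.

```shell
# Scratch repo standing in for the CI project directory.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Hypothetical key location inside the project directory.
key_path="$repo/.ci/deploy_key"

# Without --global, the setting lands in this repo's .git/config only.
git -C "$repo" config core.sshCommand "ssh -o IdentitiesOnly=yes -i $key_path -F /dev/null"
ssh_cmd="$(git -C "$repo" config --local --get core.sshCommand)"
echo "$ssh_cmd"

# For a one-off command, the GIT_SSH_COMMAND environment variable has the
# same effect without writing any config, for example:
#   GIT_SSH_COMMAND="ssh -o IdentitiesOnly=yes -i $key_path -F /dev/null" git fetch
```

Because the setting lives in the repo's own config, nothing is left behind in ~/.ssh or the global Git config once the project directory is cleaned up.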

HTTP/HTTPS

There is no problem if the repo is public. The remote URL is just like the web URL to the repo. For convenience, let’s call this kind of URL “bare URL”.

https://github.com/trinhngocthuyen/public-repo

Now, we only care about how to fetch from a private repo.

Git authenticates with the server using a username & password, or a token. We can see a token as a username/password tuple where the password is the token and the username is just anything you want (ex. x-access-token, gitlab-token…). Therefore, we can treat these two roughly the same.

Using username/password in the remote URL

An HTTP/HTTPS remote that allows us to fetch successfully looks like this:

https://<username>:<password>@github.com/trinhngocthuyen/private-repo

This turns out to be the approach Gitlab CI/CD adopts. If you run git remote -v in a Gitlab job, you should see the URL as follows:

gitlab_remote.png

Using http.extraheader config

Github Checkout Action adopts a different approach. They use the http.extraheader config to carry the credentials for authentication. And the remote URL is just a bare URL.

https://github.com/trinhngocthuyen/private-repo

Below are the logs from the checkout step. Taking a closer look, we notice the command that sets up the authentication. The masked content *** is actually the base64 encoded string of x-access-token:<token> (see: src/git-auth-helper.ts#L57-L60).

github_extraheader.png

You can easily try out this approach on your local machine:

$ git config http.extraheader "Authorization: Basic $(echo -n x-access-token:<TOKEN> | base64)"
$ git fetch https://github.com/trinhngocthuyen/private-repo

Note: If you’re using Bash to encode <username>:<password>, be careful with the trailing newlines. It should be echo -n <username>:<password> | base64 instead of echo <username>:<password> | base64.
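The effect of the trailing newline is easy to see with a dummy credential. The sketch below uses the made-up value user:pass and printf '%s', which behaves the same across shells, whereas the handling of echo -n varies between shell implementations.

```shell
creds="user:pass"   # dummy value, for illustration only

# echo appends a newline, which becomes part of the encoded string.
with_newline="$(echo "$creds" | base64)"
# printf '%s' emits exactly the given bytes and nothing else.
without_newline="$(printf '%s' "$creds" | base64)"

echo "echo:   $with_newline"     # dXNlcjpwYXNzCg==
echo "printf: $without_newline"  # dXNlcjpwYXNz
```

The extra Cg== at the end of the first value is the encoded newline. A server decoding that header would see a password ending in a newline and reject it.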

In case you want to configure for Github only, then use http.https://github.com/.extraheader instead of http.extraheader.

$ git config http.https://github.com/.extraheader "Authorization: Basic <base64(username:password)>"

This approach also works for other servers (Gitlab, Bitbucket…) as long as they support basic authentication.
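To check which header Git would attach for a given URL without contacting the server, we can query the config with --get-urlmatch. This is a sketch in a scratch repo. The token dummy-token is a placeholder, not a real credential.

```shell
# Scratch repo so the demo does not touch any real project.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Placeholder token, encoded without a trailing newline.
auth="$(printf '%s' "x-access-token:dummy-token" | base64)"
git -C "$repo" config http.https://github.com/.extraheader "Authorization: Basic $auth"

# Ask Git which http.extraheader value applies to a given URL.
github_header="$(git -C "$repo" config --get-urlmatch http.extraheader \
  https://github.com/trinhngocthuyen/private-repo)"
gitlab_header="$(git -C "$repo" config --get-urlmatch http.extraheader \
  https://gitlab.com/trinhngocthuyen/private-repo || true)"

echo "github: ${github_header:-<none>}"
echo "gitlab: ${gitlab_header:-<none>}"
```

Only the Github URL picks up the header, which confirms that the scoped key keeps the credentials from being sent to other hosts.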

Username/password prompts

If you fetch from a remote with a bare URL (without a username/password), Git prompts you for a username and password. Let’s say we input x-access-token for the username and the access token for the password. Then, it successfully fetches from this remote.

$ git fetch https://github.com/trinhngocthuyen/private-repo
Username for 'https://github.com': x-access-token
Password for 'https://x-access-token@github.com': my-token-goes-here
From https://github.com/trinhngocthuyen/private-repo
 * branch            HEAD       -> FETCH_HEAD

Let’s say you are a macOS user. Now, you fetch from this remote again. This time, you are able to perform the fetch without seeing the username/password prompts.

$ git fetch https://github.com/trinhngocthuyen/private-repo
From https://github.com/trinhngocthuyen/private-repo
 * branch            HEAD       -> FETCH_HEAD

This behavior is due to the fact that Git caches the credentials. If you enable Git tracing by setting the environment variable GIT_TRACE=1, you can see which helper handles the credential cache.

$ GIT_TRACE=1 git fetch https://github.com/trinhngocthuyen/private-repo
09:22:03.977378 git.c:460               trace: built-in: git fetch https://github.com/trinhngocthuyen/private-repo
09:22:03.978347 run-command.c:655       trace: run_command: GIT_DIR=.git git remote-https https://github.com/trinhngocthuyen/private-repo https://github.com/trinhngocthuyen/private-repo
09:22:03.992273 git.c:750               trace: exec: git-remote-https https://github.com/trinhngocthuyen/private-repo https://github.com/trinhngocthuyen/private-repo
09:22:03.992846 run-command.c:655       trace: run_command: git-remote-https https://github.com/trinhngocthuyen/private-repo https://github.com/trinhngocthuyen/private-repo
09:22:04.464215 run-command.c:655       trace: run_command: 'git credential-osxkeychain get'
09:22:04.509220 git.c:750               trace: exec: git-credential-osxkeychain get
09:22:04.510059 run-command.c:655       trace: run_command: git-credential-osxkeychain get
09:22:04.993732 run-command.c:655       trace: run_command: 'git credential-osxkeychain store'
09:22:05.038985 git.c:750               trace: exec: git-credential-osxkeychain store
09:22:05.039730 run-command.c:655       trace: run_command: git-credential-osxkeychain store
09:22:05.506154 run-command.c:655       trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs
From https://github.com/trinhngocthuyen/private-repo
 * branch            HEAD       -> FETCH_HEAD
09:22:05.547164 run-command.c:1524      run_processes_parallel: preparing to run up to 1 tasks
09:22:05.547195 run-command.c:1551      run_processes_parallel: done
09:22:05.547216 run-command.c:655       trace: run_command: git maintenance run --auto --no-quiet
09:22:05.565672 git.c:460               trace: built-in: git maintenance run --auto --no-quiet

It is git credential-osxkeychain that does the magic on macOS. After the first successful fetch, the command git credential-osxkeychain store saves the credentials to Keychain. In subsequent fetches, Git runs git credential-osxkeychain get to retrieve the credentials for authentication.

git_keychain_cache.png

You can easily verify this by checking the corresponding item in Keychain Access, or by running git credential-osxkeychain get:

$ printf "host=github.com\nprotocol=https\n" | git credential-osxkeychain get
password=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
username=x-access-token

Git credential storage

What happened in the preceding section is that the credentials were handled by “credential storage”. On macOS, Git comes with the osxkeychain mode, which allows caching such info in Keychain.

If you also observe the same behavior (ie. Git remembers your credentials), then maybe you have the cache in place. To see the current credential storage:

$ git config credential.helper
osxkeychain

In fact, for me, osxkeychain is set as the credential storage by the system git config (located in /System/Volumes/Data/usr/local/etc/gitconfig)

$ git config --system --list
credential.helper=osxkeychain

There are several built-in options besides osxkeychain (see: reference):

You can try out these options by overriding the config:

$ git config credential.helper cache

Using url.<base>.insteadOf config

This config is really useful, especially in CI environments.

For Git-based dependencies in the project (declared in Gemfile, Podfile, etc.), engineers may choose to use SSH URLs because those work on their local machines. When running in a CI environment, those URLs possibly won’t work if the CI provider does not use SSH for authentication (ex. Github, Gitlab). Changing those URLs to the HTTP/HTTPS format, unfortunately, might break things on their local machines.

A simple solution to mitigate this issue is using the url.<base>.insteadOf config. This way, a URL format can be translated into the expected one.

Using this config is a very common practice to make your CI executions robust. Therefore, you might sometimes see code like this on CI:

# For Github Actions
$ git config --global url."https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/".insteadOf "git@github.com:"

# For Gitlab CI
$ git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.com/".insteadOf "git@gitlab.com:"
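Before relying on such a rewrite in a pipeline, it can help to preview the result. The sketch below sets a token-free variant of the rule in a scratch repo (so no secret is involved and the global config stays untouched) and asks Git to expand an SSH URL with git ls-remote --get-url, which exits without talking to the remote.

```shell
# Scratch repo so the rule is stored locally instead of with --global.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Token-free variant of the rewrite rule, for illustration only.
git -C "$repo" config url."https://github.com/".insteadOf "git@github.com:"

# Expand an SSH-style URL according to the insteadOf rules.
rewritten="$(git -C "$repo" ls-remote --get-url \
  git@github.com:trinhngocthuyen/cocoapods-ezplugin.git)"
echo "$rewritten"   # https://github.com/trinhngocthuyen/cocoapods-ezplugin.git
```

The same check works with the token-based rules above. Just keep in mind that the expanded URL then contains the token, so it should not be printed to job logs.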

Conclusion

In this blog post, we covered some areas of how Git authenticates with the server. We also mentioned some best practices when working with SSH and HTTP/HTTPS remotes. Although some practices are not really the case for local development, they are quite common for CI integration. Given that different CI providers may adopt different approaches (ex. Github using the .extraheader config, Gitlab using token-based remotes, CircleCI using SSH), knowing how they work helps you be less confused with the workflows.

At the end of the day, good engineering quality comes from not only excelling at domain knowledge but also being proficient in your day-to-day tools, in my opinion.