On git-hook managers
Git hooks are a common way to enforce constraints, standards, and the static quality of a code base. Running checks in the "pre-commit" git hook gives the developer a faster feedback loop than waiting for the same checks to fail in CI.
It is common to use a hook manager to automate the installation of the hooks into .git/hooks and to either pull them from a remote repository of common hook implementations or generate them from a simple kind of configuration.
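To make "installing a hook" concrete, here is a minimal Python sketch of the core job any hook manager performs: writing an executable file into .git/hooks. The function name install_hook and the generated wrapper script are assumptions for illustration, not the behavior of any particular manager.

```python
import stat
from pathlib import Path


def install_hook(repo_root: Path, stage: str, command: str) -> Path:
    """Write a tiny shell wrapper into .git/hooks/<stage> that runs `command`.

    Hypothetical helper for illustration; real hook managers do more
    (backups of existing hooks, config parsing, and so on).
    """
    hooks_dir = repo_root / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook_path = hooks_dir / stage
    # `exec` replaces the shell, so the command's exit code becomes the
    # hook's exit code. For the pre-commit stage a non-zero exit aborts
    # the commit.
    hook_path.write_text(f"#!/bin/sh\nexec {command}\n")
    # Git only runs hook files that are marked executable.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Calling install_hook(Path.cwd(), "pre-commit", "make validate") from a repository root would wire a task-runner target into the hook, assuming such a target exists in the project.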
This page (githooks.com) provides a nice third-party overview of git hooks along with the most commonly used hook managers.
I was a little late in adopting git-hooks myself and have finally sat down and figured out a good workflow for my tastes.
At least in my little bubble, the most common hook manager is pre-commit. I've created repository templates that use it, bootstrap it, and integrate it with CI so that what runs in CI is reproduced locally by default, which avoids version drift 1.
The Problem with pre-commit
I found pre-commit to be pretty frustrating most of the time. When I imagined a hook manager, I expected a pretty simple piece of software that took some commands to run, or potentially downloaded some scripts from a git repo. pre-commit, on the other hand, is much more complicated: it takes on the responsibility for downloading the hook scripts from a special "package" format, creating individual virtual environments for them, and exposing a host of other options for controlling their behavior. All of which can break in non-obvious and confusing ways 2.
I lived with the inconveniences for a while, as it seemed that everyone around me was happily using it and I must have been missing something. Unfortunately, I no longer think that is true, and I'm convinced that pre-commit is just adopted in cargo cult fashion. Perhaps some people even conflate pre-commit with git hooks themselves.
Regardless, the straw that broke this camel's back was setting up Python type checking to run as a git hook. Because pre-commit hooks run in their own contained virtual environment, they cannot easily import the local code you are working on (for type stubs), and if you want to inject third-party type stubs you need to explicitly write them all down in the pre-commit configuration. The issue is that I already have all of this done using other tools or scripts, which are the primary entrypoints for working on the project. I don't want to have to maintain a duplicate listing just for my hooks to work. Managing dependencies is hard enough without having another place to duplicate them.
This is really an issue of separation of concerns and composability. Personally, I take great care to choose tools that let me generate reproducible development environments and that compose with many other tools. pre-commit is in opposition to this and wants to control everything. This is useful for simple static analysis tools like black, which do not need to be installed alongside your code and can act on your code as simple text blobs. However, when you want to do more complex things that require pulling in extra dependencies and actually importing builds of your code, pre-commit ceases to be simple and gets in your way.
Secondly, because pre-commit thinks it runs the show, it ends up infecting the rest of your development automation. I typically give all my projects some kind of "task runner", which gives a high-level and abstracted entrypoint for doing repetitive tasks. This typically takes the form of a Makefile, but I've also used and can recommend invoke and pydoit to accomplish the same thing.
Here is an example of a Makefile with some common tasks, without using pre-commit:
clean: ## Clean temporary files, directories etc.
	hatch clean
	rm -rf dist .pytest_cache .coverage
	find . -type f -name "*.pyc" -print -delete
.PHONY: clean

format: ## Run source code formatters manually.
	hatch run -- black src tests
	hatch run -- isort src tests
.PHONY: format

validate: format-check lint docstring-check typecheck ## Run all linters, type checks, static analysis, etc.
.PHONY: validate

format-check: ## Run code formatting checks
	hatch run -- black --check src tests
	hatch run -- isort --check src tests
.PHONY: format-check

lint: ## Run only the linters (non-autoformatters).
	hatch run -- flake8 src tests
.PHONY: lint

docstring-check: ## Run docstring coverage only.
	hatch run -- interrogate src tests
.PHONY: docstring-check

typecheck: ## Run only the type checker (requires mypy)
	hatch run -- mypy --strict src tests/utils
.PHONY: typecheck
In this scenario make validate is the canonical way to run all the various linters. This should be the only entrypoint to these tasks so that we don't get duplication and drift between different environments, e.g. locally, CI, and git hooks.
With pre-commit, however, I cannot call these Makefile tasks in any practical way. So what ends up happening is that you need to replace the commands in the tasks with explicit calls to pre-commit to run things for you. E.g.:
.pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
      # run the formatting
      - id: black
        alias: black-format
        files: ^(src|tests)
        name: "Format: black"
      # just check
      - id: black
        alias: black-check
        args: [--check]
        files: ^(src|tests)
        name: "Check: black"
        # don't run unless done manually
        stages:
          - manual
Notice the rigamarole you have to go through to run a single hook type in multiple ways.
Makefile
format: ## Run source code formatters manually.
	pre-commit run --all-files black-format
	pre-commit run --all-files isort-format
.PHONY: format
So now pre-commit is not just the thing that runs things on your behalf at specific times (the essence of what git hooks are); it is now an integral part of how you manage dependencies for your project.
The Solutions
At this point I decided I needed something better. I want a hook manager to be:
1. Very easy to install and bootstrap for newcomers to a project.
2. Able to run arbitrary commands and integrate with any virtual environment manager and task runner I'm already using.
Point 1 is very important because you do not want to frustrate people before they actually start working on your code. At this stage it should be so easy that you could write a task or shell script that bootstraps the manager. Ideally that means a single executable file.
I narrowed it down to two hook managers that seemed to accomplish this:
- lefthook
- Autohook
I ultimately decided on lefthook, but Autohook was very tempting as well.
Autohook is a single bash script that you can just download. You operate it by maintaining a hooks directory that you dump executables into, and you control which hook stage and order they run in with symlinks. All Autohook does is place these into .git/hooks. I don't think it gets any easier than that. Looking at the code as well, it's just a slightly more robust version of the shell script you would have written yourself if there weren't a plethora of hook managers to choose from.
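To make the directory-and-symlink idea concrete, here is a rough Python sketch of a runner in that spirit. It is not Autohook's actual code (Autohook is a bash script); the hooks/<stage>/ layout, the run_stage name, and the stop-at-first-failure behavior are all assumptions for illustration.

```python
import os
import subprocess
from pathlib import Path


def run_stage(hooks_root: Path, stage: str) -> int:
    """Run every executable in hooks_root/<stage>/ in file-name order.

    Returns the first non-zero exit code, or 0 if everything passed.
    Hypothetical layout: hooks/pre-commit/01-lint, hooks/pre-commit/02-types, ...
    """
    stage_dir = hooks_root / stage
    if not stage_dir.is_dir():
        return 0  # nothing configured for this stage
    # Sorting by name lets numeric prefixes such as 01-, 02- set the order.
    for script in sorted(stage_dir.iterdir()):
        # is_file() follows symlinks, so symlinked scripts are picked up too.
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue
        result = subprocess.run([str(script)])
        if result.returncode != 0:
            return result.returncode
    return 0
```

A .git/hooks/pre-commit file would then only need to call run_stage(Path("hooks"), "pre-commit") and exit with its return value.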
What I liked about lefthook was that it's a Go project and comes as a single binary executable, which is easy to download and immediately use. It should also be easier to support on platforms like Windows since there is no reliance on a POSIX-like shell. It also provides packages for just about every package manager, even language-specific ones like pip and npm.
Second, it's dead simple in its operation. You write a lefthook.yml file which specifies the hooks you want to use. Here is mine:
lefthook.yml
pre-commit:
  parallel: true
  commands:
    format-check:
      run: make format-check
    lint:
      run: make lint
    docstrings:
      run: make docstring-check
    typecheck:
      run: make typecheck
The top-level groups are for each hook stage, and here I am only configuring the "pre-commit" stage. Under the commands section I list out the name of each hook and how to run it. Since I already have the business logic for these in my task runner, it's as simple as wiring them up!
This is great: it's a hook manager that's doing only one thing, managing hooks, and doing it well. It didn't require any refactoring of anything else in my project, it runs the hooks in parallel, and it provides a bunch of other useful options to control its behavior.
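For a feel of what "run the commands in parallel and fail if any of them fails" amounts to, here is a small Python sketch. It illustrates the idea only and is not how lefthook is implemented (lefthook is written in Go); the names run_parallel and hook_exit_code are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands: dict[str, list[str]]) -> dict[str, int]:
    """Run each named command concurrently and return its exit code by name."""

    def run_one(argv: list[str]) -> int:
        # Capture output so parallel commands do not interleave on the terminal.
        return subprocess.run(argv, capture_output=True).returncode

    # One worker per command; `or 1` avoids max_workers=0 for an empty dict.
    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        futures = {name: pool.submit(run_one, argv) for name, argv in commands.items()}
        return {name: future.result() for name, future in futures.items()}


def hook_exit_code(results: dict[str, int]) -> int:
    """A hook should exit non-zero if any of its commands failed."""
    return 1 if any(code != 0 for code in results.values()) else 0
```

With the task runner above, the input would look like {"lint": ["make", "lint"], "typecheck": ["make", "typecheck"]}, and the hook would exit with hook_exit_code(results).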
Conclusion
That's it! I've not even used lefthook for a full day and it has already much improved my life:
- I now can easily do complex typechecking with a git pre-commit hook,
- I don't need to run my hook manager in CI, just my task runner,
- I don't need to worry about yet-another system with caches and virtualenvs,
- I can run hooks more quickly and in parallel
I highly recommend it and encourage you to question cargo cult adoption of sub-par tooling like pre-commit in other areas.
1. A lot of times I see impatient developers just ignore these kinds of issues and get by just fine for a long while. I have the unique (mis)fortune of typically running into these kinds of issues very quickly, if not immediately, and this was the same for using pre-commit in my CI pipelines. So that's just to say that yes, you really do need to worry about this, as it will bite you sooner or later.