Recently, I have been working on a monorepo that uses Bazel as its build system. It’s a polyglot repository with several languages; Python is used mostly for backoffice and data work (internal tools, dashboards, data ETLs, etc.).
Far fewer developers were using Python than Go (the main production language), which led to a lack of standardization in the Python code. As the data team started to grow, I decided the time had come to introduce a proper automated code formatter and linter, as well as CI checks for them.
In this post, I will share how I set up Ruff for Python in Bazel.
0. Why Ruff?
Although my team was already using Black as a formatter, I thought giving Ruff a chance would be worthwhile. In the world of monorepos, CI times are a shared suffering, so selecting fast tools is really important.
And Ruff is insanely fast.
For science, I added Ruff to the TensorFlow repo, where linting the whole repo takes ~300ms.
On the monorepo I was working with (~1.1k Python files), I was getting ~100ms.
1. Requirements
Python is already bad enough when it comes to local environment setup, so I really did not want to introduce yet another tool that developers would have to install on their machines.
So in my head the main requirements were:
- It has to be fast
- The installation and then application should be managed by Bazel
So, with the requirements formulated and the courage gathered, came the first and most obvious question: how do we actually run it?
2. The Setup
Unfortunately, the Python rules are quite lacking when it comes to linting and formatting, so after looking into possible options I settled on downloading the Ruff binaries using the http_archive and http_file repository rules.
Let’s build a small sample project to illustrate the approach!
2.1 Downloading the binaries
The first step is to, well, download the binaries by adding this to the WORKSPACE file:
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive", "http_file")

ruff_version = "0.3.5"

# The macOS archive is fetched as a plain file and unpacked later, see
# https://github.com/bazelbuild/bazel/issues/20269
http_file(
    name = "ruff-osx",
    sha256 = "75522512ed44a554968483e205f3c7260b7e05c90462a9edf69c8f0d737ddf1d",
    urls = ["https://github.com/astral-sh/ruff/releases/download/v{0}/ruff-{0}-aarch64-apple-darwin.tar.gz".format(ruff_version)],
)

http_archive(
    name = "ruff-linux",
    build_file_content = 'exports_files(["ruff"])',
    sha256 = "4326f4121b7fb2f4adbffcc6d07a595f5869a95b70793b70c16951715dc601de",
    urls = ["https://github.com/astral-sh/ruff/releases/download/v{0}/ruff-{0}-x86_64-unknown-linux-gnu.tar.gz".format(ruff_version)],
)
One quirk you might notice is that we are using http_file for macOS and http_archive for Linux. The reason is a bug in tar.gz extraction of the Ruff archive on macOS (see the issue linked in the WORKSPACE comment), but it is something that can be worked around.
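Both sha256 values have to be updated whenever ruff_version changes. The helpers below are a rough sketch of how that could be done from a terminal; the function names ruff_release_url and sha256_of are made up for this example, and the URL pattern is simply copied from the WORKSPACE entries, so it is an assumption that other releases follow it.

```shell
# Build the release URL using the same pattern as the WORKSPACE entries.
# (Hypothetical helper, not part of Bazel or Ruff.)
ruff_release_url() {
    version="$1"
    target="$2"   # e.g. aarch64-apple-darwin or x86_64-unknown-linux-gnu
    printf 'https://github.com/astral-sh/ruff/releases/download/v%s/ruff-%s-%s.tar.gz\n' \
        "$version" "$version" "$target"
}

# Print the SHA-256 of a file, using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1
    fi
}
```

A possible use: download the archive with `curl -fsSL -o ruff.tar.gz "$(ruff_release_url 0.3.5 aarch64-apple-darwin)"`, then paste the output of `sha256_of ruff.tar.gz` into the matching rule.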
2.2 Using downloaded binaries
Great, now we have something to run! To run the binaries, let’s create a dedicated tools directory with a BUILD.bazel file inside. There we will create an alias for the binary we have downloaded.
First, however, we need to implement a workaround for macOS:
# tools/BUILD.bazel
genrule(
    name = "ruff-gen-osx",
    srcs = ["@ruff-osx//file"],
    outs = ["ruff-osx"],
    # $< is the downloaded archive, $@ is the declared output file.
    cmd = "tar -xvf $< && mv ruff $@",
    executable = True,
    target_compatible_with = [
        "@platforms//os:osx",
        "@platforms//cpu:arm64",
    ],
)
Here we use a genrule to extract the binary from the archive we downloaded via http_file. We also specify target_compatible_with for this rule, since we don’t need to run it for anything other than macOS.
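To see what the genrule command does without involving Bazel, the same extraction can be reproduced in a plain shell. This is only an illustration: extract_ruff is a made-up name, and unlike the genrule it unpacks into a scratch directory so that nothing is left behind in the current one. It assumes, as the genrule does, that the archive contains a file called ruff at its top level.

```shell
# Mimics the genrule command: $1 plays the role of $< (the downloaded
# archive) and $2 the role of $@ (the declared output file).
extract_ruff() {
    archive="$1"
    output="$2"
    workdir="$(mktemp -d)" || return 1

    # Unpack into the scratch directory; tar detects gzip on its own.
    tar -xf "$archive" -C "$workdir" || return 1

    # Move the binary to the requested location and mark it executable.
    mv "$workdir/ruff" "$output" || return 1
    chmod +x "$output"
    rm -rf "$workdir"
}
```

For example, `extract_ruff "$PWD/ruff.tar.gz" "$PWD/ruff-osx"` should leave an executable ruff-osx next to the archive.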
Now we can create an alias for the binaries!
# tools/BUILD.bazel
alias(
    name = "ruff",
    actual = select({
        "@bazel_tools//src/conditions:linux_x86_64": "@ruff-linux//:ruff",
        "@bazel_tools//src/conditions:darwin_arm64": ":ruff-gen-osx",
    }),
)
Here we bind the extracted binaries to a target named ruff, which makes it easy to run.
3. Running and Final Results
With this setup, we can run ruff like this:
To format:
bazel run //tools:ruff -- format
To lint and fix:
bazel run //tools:ruff -- check --fix
And here you can find the source code :)
Appendix
Things that did not work out
Wrapper macro / running with py_test
At first, I thought it would be possible to create a wrapper macro for py_library, so that additional targets would be created for the libraries that we have (something like this).
However, that could not have worked with Ruff: since Ruff is just a binary, launching it from Python would have required locating it first, which seemed a little too brittle.
Using third-party rules
As an option, I also considered rules_lint from Aspect, which is quite powerful.
To get the best UX with these rules, you need to override your Bazel CLI with the one provided by Aspect, and that is something I did not want to bring into my project. In theory, you can also have a simple lint.sh wrapper to run these rules, but again, it’s just not something that felt natural.
Running on CI
In my case, I created a simple .sh script to execute the checks on CI.
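The script itself is not reproduced here, so what follows is only a guess at what such a check could look like, not the actual script. It assumes the //tools:ruff alias from above; the file name tools/ruff_ci.sh and the function name run_ruff_ci are invented for the example. The directory is passed explicitly because `bazel run` does not necessarily start the binary in the directory it was invoked from.

```shell
#!/bin/sh
# Hypothetical tools/ruff_ci.sh: read-only Ruff checks for CI.

# Run the formatter and the linter in check-only mode against one directory.
# Both checks always run; the function fails if either of them failed.
run_ruff_ci() {
    root="$1"
    status=0

    # --check reports files that would be reformatted without rewriting them.
    bazel run //tools:ruff -- format --check "$root" || status=1

    # Without --fix, `ruff check` only reports violations.
    bazel run //tools:ruff -- check "$root" || status=1

    return "$status"
}
```

A CI step could then source the file and call `run_ruff_ci "$PWD"` from the repository root, failing the job on a non-zero exit code.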