Setting Up Ruff Linter And Formatter for Python in Bazel

bazel python ruff
Phil Uvarov
Doing Data and Backend things @ Embark Studios

Recently, I have been working on a monorepo where we use Bazel as the build system. It’s a polyglot repository with several languages. Python is used mostly for back-office and data work (internal tools, dashboards, data ETLs, etc.).

Developers using Python were far less numerous than those using Go (the main production language), which led to a lack of standardization in the Python codebase. As the data team started to grow, I decided it was time to introduce a proper automated code formatter and linter, along with CI checks for them.

In this post, I will share how I set up Ruff for Python in Bazel.

0. Why Ruff?

Although my team was already using Black as a formatter, I thought giving Ruff a chance would be worthwhile. In a monorepo, CI times are a shared pain, so choosing fast tools really matters.

And Ruff is insanely fast.

For science, I added Ruff to the TensorFlow repo, where linting the whole repository takes ~300ms.

On the monorepo I was working with (~1.1k Python files), I was getting ~100ms.

1. Requirements

Python is already bad enough when it comes to local environment setup, so I really did not want to introduce yet another tool that developers would have to install on their machines.

So in my head the main requirements were:

  1. It has to be fast
  2. The installation and then application should be managed by Bazel

With the requirements formulated and the courage gathered, the first and most obvious question came up: how do we actually run it?

2. The Setup

Unfortunately, the Python rules for Bazel are quite lacking when it comes to linting and formatting, so after looking into the possible options I settled on downloading the Ruff binaries with the http_archive and http_file rules.

Let’s build a small sample project to illustrate the approach!

2.1 Downloading the binaries

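The first step is to, well, download the binaries by adding this to the WORKSPACE file:

#WORKSPACE

# http_archive and http_file are not built in; they have to be loaded first.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive", "http_file")

ruff_version = "0.3.5"

# macOS uses http_file instead of http_archive because of
# https://github.com/bazelbuild/bazel/issues/20269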
http_file(
    name = "ruff-osx",
    sha256 = "75522512ed44a554968483e205f3c7260b7e05c90462a9edf69c8f0d737ddf1d",
    urls = ["https://github.com/astral-sh/ruff/releases/download/v{0}/ruff-{0}-aarch64-apple-darwin.tar.gz".format(ruff_version)],
)

http_archive(
    name = "ruff-linux",
    build_file_content = 'exports_files(["ruff"])',
    sha256 = "4326f4121b7fb2f4adbffcc6d07a595f5869a95b70793b70c16951715dc601de",
    urls = ["https://github.com/astral-sh/ruff/releases/download/v{0}/ruff-{0}-x86_64-unknown-linux-gnu.tar.gz".format(ruff_version)],
)

One quirk you might notice is that we are using http_file for macOS and http_archive for Linux. The reason is an issue with extracting Ruff's tar.gz archive on macOS (see the Bazel issue linked in the snippet above), but it is something that can be worked around.
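When bumping ruff_version, the sha256 values have to be updated as well. Below is a minimal sketch of one way to compute them locally. The helper name sha256_of is hypothetical and not part of the original setup, and the sketch assumes either sha256sum or shasum is available on the machine.

```shell
#!/usr/bin/env bash
# Sketch: print the SHA-256 of a local file so it can be pasted into the
# sha256 attribute of http_file / http_archive.
set -euo pipefail

# sha256_of is a hypothetical helper name used only for this illustration.
sha256_of() {
  local file="$1"
  if command -v sha256sum >/dev/null 2>&1; then
    # GNU coreutils, available on most Linux distributions
    sha256sum "$file" | awk '{print $1}'
  else
    # Fallback for systems that ship shasum instead (for example macOS)
    shasum -a 256 "$file" | awk '{print $1}'
  fi
}

# Example usage (requires network access to download the release archive first):
#   curl -fsSL -o ruff.tar.gz \
#     "https://github.com/astral-sh/ruff/releases/download/v0.3.5/ruff-0.3.5-x86_64-unknown-linux-gnu.tar.gz"
#   sha256_of ruff.tar.gz
```

The printed digest is the value that goes into the sha256 attribute for the matching platform.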

2.2 Using downloaded binaries

Great, now we have something to run! To run the binaries, let’s create a dedicated tools directory with a BUILD.bazel file inside. There we will create an alias for the binary we have downloaded.

First, however, we need to implement a workaround for macOS:

#tools/BUILD.bazel

genrule(
    name = "ruff-gen-osx",
    srcs = ["@ruff-osx//file"],
    outs = ["ruff-osx"],
    cmd = "tar -xvf $< && mv ruff $@",
    executable = True,
    target_compatible_with = [
        "@platforms//os:osx",
        "@platforms//cpu:arm64",
    ],
)

Here we use genrule to extract the binary from the archive we have downloaded via http_file. We are also specifying target_compatible_with for this rule, since we don’t need to run it for anything other than macOS.

Now we can create an alias for the binaries!

#tools/BUILD.bazel

alias(
    name = "ruff",
    actual = select({
        "@bazel_tools//src/conditions:linux_x86_64": "@ruff-linux//:ruff",
        "@bazel_tools//src/conditions:darwin_arm64": ":ruff-gen-osx",
    }),
)

Here we bind the extracted binaries to a target named ruff, which makes it easy for us to run them.

3. Running and Final Results

With this setup, we can run Ruff like this:

To format:

bazel run //tools:ruff -- format

To lint and fix:

bazel run //tools:ruff -- check --fix

And here you can find the source code :)

Appendix

Things that did not work out

Wrapper macro / running with py_test

At first, I thought it would be possible to create a wrapper macro for py_library, so that additional targets would be created for the libraries we have (something like this).

However, this could not have worked with Ruff: Ruff is just a binary, so launching it from Python would have required locating it first, which seemed a little too brittle.

Using third-party rules

As an option, I also considered rules_lint from Aspect, which is quite powerful. However, to get the best UX with these rules you need to override your Bazel CLI with the one provided by Aspect, and this is something I did not want to bring into my project. In theory, you can also use a simple lint.sh wrapper to run these rules, but again, it just did not feel natural.

Running on CI

In my case, I created a simple .sh script to run the checks on CI.
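The script itself is not shown here, so the following is only a sketch of what such a CI check could look like, not the actual script. It assumes the //tools:ruff alias from this post. The RUFF_CMD variable and the run_ruff_checks function are hypothetical names introduced for illustration. Unlike the local commands above, it uses `format --check` and a plain `check`, so CI reports problems without rewriting files.

```shell
#!/usr/bin/env bash
# Sketch of a CI lint script (hypothetical, not the author's actual script).
set -euo pipefail

# Command used to invoke Ruff. Defaults to the //tools:ruff alias;
# RUFF_CMD is an assumed override hook, handy for local experiments.
RUFF_CMD="${RUFF_CMD:-bazel run //tools:ruff --}"

# Runs the formatter in check-only mode, then the linter without --fix.
# Any extra arguments (for example, explicit paths) are forwarded to Ruff.
# Returns non-zero if either step fails, after running both.
run_ruff_checks() {
  local status=0
  # Word splitting of RUFF_CMD is intentional: it holds a command plus arguments.
  # shellcheck disable=SC2086
  $RUFF_CMD format --check "$@" || status=1
  # shellcheck disable=SC2086
  $RUFF_CMD check "$@" || status=1
  return "$status"
}

# In CI, the script would end with a call such as:
#   run_ruff_checks
```

Running both steps before returning means a single CI run reports formatting and lint failures together, instead of stopping at the first one.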