Auto-generation

metatools includes the doit command, which implements “auto-generation” of ebuilds. But what exactly is “auto-generation”?

In its broadest sense, “auto-generation” is the high-level creation of ebuilds. This can involve any number of advanced capabilities, such as querying GitHub or GitLab to find the latest version of a package, actually fetching source code and looking inside it, or using Jinja to generate ebuilds using templates. Typically, multiple approaches are used together.

We use these capabilities to reduce the manual labor required to maintain packages. They give us leverage over the complex world of software, allowing us to automate as much as possible so we can do more with less.

Running Auto-Generation

To actually use these tools to auto-generate ebuilds, it is recommended that you check out the kit-fixups repository, which is the master repository of Funtoo Linux. This repository is organized into directories for each kit, and sub-directories for each kit version, with curated often used as a sub-directory name for kit version. So, for example, if we wanted to auto-generate all ebuilds in core-kit/curated, we would do this:

$ cd development
$ git clone https://code.funtoo.org/bitbucket/scm/core/kit-fixups.git
$ cd kit-fixups/core-kit/curated
$ doit

In the above example, doit will see that the directory kit-fixups/core-kit/curated is an overlay that contains categories, and it will look inside it for all “autogens” and execute them.

When doit runs, it will determine its context by looking at the current working directory, similar to how the git command finds which git repository it is in by searching upward from the current working directory. It will then fire off auto-generation in the current directory, looking in the current directory and any sub-directories for all autogens, and will execute them. You will see a lot of .... being printed on the screen, which means that files are being downloaded. What is actually happening is that doit is querying Web APIs like GitHub and GitLab to find the latest versions of packages, and then downloading the source code tarballs (in metatools vernacular: “artifacts”) for these packages; a period is printed for each block of data received to show progress. Often, multiple artifacts are downloaded at the same time. Then, as the artifacts are received, doit creates ebuilds for these packages, and also creates Manifest files referencing the SHA512 and other digests of the downloaded artifacts. You end up with ebuilds that you can test out by running ebuild foo-1.0.ebuild clean merge.

Where are these “autogens”? They are autogen.py files that exist in the repository. Think of our autogens as plug-ins, written in Python and leveraging the POP Framework, that contain a generate() function which can generate one or more ebuilds using the metatools API. The metatools API, which we’ll look at in a bit, is an extensible API that lets us query Web APIs, use Jinja and perform other neat tricks to generate ebuilds.

In addition to raw autogens, there are also autogen.yaml files which allow for creation of ebuilds en masse. In the YAML, you specify an autogen (also called a metatools “generator”) plus packages and package-specific metadata to feed to that generator. When you feed package data to a generator, it spits out ebuilds. This is both highly efficient (it’s fast) and also a nice way to generate ebuilds with little or no redundant code. metatools contains a number of built-in generators that can be used with the YAML system, such as generators that build ebuilds for Python packages on PyPi.

Go ahead and poke around inside kit-fixups and look at the autogen.py and autogen.yaml files. You’ll begin to get a sense of what they look like and an inkling of how everything works.

Also type git status. You should see that a bunch of ebuilds (along with Manifest files) were created. These files are not added to git. They simply sit in your local repo, and you can blow them away by running:

$ git clean -fd

When doing development, we actually do not want to commit the auto-generated ebuilds themselves to kit-fixups – we just want to commit the autogens (autogen.py and autogen.yaml.) There is a separate step, performed by the merge-kits command, which updates the meta-repo and commits the generated ebuilds to kits, which are then pushed out to users. But for kit-fixups, we’re doing development, not updating the tree, so we just want to commit the autogens.

Developing Auto-Generation Scripts

Now that we’ve covered how to execute auto-generation scripts, let’s take a look at creating them.

Basic Stand-Alone Layout

The simplest form of auto-generation is called stand-alone auto-generation. Stand-alone auto-generation scripts have the name autogen.py and can be located inside a catpkg directory – at the same level that you would place ebuilds. Typically, you would also create a templates/ directory next to autogen.py, containing template files that you use to create your final ebuilds. For example, if we were doing an autogen for a package called sys-apps/foobar, which is a “core” system package, we would:

  1. Create an autogen.py file at kit-fixups/core-kit/curated/sys-apps/foobar/autogen.py
  2. Create a kit-fixups/core-kit/curated/sys-apps/foobar/templates/foobar.tmpl file (a template for the ebuild.)

The Generator

The autogen.py script is, as you might guess, a python file. And it is actually treated as a plugin (see POP Framework) which gives it a special structure. The auto-generation function that gets called to do all the things is called generate() and should be defined as:

async def generate(hub, **pkginfo):

Here is a full example of an autogen.py that implements auto-generation of the sys-apps/hwids package:

#!/usr/bin/env python3

async def generate(hub, **pkginfo):
  github_user = "gentoo"
  github_repo = "hwids"
  json_list = await hub.pkgtools.fetch.get_page(
      f"https://api.github.com/repos/{github_user}/{github_repo}/tags", is_json=True
  )
  latest = json_list[0]
  version = latest["name"].split("-")[1]
  url = latest["tarball_url"]
  final_name = f'{pkginfo["name"]}-{version}.tar.gz'
  ebuild = hub.pkgtools.ebuild.BreezyBuild(
      **pkginfo,
      github_user=github_user,
      github_repo=github_repo,
      version=version,
      artifacts=[hub.pkgtools.ebuild.Artifact(url=url, final_name=final_name)],
  )
  ebuild.push()

The doit command, when run in the same directory as the autogen.py or in a parent directory that is still within the repo, will find this autogen.py file, map it as a plugin, and execute its generate() method. This particular auto-generation plugin will perform the following actions:

  1. Query GitHub’s API to determine the latest tag in the gentoo/hwids repository.
  2. Download an archive (called an Artifact) of this source code if it has not been already downloaded.
  3. Use templates/hwids.tmpl to generate a final ebuild with the correct version.
  4. Generate a Manifest referencing the downloaded archive.

After autogen.py executes, you will have a new Manifest file, as well as a hwids-x.y.ebuild file in the places you would expect them. These files are not added to the git repository – and typically, when you are doing local development and testing, you don’t want to commit these files. But you can use them to verify that the autogen ran successfully.

The Base Objects

Above, you’ll notice the use of several objects. Let’s look at what they do:

hub.pkgtools.ebuild.Artifact
This object is used to represent source code archives, also called “artifacts”. Its constructor accepts two keyword arguments. The first is url, which should be the URL that can be used to download the artifact. The second is final_name, which is used to specify an on-disk name if the url does not contain this information. If final_name is omitted, the last part of url will be used as the on-disk name for the artifact.
hub.pkgtools.ebuild.BreezyBuild
This object is used to represent an ebuild that should be auto-generated. When you create it, you should pass a list of artifacts in the artifacts keyword argument for any source code that it needs to download and use.

These objects are used to create a declarative model of ebuilds and their artifacts, but simply creating these objects doesn’t actually result in any action. You will notice that in the source code above, there is a call to ebuild.push() – this is the command that adds our BreezyBuild (as well as the artifact we passed to it) to the auto-generation queue. doit will “instantiate” all objects on its auto-generation queue, which will actually result in action.

What will end up happening is that the BreezyBuild will ensure that all of its source code artifacts have been downloaded (“fetched”) and then it will use this to create a Manifest as well as the ebuild itself.
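To make the flow concrete, here is a minimal, hypothetical sketch (the URLs and names are invented) showing the two Artifact styles together with a BreezyBuild and the final push(). The first Artifact omits final_name because its URL already ends in a usable filename; the second supplies one because its URL does not:

async def generate(hub, **pkginfo):
  version = "1.0"
  # URL already ends in a sensible filename, so final_name can be omitted:
  tarball = hub.pkgtools.ebuild.Artifact(url=f"https://foo.bar.com/myfile-{version}.tar.gz")
  # URL gives no hint of a filename, so we choose one ourselves via final_name:
  extras = hub.pkgtools.ebuild.Artifact(
      url="https://foo.bar.com/download?id=42", final_name=f"myfile-extras-{version}.tar.gz"
  )
  ebuild = hub.pkgtools.ebuild.BreezyBuild(**pkginfo, version=version, artifacts=[tarball, extras])
  # Nothing is fetched or generated until push() places the BreezyBuild on the queue:
  ebuild.push()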

pkginfo Basics

You will notice that our main generate function contains an argument called **pkginfo. You will also notice that we pass **pkginfo to our BreezyBuild, along with some additional information. What is this “pkginfo”? It is a Python dictionary containing information about the catpkg we are generating. We take great advantage of “pkginfo” when we use advanced YAML-based ebuild auto-generation, but it is still useful when doing stand-alone auto-generation. The doit command will auto-populate pkginfo with the following key/value pairs:

name
The package name, i.e. hwids.
cat
The package category, i.e. sys-apps.
template_path
The path to where the templates are located for this autogen, i.e. the templates directory next to the autogen.py

While this “pkginfo” construct may not seem to be the most useful thing right now, it will be once you start to take advantage of advanced autogen features. For now, it at least helps us avoid having to explicitly pass name, cat and template_path to our BreezyBuild – these are arguments that our BreezyBuild expects, and we can simply “pass along” what was auto-detected for us rather than specifying them manually.
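For example, for the sys-apps/hwids autogen shown earlier, the auto-populated pkginfo would look roughly like this (the template_path shown is illustrative, not an exact path):

pkginfo = {
  "name": "hwids",                                                    # package name
  "cat": "sys-apps",                                                  # package category
  "template_path": ".../core-kit/curated/sys-apps/hwids/templates",   # where hwids.tmpl lives
}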

Querying APIs

It is not required that you query APIs to determine the latest version of a package to build, but this is often what is done in an autogen.py file. To this end, the official method to grab data from a remote API is hub.pkgtools.fetch.get_page(). Since this is an async function, it must be awaited. If what you are retrieving is JSON, then you should pass is_json=True as a keyword argument, and you will get decoded JSON as a return value. Otherwise, you will get a string and will be able to perform additional processing. For HTML data, typically people will use the re (regular expression) module to extract data, and lxml or xmltodict can be used for parsing XML data.
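For instance, here is a hedged sketch (the URL and filename pattern are invented) of fetching a plain HTML download page and extracting the latest version with re:

import re

async def generate(hub, **pkginfo):
  # Without is_json=True, get_page() returns the raw HTML as a string:
  html = await hub.pkgtools.fetch.get_page("https://foo.bar.com/downloads/")
  # Find download links such as "myfile-3002.tar.gz" and keep the highest version:
  versions = re.findall(r"myfile-(\d+)\.tar\.gz", html)
  version = max(versions, key=int)
  url = f"https://foo.bar.com/myfile-{version}.tar.gz"  # ready to hand to an Artifact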

There is also a refresh_interval keyword argument which can be used to limit updates to the remote resource to a certain time interval. For example, this is used with the brave-bin autogen to ensure that we only get updates every 5 days (they update the Brave browser daily and this update interval is a bit too much for us):

from datetime import timedelta

json_dict = await hub.pkgtools.fetch.get_page(
  "https://api.github.com/repos/brave/brave-browser/releases", is_json=True, refresh_interval=timedelta(days=5)
)

HTTP Tricks

Sometimes, it is necessary to grab the destination of an HTTP redirect, because the version of an artifact will be in the redirected-to URL itself. For example, let’s assume that when you go to https://foo.bar.com/latest.tar.gz, you are instantly redirected to https://foo.bar.com/myfile-3002.tar.gz. To grab the redirected-to URL, you can use the following method:

next_url = await hub.pkgtools.fetch.get_url_from_redirect("https://foo.bar.com/latest.tar.gz")

next_url will now contain the string https://foo.bar.com/myfile-3002.tar.gz, and you can pull it apart using standard Python string operators and methods to get the version from it.
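For instance (continuing the made-up example above), you could extract the version like this:

filename = next_url.split("/")[-1]                  # "myfile-3002.tar.gz"
version = filename[len("myfile-"):-len(".tar.gz")]  # "3002"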

Note that both the Zoom-bin autogen and Discord-bin autogen use this technique.

Using Jinja in Templates

Up until now, we have not really talked about templates. Templates contain the actual literal content of your ebuild, but can include Jinja processing statements such as variables and even conditionals and loops. Everything passed to your BreezyBuild can be expanded as a Jinja variable. For example, you can use the following variables inside your template:

{{cat}}
Will expand to package category.
{{name}}
Will expand to package name (without version).
{{version}}
Will expand to package version.
{{artifacts[0].src_uri}}
Will expand to the string to be included in the SRC_URI for the first (and possibly only) Artifact.
SRC_URI="{{artifacts|map(attribute='src_uri')|join(' ')}}"
Will expand to be your full SRC_URI definition assuming you don’t have any conditional ones based on USE variables.

It’s important to note that in some cases, you will not even need to use a single Jinja-ism in your template, and can simply have the entire literal ebuild as the contents of your template. The Discord-bin autogen template is like this and simply contains the contents of the ebuild, because the only thing that changes between new Discord versions is the filename of the ebuild, but not anything in the ebuild itself. So we don’t need to expand any variables.

But when we get into more advanced examples, particularly YAML-based auto-generation, Jinja tends to be used more heavily.

Here are some other Jinja constructs you may find useful:

{%- if myvar is defined %}
myvar is defined.
{%- else %}
myvar is not defined.
{%- endif %}

{%- if foo == "bar" %}
This text will only be included if the variable "foo" equals the string "bar".
{%- elif foo == "oni" %}
Hmmm... foo is oni?
{%- endif %}

{%- for file in mylist %}
{{file}}
{%- endfor %}

You can see that Jinja gives you a lot of power to generate the final representation of the ebuild that you want. Remember that you can always pass new keyword arguments to the constructor for BreezyBuild and then access them in your templates. For more information on what Jinja can do, browse the official Jinja Documentation or look in the kit-fixups repo for interesting examples.
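For example, here is a hedged fragment (it assumes version and my_artifact were already defined earlier in generate(), and uses a made-up setting): any extra keyword argument handed to BreezyBuild becomes available under the same name in the template:

ebuild = hub.pkgtools.ebuild.BreezyBuild(
  **pkginfo,
  version=version,
  compression="zstd",  # hypothetical extra value; usable as {{compression}} in the template
  artifacts=[my_artifact],
)
ebuild.push()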

Using Multiple Templates or BreezyBuilds

As mentioned earlier, you can place templates in the templates/ directory next to your autogen, and by default, the BreezyBuild will use the template with the same name as your package. To change this, you can pass the template="anothertemplate.tmpl" keyword argument to your BreezyBuild or pass a different name to your BreezyBuild (name is normally part of the **pkginfo dict.) You might want to do this if you are using your autogen.py to generate more than one ebuild – which is perfectly legal and supported. In this case, you will want to vary the name and/or cat arguments that get passed to BreezyBuild (these typically come via **pkginfo) to specify a new package name and/or category. Remember to call .push() for every ebuild you want to generate. See the Virtualbox-bin Autogen for an example.
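Here is a hedged sketch of that pattern (the package, template names and URL are invented): one artifact feeding two BreezyBuilds, the second with its own name and template:

async def generate(hub, **pkginfo):
  version = "1.0"
  artifact = hub.pkgtools.ebuild.Artifact(url=f"https://foo.bar.com/foobar-{version}.tar.gz")
  # First ebuild: uses the name, category and default template (foobar.tmpl) from pkginfo:
  hub.pkgtools.ebuild.BreezyBuild(**pkginfo, version=version, artifacts=[artifact]).push()
  # Second ebuild: same artifact, but a different package name and its own template:
  libs_pkginfo = dict(pkginfo, name="foobar-libs")
  hub.pkgtools.ebuild.BreezyBuild(
      **libs_pkginfo, version=version, template="foobar-libs.tmpl", artifacts=[artifact]
  ).push()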

Introspecting Inside Artifacts

You may be wondering if it is possible to grab a source tarball, look inside it, and parse things like Makefile or meson.build files to base your build steps on stuff inside the Artifact. Yes, this is definitely possible. To do it, you will first want to define an Artifact all by itself, and then call its ensure_fetched() or fetch() async method. You can then unpack it and inspect its contents:

import os
import glob

async def generate(hub, **pkginfo):
  my_artifact = hub.pkgtools.ebuild.Artifact(url="https://foo.bar.com/myfile-1.0.tar.gz")
  await my_artifact.ensure_fetched()
  my_artifact.extract()
  for meson_file in glob.iglob(os.path.join(my_artifact.extract_path, "*/meson.build")):
    ...
  my_artifact.cleanup()

See our xorg-proto Autogen for an example of this. It downloads xorg-proto and introspects inside it to generate a bunch of stub ebuilds for each protocol supported by xorg-proto.