Auto-generation¶
metatools includes the doit command, which implements “auto-generation” of ebuilds. But what exactly is “auto-generation”?
In its broadest sense, “auto-generation” is the high-level creation of ebuilds. This can involve any number of advanced capabilities, such as querying GitHub or GitLab to find the latest version of a package, actually fetching source code and looking inside it, or using Jinja to generate ebuilds using templates. Typically, multiple approaches are used together.
We use these capabilities to reduce the manual labor required to maintain packages. These capabilities exist to give us leverage over the complex world of software so that we can automate as much as possible, so we can do more with less.
Running Auto-Generation¶
To actually use these tools to auto-generate ebuilds, it is recommended that you check out the kit-fixups repository, which is the master repository of Funtoo Linux. This repository is organized into directories for each kit, and sub-directories for each kit version, with current often used as a sub-directory name for kit version. So, for example, if we wanted to auto-generate all ebuilds in core-kit/current, we would do this:

$ cd development
$ git clone https://code.funtoo.org/bitbucket/scm/core/kit-fixups.git
$ cd kit-fixups/core-kit/current
$ doit
In the above example, doit will see that the directory kit-fixups/core-kit/current is an overlay that contains categories, and it will look inside it for all “autogens” and execute them.
When doit runs, it will determine its context by looking at the current working directory, similar to how the git command finds what git repository it is in by looking backwards from the current working directory. It will then fire off auto-generation in the current directory, looking in the current directory and any sub-directories for all autogens, and will execute them. You will see a lot of .... being printed on the screen, which means that files are being downloaded. What is actually happening is that doit is querying Web APIs like GitHub and GitLab to find the latest versions of packages, and then downloading the source code tarballs (in metatools vernacular: “artifacts”) for these packages; a period is printed for each block of data received to show progress. Often, multiple artifacts are downloaded at the same time.
Then, as the artifacts are received, doit creates ebuilds for these packages, and also creates Manifest files referencing the SHA512 and other digests of the downloaded artifacts. You end up with ebuilds that you can test by running ebuild foo-1.0.ebuild clean merge.
Where are these “autogens”? They are autogen.py files that exist in the repository. Think of our autogens as plug-ins, written in Python and leveraging the POP Framework, that contain a generate() function which can generate one or more ebuilds using the metatools API. The metatools API, which we’ll look at in a bit, is an extensible API that lets us query Web APIs, use Jinja, and perform other neat tricks to generate ebuilds.
In addition to raw autogens, there are also autogen.yaml files which allow for creation of ebuilds en masse. In the YAML, you specify an autogen (also called a metatools “generator”) plus packages and package-specific metadata to feed to that generator. When you feed package data to a generator, it spits out ebuilds. This is both highly efficient (it’s fast) and also a nice way to generate ebuilds with little or no redundant code. metatools contains a number of built-in generators that can be used with the YAML system, such as generators that build ebuilds for Python packages on PyPI.
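To give a rough feel for the idea, here is a hypothetical sketch of what an autogen.yaml might look like. The generator name and field layout below are illustrative assumptions, not the authoritative schema – consult real autogen.yaml files in kit-fixups for the actual format:

```yaml
# Hypothetical sketch only -- the real schema is defined by metatools.
# Look at existing autogen.yaml files in kit-fixups for authoritative examples.
generator: pypi-simple        # assumed name of a built-in PyPI generator
packages:
  - requests                  # simple case: just a package name
  - certifi:
      python_compat: python3+ # package-specific metadata fed to the generator
```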
Go ahead and poke around inside kit-fixups and look at the autogen.py and autogen.yaml files. You’ll begin to get a sense for what they look like and an inkling of how everything works.
Also type git status. You should see that a bunch of ebuilds (along with Manifest files) were created. These files are not added to git. They simply sit in your local repo, and you can blow them away by running:

$ git clean -fd
When doing development, we actually do not want to commit the auto-generated ebuilds themselves to kit-fixups – we just want to commit the autogens (autogen.py and autogen.yaml). There is a separate step, performed by the merge-kits command, which updates the meta-repo and commits the generated ebuilds to kits, which are then pushed out to users. But for kit-fixups, we’re doing development, not updating the tree, so we just want to commit the autogens.
Developing Auto-Generation Scripts¶
Now that we’ve covered how to execute auto-generation scripts, let’s take a look at creating them.
Basic Stand-Alone Layout¶
The simplest form of auto-generation is called stand-alone auto-generation. Stand-alone auto-generation scripts have the name autogen.py and can be located inside a catpkg directory – at the same level that you would place ebuilds. Typically, you would also create a templates/ directory next to autogen.py, containing template files that you use to create your final ebuilds. For example, if we were doing an autogen for a package called sys-apps/foobar, which is a “core” system package, we would:

- Create an autogen.py file at kit-fixups/curated/sys-apps/foobar/autogen.py
- Create a kit-fixups/curated/sys-apps/foobar/templates/foobar.tmpl file (a template for the ebuild.)
The Generator¶
The autogen.py script is, as you might guess, a Python file. It is actually treated as a plugin (see POP Framework), which gives it a special structure. The auto-generation function that gets called to do all the things is called generate() and should be defined as:

async def generate(hub, **pkginfo):

Here is a full example of an autogen.py that implements auto-generation of the sys-apps/hwids package:
#!/usr/bin/env python3

async def generate(hub, **pkginfo):
    github_user = "gentoo"
    github_repo = "hwids"
    json_list = await hub.pkgtools.fetch.get_page(
        f"https://api.github.com/repos/{github_user}/{github_repo}/tags", is_json=True
    )
    latest = json_list[0]
    version = latest["name"].split("-")[1]
    url = latest["tarball_url"]
    final_name = f'{pkginfo["name"]}-{version}.tar.gz'
    ebuild = hub.pkgtools.ebuild.BreezyBuild(
        **pkginfo,
        github_user=github_user,
        github_repo=github_repo,
        version=version,
        artifacts=[hub.pkgtools.ebuild.Artifact(url=url, final_name=final_name)],
    )
    ebuild.push()
The doit command, when run in the same directory as the autogen.py or in a parent directory that is still in the repo, will find this autogen.py file, map it as a plugin, and execute its generate() method. This particular auto-generation plugin will perform the following actions:

- Query GitHub’s API to determine the latest tag in the gentoo/hwids repository.
- Download an archive (called an Artifact) of this source code if it has not been already downloaded.
- Use templates/hwids.tmpl to generate a final ebuild with the correct version.
- Generate a Manifest referencing the downloaded archive.
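The version-parsing step of this autogen can be exercised on its own. Here is a self-contained sketch that mirrors how the autogen derives version and final_name from GitHub’s tags endpoint; the JSON below is a hard-coded sample of that endpoint’s shape, whereas the real autogen fetches it with hub.pkgtools.fetch.get_page():

```python
# Simulated response from https://api.github.com/repos/gentoo/hwids/tags
# (illustrative data; the real autogen fetches this with get_page(..., is_json=True)).
json_list = [
    {"name": "hwids-20210613", "tarball_url": "https://api.github.com/repos/gentoo/hwids/tarball/hwids-20210613"},
    {"name": "hwids-20201207", "tarball_url": "https://api.github.com/repos/gentoo/hwids/tarball/hwids-20201207"},
]

pkginfo = {"name": "hwids", "cat": "sys-apps"}

latest = json_list[0]                   # the autogen assumes the first tag is the newest
version = latest["name"].split("-")[1]  # "hwids-20210613" -> "20210613"
url = latest["tarball_url"]
final_name = f'{pkginfo["name"]}-{version}.tar.gz'

print(version)     # 20210613
print(final_name)  # hwids-20210613.tar.gz
```

Note that final_name is needed here because GitHub’s tarball URL does not end in a useful filename.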
After autogen.py executes, you will have a new Manifest file, as well as a hwids-x.y.ebuild file in the places you would expect them. These files are not added to the git repository – and typically, when you are doing local development and testing, you don’t want to commit these files. But you can use them to verify that the autogen ran successfully.
The Base Objects¶
Above, you’ll notice the use of several objects. Let’s look at what they do:

hub.pkgtools.ebuild.Artifact
    This object is used to represent source code archives, also called “artifacts”. Its constructor accepts two keyword arguments. The first is url, which should be the URL that can be used to download the artifact. The second is final_name, which is used to specify an on-disk name if the url does not contain this information. If final_name is omitted, the last part of url will be used as the on-disk name for the artifact.

hub.pkgtools.ebuild.BreezyBuild
    This object is used to represent an ebuild that should be auto-generated. When you create it, you should pass a list of artifacts in the artifacts keyword argument for any source code that it needs to download and use.
These objects are used to create a declarative model of ebuilds and their artifacts, but simply creating these objects doesn’t actually result in any action. You will notice that in the source code above, there is a call to ebuild.push() – this is the command that adds our BreezyBuild (as well as the artifact we passed to it) to the auto-generation queue. doit will “instantiate” all objects on its auto-generation queue, which will actually result in action.

What will end up happening is that the BreezyBuild will ensure that all of its source code artifacts have been downloaded (“fetched”), and then it will use them to create a Manifest as well as the ebuild itself.
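The push-then-instantiate pattern can be illustrated with a toy model. This is not metatools’ actual implementation – just a sketch of the declarative idea: constructing an object records intent, push() queues it, and a later pass performs the real work:

```python
import asyncio

_QUEUE = []  # module-level stand-in for doit's auto-generation queue

class ToyBuild:
    """Toy stand-in for BreezyBuild: records what to generate, does nothing yet."""
    def __init__(self, name, artifacts):
        self.name = name
        self.artifacts = artifacts

    def push(self):
        _QUEUE.append(self)  # declare intent; no fetching or writing happens here

    async def generate(self):
        # In metatools, this step would fetch artifacts and write the ebuild and Manifest.
        return f"{self.name}: fetched {len(self.artifacts)} artifact(s)"

async def run_queue():
    # "Instantiate" everything on the queue concurrently, as doit does.
    return await asyncio.gather(*(build.generate() for build in _QUEUE))

ToyBuild("hwids", ["hwids-20210613.tar.gz"]).push()
results = asyncio.run(run_queue())
print(results[0])  # hwids: fetched 1 artifact(s)
```

Because the queue is drained with asyncio.gather, many builds can fetch their artifacts concurrently, which is part of why doit downloads multiple artifacts at the same time.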
pkginfo Basics¶
You will notice that our main generate function contains an argument called **pkginfo. You will also notice that we pass **pkginfo to our BreezyBuild, along with other additional information. What is this “pkginfo”? It is a Python dictionary containing information about the catpkg we are generating. We take great advantage of “pkginfo” when we use advanced YAML-based ebuild auto-generation, but it is still useful when doing stand-alone auto-generation. The doit command will auto-populate pkginfo with the following key/value pairs:

name
    The package name, i.e. hwids.
cat
    The package category, i.e. sys-apps.
template_path
    The path to where the templates are located for this autogen, i.e. the templates directory next to the autogen.py.
While this “pkginfo” construct doesn’t seem to be the most useful thing right now, it will be once you start to take advantage of advanced autogen features. For now, it at least helps us avoid having to explicitly pass name, cat and template_path to our BreezyBuild – these are arguments that our BreezyBuild expects, and we can simply “pass along” what was auto-detected for us rather than specifying them manually.
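The “pass along” mechanics are plain Python dict splatting. A minimal sketch, with made-up values and a stand-in function in place of the real BreezyBuild constructor:

```python
# Roughly what doit would auto-populate for sys-apps/hwids (values illustrative).
pkginfo = {
    "name": "hwids",
    "cat": "sys-apps",
    "template_path": "sys-apps/hwids/templates",
}

def breezy_build_kwargs(**kwargs):
    """Stand-in showing what a BreezyBuild-style constructor receives."""
    return kwargs

# **pkginfo passes name/cat/template_path along; extra keywords ride alongside.
received = breezy_build_kwargs(**pkginfo, version="20210613")
print(sorted(received))  # ['cat', 'name', 'template_path', 'version']
```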
Querying APIs¶
It is not required that you query APIs to determine the latest version of a package to build, but this is often what is done in an autogen.py file. To this end, the official method to grab data from a remote API is hub.pkgtools.fetch.get_page(). Since this is an async function, it must be awaited. If what you are retrieving is JSON, then you should pass is_json=True as a keyword argument, and you will get decoded JSON as a return value. Otherwise, you will get a string and will be able to perform additional processing. For HTML data, typically people will use the re (regular expression) module to extract data, and lxml or xmltodict can be used for parsing XML data.
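As a concrete illustration of the re approach, here is a self-contained sketch that pulls a version out of an HTML download page. The page content and package name are made up; in a real autogen the string would come from hub.pkgtools.fetch.get_page():

```python
import re

# Made-up HTML; a real autogen would fetch this with get_page().
html = '<a href="/downloads/foobar-2.14.1.tar.xz">Download foobar 2.14.1</a>'

match = re.search(r"foobar-([\d.]+)\.tar\.xz", html)
if match is None:
    raise ValueError("could not find a foobar version on the page")
version = match.group(1)
print(version)  # 2.14.1
```

Raising an explicit error when the pattern fails to match is worthwhile: upstream pages change layout, and a loud failure is easier to debug than a silently wrong ebuild.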
There is also a refresh_interval keyword argument which can be used to limit updates to the remote resource to a certain time interval. For example, this is used with the brave-bin autogen to ensure that we only get updates every 5 days (they update the Brave browser daily, and this update interval is a bit too much for us):

from datetime import timedelta

json_dict = await hub.pkgtools.fetch.get_page(
    "https://api.github.com/repos/brave/brave-browser/releases", is_json=True, refresh_interval=timedelta(days=5)
)
HTTP Tricks¶
Sometimes, it is necessary to grab the destination of an HTTP redirect, because the version of an artifact will be in the redirected-to URL itself. For example, let’s assume that when you go to https://foo.bar.com/latest.tar.gz, you are instantly redirected to https://foo.bar.com/myfile-3002.tar.gz. To grab the redirected-to URL, you can use the following method:

next_url = await hub.pkgtools.fetch.get_url_from_redirect("https://foo.bar.com/latest.tar.gz")

next_url will now contain the string https://foo.bar.com/myfile-3002.tar.gz, and you can pull it apart using standard Python string operators and methods to get the version from it. Note that both the Zoom-bin autogen and Discord-bin autogen use this technique.
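Once you have the redirected-to URL, extracting the version is plain string work. A sketch using the example URL above (one of many reasonable ways to slice it):

```python
# The string get_url_from_redirect() would have returned in the example above.
next_url = "https://foo.bar.com/myfile-3002.tar.gz"

filename = next_url.split("/")[-1]                 # "myfile-3002.tar.gz"
version = filename.split("-")[1].split(".tar")[0]  # "3002"
print(version)  # 3002
```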
Using Jinja in Templates¶
Up until now, we have not really talked about templates. Templates contain the actual literal content of your ebuild, but can include Jinja processing statements such as variables and even conditionals and loops. Everything passed to your BreezyBuild can be expanded as a Jinja variable. For example, you can use the following variables inside your template:

{{cat}}
    Will expand to the package category.
{{name}}
    Will expand to the package name (without version).
{{version}}
    Will expand to the package version.
{{artifacts[0].src_uri}}
    Will expand to the string to be included in the SRC_URI for the first (and possibly only) Artifact.
SRC_URI="{{artifacts|map(attribute='src_uri')|join(' ')}}"
    Will expand to your full SRC_URI definition, assuming you don’t have any conditional ones based on USE variables.
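To see these expansions in action outside of doit, here is a small sketch that renders a template fragment directly with the jinja2 library. The FakeArtifact class and all values are stand-ins for what a BreezyBuild would supply:

```python
import jinja2

class FakeArtifact:
    """Stand-in for metatools' Artifact, exposing just the src_uri attribute."""
    def __init__(self, src_uri):
        self.src_uri = src_uri

# A fragment of an ebuild template using the variables described above.
template = jinja2.Template(
    'DESCRIPTION="Hardware IDs for {{cat}}/{{name}}"\n'
    "SRC_URI=\"{{artifacts|map(attribute='src_uri')|join(' ')}}\"\n"
)
rendered = template.render(
    cat="sys-apps",
    name="hwids",
    artifacts=[
        FakeArtifact("https://foo.bar.com/hwids-1.0.tar.gz"),
        FakeArtifact("https://foo.bar.com/hwids-data-1.0.tar.gz"),
    ],
)
print(rendered)
```

With two artifacts, the map/join filter chain produces a single space-separated SRC_URI line, which is exactly why it is handy for packages with multiple source archives.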
It’s important to note that in some cases, you will not even need to use a single Jinja-ism in your template, and can simply have the entire literal ebuild as the contents of your template. The Discord-bin autogen template is like this and simply contains the contents of the ebuild, because the only thing that changes between new Discord versions is the filename of the ebuild, not anything in the ebuild itself. So we don’t need to expand any variables.
But when we get into more advanced examples, particularly YAML-based auto-generation, Jinja tends to be used more heavily.
Here are some other Jinja constructs you may find useful:
{%- if myvar is defined %}
myvar is defined.
{%- else %}
myvar is not defined.
{%- endif %}

{%- if foo == "bar" %}
This text will only be included if the variable "foo" equals the string "bar".
{%- elif foo == "oni" %}
Hmmm... foo is oni?
{%- endif %}

{%- for file in mylist %}
{{file}}
{%- endfor %}
You can see that Jinja gives you a lot of power to generate the final representation of the ebuild that you want. Remember that you can always pass new keyword arguments to the constructor for BreezyBuild and then access them in your templates. For more information on what Jinja can do, browse the official Jinja Documentation or look in the kit-fixups repo for interesting examples.
Using Multiple Templates or BreezyBuilds¶
As mentioned earlier, you can place templates in the templates/ directory next to your autogen, and by default, the BreezyBuild will use the template with the same name as your package. To change this, you can pass the template="anothertemplate.tmpl" keyword argument to your BreezyBuild, or pass a different name to your BreezyBuild (name is normally part of the **pkginfo dict). You might want to do this if you are using your autogen.py to generate more than one ebuild – which is perfectly legal and supported. In this case, you will want to vary the name and/or cat arguments that get passed to BreezyBuild (these typically come via **pkginfo) to specify a new package name and/or category. Remember to call .push() for every ebuild you want to generate. See the Virtualbox-bin Autogen for an example.
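A sketch of the multiple-ebuild pattern, varying name per package while reusing the rest of pkginfo. The package names and the push_build helper are illustrative stand-ins; in a real autogen each dict would go to a BreezyBuild whose .push() you must call:

```python
# Illustrative pkginfo for a hypothetical app-misc/frobnicate autogen.
pkginfo = {"cat": "app-misc", "name": "frobnicate", "version": "1.0"}

generated = []

def push_build(**kwargs):
    """Stand-in for constructing a BreezyBuild and calling .push() on it."""
    generated.append(f'{kwargs["cat"]}/{kwargs["name"]}-{kwargs["version"]}')

for name in ["frobnicate", "frobnicate-extras"]:
    # Override just the name; everything else is "passed along" from pkginfo.
    push_build(**{**pkginfo, "name": name})

print(generated)  # ['app-misc/frobnicate-1.0', 'app-misc/frobnicate-extras-1.0']
```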
Introspecting Inside Artifacts¶
You may be wondering if it is possible to grab a source tarball, look inside it, and parse things like Makefile or meson.build files to base your build steps on stuff inside the Artifact. Yes, this is definitely possible. To do it, you will first want to define an Artifact all by itself, and then call its ensure_fetched() or fetch() async method. You can then unpack it and inspect its contents:
import glob
import os

async def generate(hub, **pkginfo):
    my_artifact = hub.pkgtools.ebuild.Artifact(url="https://foo.bar.com/myfile-1.0.tar.gz")
    await my_artifact.ensure_fetched()
    my_artifact.extract()
    for meson_file in glob.iglob(os.path.join(my_artifact.extract_path, "*/meson.build")):
        ...
    my_artifact.cleanup()
See our xorg-proto Autogen for an example of this. It downloads xorg-proto and introspects inside it to generate a bunch of stub ebuilds for each protocol supported by xorg-proto.
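The extract-and-inspect step itself is ordinary Python. Here is a self-contained sketch that builds a tiny tarball, unpacks it, and globs for meson.build files – the same shape of work the autogen above performs on a fetched Artifact (all paths and contents are made up):

```python
import glob
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for a fetched source tarball.
    tarball = os.path.join(tmp, "myproto-1.0.tar.gz")
    with tarfile.open(tarball, "w:gz") as tar:
        data = b"project('myproto')\n"
        info = tarfile.TarInfo("myproto-1.0/meson.build")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    # Extract it -- the equivalent of my_artifact.extract() -- and introspect.
    extract_path = os.path.join(tmp, "extract")
    with tarfile.open(tarball) as tar:
        tar.extractall(extract_path)

    found = glob.glob(os.path.join(extract_path, "*/meson.build"))
    print([os.path.relpath(p, extract_path) for p in found])
```

In a real autogen you would read each matched file and use what you learn (project names, build targets) to decide what ebuilds to push, then call cleanup() on the Artifact.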