Channel: Fedora Magazine

Making sense of software licensing with FSFE REUSE: A beginner’s guide for open source developers

Among the many details developers juggle, software licensing is often treated as an afterthought. We know we need it. However, faced with choosing the right license, tracking inherited code, and keeping things consistent, license management can feel like a bureaucratic burden.

That is what makes the REUSE project, maintained by the Free Software Foundation Europe (FSFE), such an interesting and important effort. It does not try to replace the legal work involved in choosing a license or deciphering obligations. Instead, REUSE focuses on the mechanics of software licensing: how we communicate licensing clearly, unambiguously, and reliably in the code itself. REUSE has already been adopted by many projects and organizations, including SAP, Nextcloud, and numerous Ansible community roles and collections.

I recently went down the licensing and supply chain rabbit hole myself. I had to figure out how to apply REUSE to the open source projects I work on and explain it to others. Thus, I had the unusual experience of learning it from scratch while also teaching it. That process gave me insight into what makes REUSE helpful, where the roadblocks are, and how you can start using it in your own open source work. This article therefore aims to give additional reasoning and insights for everyday usage, beyond the scope of a quick start tutorial.

Why licensing still feels broken

If you’ve ever tried to make sense of licensing in a codebase with contributions from half a dozen sources, or tried to package software only to find ambiguous or conflicting license declarations, you’ve seen the brokenness first hand. It’s a common pain point.

You start coding. A LICENSE file goes in the root. Maybe it’s MIT, maybe Apache 2.0, maybe GPLv3-or-later. We figure that’s enough. For the most part, tools like Licensee (which GitHub uses) will scan that file and report the project as single-licensed under whatever it finds.

But that is only part of the picture.

Real-world projects grow messy over time. Files come in from various places: pull requests, upstream forks, old backups. Someone pastes in a script from Stack Overflow. Someone else uploads the output of a code generator. Over time, the repository becomes a tangle of files with unclear origins. The top-level LICENSE file can’t speak for all of it any more. But tools like Licensee don’t know that, and often neither do the maintainers.

If you provide code without clear licensing information, you make it hard for the open source ecosystem to collaborate with or consume your work. Unfortunately, approaches to automatic license detection can’t deliver the needed certainty. They rely on fuzzy matching, heuristics, and assumptions (such as “there is one license for the project”). This just does not cut it when legal clarity is required. Automatic license heuristics are complicated and will never deliver reliable results for all possible use cases.

FSFE REUSE to the rescue

Rather than trying to detect or infer licensing, REUSE asks developers to be explicit in a machine-readable, auditable way:

  1. There has to be a text copy of every used license in a LICENSES/ directory [1] in the root of the project.
  2. Each file in your project must have machine-readable copyright and licensing information associated with it.

Again: This matters. It means anyone (an auditor, a packager, a contributor, or a compliance team) can look at any file in your repository and immediately understand its legal status. There is no guessing, no cross-referencing, no “well maybe this falls under the MIT license because the rest of the project does.” It’s explicit. It’s standardized. And it is quickly lintable, which is great for teams with Continuous Integration.

Following REUSE, adding machine-readable copyright and licensing information can be done in the following ways:

  • Comment headers in the file itself, or a separate <filename>.license file for files that cannot carry comments.
  • REUSE.toml, a machine-readable copyright file that assigns information by file and directory name. This is especially handy to define:
    • a default license for your project.
    • deviating licenses for third-party artefacts residing in a sub-directory.

REUSE backs up its specification with a simple, focused command-line tool called reuse. This makes adoption relatively painless (even though everything can also be done manually). REUSE fits nicely into the ecosystem, especially by relying on Software Package Data Exchange (SPDX) and SPDX license identifiers.

Usage

The official REUSE tutorial and the tool usage section are really good, so I will just reproduce a quick start here:

  1. Put your licenses in the LICENSES/ directory.
  2. Add a comment header to each file that says something like

    SPDX-License-Identifier: GPL-3.0-or-later
    SPDX-FileCopyrightText: $YEAR $NAME

    You can be flexible with the format, just make sure that the line contains “SPDX-License-Identifier:” and/or “SPDX-FileCopyrightText:”.
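To make the "flexible format" point concrete, here is a minimal Python sketch of how such tags can be picked out of a file regardless of the comment syntax in front of them. It is an illustration only, not how the reuse tool itself is implemented; the function name `extract_spdx_tags` and the sample header are assumptions made for this example.

```python
import re

# Matches either tag anywhere on a line, so the comment marker in front
# ("#", "//", "/*", "<!--", ...) does not matter.
SPDX_TAG = re.compile(
    r"SPDX-(License-Identifier|FileCopyrightText):[ \t]*(.*)$",
    re.MULTILINE,
)


def extract_spdx_tags(text: str) -> dict[str, list[str]]:
    """Collect license expressions and copyright notices found in a text."""
    found: dict[str, list[str]] = {"licenses": [], "copyrights": []}
    for tag, value in SPDX_TAG.findall(text):
        # Drop a trailing comment closer such as "*/" or "-->".
        value = re.sub(r"\s*(\*/|-->)\s*$", "", value).strip()
        if not value:
            continue
        key = "licenses" if tag == "License-Identifier" else "copyrights"
        found[key].append(value)
    return found


# Hypothetical file content used only for this demonstration.
header = """\
# SPDX-FileCopyrightText: 2025 Jane Doe <j.doe@example.com>
# SPDX-License-Identifier: GPL-3.0-or-later
print("hello")
"""

tags = extract_spdx_tags(header)
print(tags["licenses"])
print(tags["copyrights"])
```

The sketch deliberately ignores edge cases such as multi-line copyright notices; for real compliance checks, rely on reuse lint.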

Comment headers

REUSE, and many organizations like GNU, recommend including license header comments in source files because they help prevent confusion and errors. Even if REUSE.toml exists as the central place for licensing information, files sometimes get copied or forked into new projects, and third parties might not maintain their repositories as carefully. Without a statement about its license inside the file, moving a single file into another context might eliminate all trace of that information.

Example of a header comment:

# SPDX-FileCopyrightText: Andreas Haerter, ACME Corp (https://example.com)
# SPDX-License-Identifier: CC-BY-SA-4.0

One with dual-licensing:

/*
SPDX-FileCopyrightText: Jane Doe <j.doe@example.com>
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
*/
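A dual-licensed header like the one above means that the text of both licenses has to be present below LICENSES/. The following Python sketch shows one way to derive the expected files from a simple SPDX expression. It is a simplified illustration under stated assumptions: the helper names are made up for this example, and the `LICENSES/<identifier>.txt` naming follows the convention used elsewhere in this article.

```python
import re
import tempfile
from pathlib import Path

# Operators of SPDX license expressions; every other token is an identifier.
OPERATORS = {"AND", "OR", "WITH"}


def license_ids(expression: str) -> list[str]:
    """Split a simple SPDX expression into its identifiers, in order."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    ids = [tok for tok in tokens if tok not in OPERATORS]
    return list(dict.fromkeys(ids))  # de-duplicate, keep first occurrence


def missing_license_texts(expression: str, root: Path) -> list[str]:
    """Return identifiers that have no LICENSES/<identifier>.txt below root."""
    return [
        spdx_id
        for spdx_id in license_ids(expression)
        if not (root / "LICENSES" / f"{spdx_id}.txt").is_file()
    ]


expression = "Apache-2.0 OR LGPL-2.1-or-later"
print(license_ids(expression))

# Demo in a throwaway directory that only ships one of the two texts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "LICENSES").mkdir()
    (root / "LICENSES" / "Apache-2.0.txt").write_text("placeholder\n", encoding="utf-8")
    print(missing_license_texts(expression, root))
```

In practice, reuse lint reports missing license texts for you; the sketch only illustrates the relationship between the header and the LICENSES/ directory.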

REUSE.toml

You might come to the conclusion that skipping the headers and using only a REUSE.toml is better for your project. Fair enough, and that will still be compliant. It is also possible to bulk-license whole directories using this technique. The file format is specified, but a simple example helps to get started:

version = 1
SPDX-PackageName = "Foo bar project"
SPDX-PackageDownloadLocation = "https://git.example.com/foobar"
SPDX-PackageSupplier = "ACME Inc. (https://example.com)"

[[annotations]]
path = "**"
precedence = "closest"
SPDX-FileCopyrightText = "ACME Inc."
SPDX-License-Identifier = "LGPL-2.1-or-later"

Verification

Now verify your work using reuse lint:

$ reuse lint
[...]
Congratulations! Your project is compliant with version 3.4 of the REUSE Specification :-)
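If you want to call the linter from your own scripts, for example as part of a CI job, a thin wrapper is enough. The sketch below assumes that the reuse command is installed and on the PATH, and it relies only on the exit code (zero for a compliant project). The function name `run_reuse_lint` is an assumption made for this example.

```python
import shutil
import subprocess
from typing import Optional


def run_reuse_lint(project_dir: str = ".", executable: str = "reuse") -> Optional[int]:
    """Run `reuse lint` in project_dir.

    Returns the exit code of the tool, or None if the executable
    cannot be found on the PATH.
    """
    if shutil.which(executable) is None:
        return None
    result = subprocess.run([executable, "lint"], cwd=project_dir, check=False)
    return result.returncode


code = run_reuse_lint()
if code is None:
    print("reuse is not installed, skipping the license check")
elif code == 0:
    print("project is REUSE compliant")
else:
    print("project is not REUSE compliant, see the lint output above")
```

In a CI job you would typically turn a non-zero result into a failed build, for instance by passing it to `sys.exit`.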

Demo

The FSFE created a small screencast [2] that follows the tutorial and makes the REUSE example repository compliant.

Tips and tricks

Needed vocabulary when you start learning or teaching

To follow documentation and communication in the licensing space, it helps to know the following terms:

  • Software Package Data Exchange (SPDX) and SPDX license identifiers: A standard for identifying licenses using short, consistent identifiers (like MIT or GPL-3.0-only). It simplifies license tracking and automation.
  • Software Bill of Materials (SBOM): A structured list of all software components and their licenses in a project. It helps with transparency, security audits, and legal compliance.
  • Copyleft (license): A type of open source license that ensures derivative works remain under the same license. It protects user freedoms by requiring shared modifications. The GPL is a well-known example.
  • Permissive (license): A license that allows code to be reused with minimal conditions, including in proprietary software, without giving back modifications. Common examples include MIT, BSD, and Apache 2.0.
  • TOML: A configuration file format. REUSE.toml (a machine-readable file in your project’s root directory) uses it to declare licensing information based on filename patterns.
  • DEP5: A machine-readable debian/copyright file which was used before REUSE.toml. DEP5, while still supported, has been deprecated since the introduction of REUSE.toml. This is important to know when hitting older documentation or tutorials.

My personal killer feature: Additional comments in REUSE.toml

It might sound trivial, but it was always cumbersome for me to keep track of the original download URLs and other common data around simple third-party files, like “this small icon there”. From my point of view, the REUSE.toml file is the ideal place to keep additional data on third-party files by using SPDX-FileComment, without cluttering the repository or the end-user documentation. In my experience, once there is at least one example, maintaining source information and reasoning for third-party files is quickly adopted, even in teams without many regulations:

Example 1: REUSE.toml with sections tracking the original download URLs or notes

[...]
[[annotations]]
path = "assets/images/window.svg"
precedence = "closest"
SPDX-FileCopyrightText = "2022 Refactoring UI Inc."
SPDX-License-Identifier = "MIT"
SPDX-FileComment = "https://github.com/tailwindlabs/heroicons/blob/master/optimized/24/outline/window.svg"

[[annotations]]
path = ["extensions/Find-WindowHandle.ps1", "extensions/Helper.ps1"]
precedence = "closest"
SPDX-FileCopyrightText = "2018 Grégoire Geis (https://github.com/71/Focus-Window/)"
SPDX-License-Identifier = "MIT"
SPDX-FileComment = "Slightly adapted for this project by foundata GmbH (https://foundata.com)"
[...]

Example 2: The REUSE.toml from SAP/openui5 which uses the file patterns and comments to keep track of single files copied from other projects:

[...]
[[annotations]]
path = "src/sap.ui.integration/test/sap/ui/integration/demokit/cardExplorer/webapp/thirdparty/CfWorkerJsonSchemaValidator.js"
precedence = "aggregate"
SPDX-FileCopyrightText = "2020 Jeremy Danyow"
SPDX-License-Identifier = "MIT"
SPDX-FileComment = "these files belong to: @cfworker/json-schema"

# Library: sap.ui.webc.common:
[[annotations]]
path = [
"src/sap.ui.webc.common/src/sap/ui/webc/common/thirdparty/base/**",
"src/sap.ui.webc.common/src/sap/ui/webc/common/thirdparty/theming/**",
"src/sap.ui.webc.common/src/sap/ui/webc/common/thirdparty/localization/**",
"src/sap.ui.webc.common/src/sap/ui/webc/common/thirdparty/icons/**",
"src/sap.ui.webc.common/src/sap/ui/webc/common/thirdparty/icons-tnt/**",
"src/sap.ui.webc.common/src/sap/ui/webc/common/thirdparty/icons-business-suite/**"
]
precedence = "aggregate"
SPDX-FileCopyrightText = "SAP"
SPDX-License-Identifier = "Apache-2.0"
SPDX-FileComment = "these files belong to: UI5 Web Components"
[...]

README section template about licensing and copyright for humans

I find it useful to have a generic, easy to adapt text snippet for the README.md or a comparable central place which is easy for humans to notice and read. I created and use the following template, taking advantage of the existing REUSE information to make the section basically maintenance free without being useless:

## Licensing, copyright

<!--REUSE-IgnoreStart-->
Copyright (c) YYYY, ACME Inc.

This project is licensed under the GNU General Public License v3.0 or later (SPDX-License-Identifier: `GPL-3.0-or-later`), see [`LICENSES/GPL-3.0-or-later.txt`](LICENSES/GPL-3.0-or-later.txt) for the full text.

The [`REUSE.toml`](REUSE.toml) file provides detailed licensing and copyright information in a human- and machine-readable format. This includes parts that may be subject to different licensing or usage terms, such as third-party components. The repository conforms to the [REUSE specification](https://reuse.software/spec/). You can use [`reuse spdx`](https://reuse.readthedocs.io/en/latest/readme.html#cli) to create a [SPDX software bill of materials (SBOM)](https://en.wikipedia.org/wiki/Software_Package_Data_Exchange).
<!--REUSE-IgnoreEnd-->

Replace YYYY with the year of the first release or code contribution and adapt the mentioned license, filenames and links as needed. The HTML comments prevent REUSE linting errors when e.g. listing multiple licenses.

The wording is already pointing to the copyright file (REUSE.toml) and mentions that parts of the project might be subject to different licensing than the main one. If this is not good enough, feel free to adapt the wording of the main “licensed under” sentence to highlight the main licensing rules without the need to maintain every single bit outside of the copyright file. Examples (adapt as needed):

The project is dual-licensed under the

* GNU General Public License v3.0 or later (SPDX-License-Identifier: `GPL-3.0-or-later`), see [`LICENSES/GPL-3.0-or-later.txt`](./LICENSES/GPL-3.0-or-later.txt) for the full text.
* Apache License 2.0 (SPDX-License-Identifier: `Apache-2.0`), see [`LICENSES/Apache-2.0.txt`](./LICENSES/Apache-2.0.txt) for the full text.

[... usual template follows ...]

License detection on GitHub or GitLab

If you follow REUSE, you will notice that GitHub and GitLab are no longer able to detect licensing information for your repository.

Automatic license detection is broken by design for the reasons outlined above. Still, it would be nice if these tools reported something, even if unreliable, for indexes and searches. Otherwise your project is at a disadvantage when inexperienced users search for projects and filter by this (often wrong) metadata.

If you need this, put the text of the “license with the highest freedom protections” used in your project into a LICENSE or COPYING file in the root directory, purely for the benefit of search indexes and GitHub:

  1. GitHub uses Licensee to attempt to identify the license of a project and Licensee does not support the REUSE specification.
  2. A workaround to fix the automatic license detection of GitHub and others is to place an additional LICENSE or COPYING file in the root directory of your project. REUSE allows this. These files are explicitly ignored by the toolset and do not need an additional .license file or header.
  3. If you want to prevent a duplication of license texts, beware of another issue with Licensee: You can place a symlink at LICENSES/<your license>.txt pointing to the LICENSE or COPYING file in the project’s root directory, and reuse lint will follow that link. Sadly, Licensee does not support symlinks at all, so the more logical symlink from LICENSE or COPYING pointing to LICENSES/<your license>.txt does not solve the issue. I therefore recommend a real copy instead of a symlink to keep things accessible when using the workaround.
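Keeping a real copy in sync by hand is easy to forget, so a small helper can do it. This is a minimal sketch, assuming the license text lives at `LICENSES/<identifier>.txt`; the function name `copy_license_to_root` is made up for this example.

```python
import shutil
import tempfile
from pathlib import Path


def copy_license_to_root(root: Path, spdx_id: str, target_name: str = "LICENSE") -> Path:
    """Place a real copy of LICENSES/<spdx_id>.txt at <root>/<target_name>."""
    source = root / "LICENSES" / f"{spdx_id}.txt"
    target = root / target_name
    # A leftover symlink would make copyfile write through the link (or fail
    # because source and target are the same file), so remove it first.
    if target.is_symlink():
        target.unlink()
    shutil.copyfile(source, target)
    return target


# Demo in a throwaway directory with a placeholder license text.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "LICENSES").mkdir()
    (root / "LICENSES" / "MIT.txt").write_text("placeholder license text\n", encoding="utf-8")
    copy = copy_license_to_root(root, "MIT")
    print(copy.name, copy.is_symlink())
```

Running such a helper as part of a release script keeps the root copy identical to the text below LICENSES/.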

Personally, I would use this workaround only if a single license is used for all of the project’s files. That prevents misunderstandings or conflicts. In all other cases, I would simply ignore GitHub’s limited behavior.

Years in copyright texts

This is not exactly a REUSE topic but I noticed it is discussed quite a lot when a project starts adopting REUSE. IANAL, but it is not necessary to update the copyright year since the main legal intention is to state the year of the first public release or code contribution. But it is common to do so anyway, especially since it shows third parties that a project is still alive.

I usually propose the following which might also be a useful technique for your project:

  • Update the copyright data, but maintain the copyright year only at central places like a project’s README.md to reduce the maintenance effort.
  • Simply add each year with a release or updates, separated by commas. You can use a timespan (yearX-yearY) for multiple consecutive years.
  • Example:
    • The first release and copyright statement was Copyright (c) 2013.
    • There were releases or updates in several but not all years afterwards:
      • 2015 → Copyright (c) 2013, 2015.
      • 2018 → Copyright (c) 2013, 2015, 2018.
      • 2019 → Copyright (c) 2013, 2015, 2018, 2019.
      • 2020 → Copyright (c) 2013, 2015, 2018-2020.
      • 2021 → Copyright (c) 2013, 2015, 2018-2021.
      • 2023 → Copyright (c) 2013, 2015, 2018-2021, 2023.
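The year list above follows a simple rule that can be automated. Here is a small Python sketch of it; the function name `format_years` is made up, and the threshold of three consecutive years for a timespan is an assumption read off the examples (2018, 2019 stays a list, while 2018 to 2020 becomes a span).

```python
from typing import Iterable


def format_years(years: Iterable[int], min_span: int = 3) -> str:
    """Format years as a comma-separated list, collapsing long runs into spans."""
    ordered = sorted(set(years))
    parts: list[str] = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of consecutive years starting at i.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        run = ordered[i : j + 1]
        if len(run) >= min_span:
            parts.append(f"{run[0]}-{run[-1]}")
        else:
            parts.extend(str(year) for year in run)
        i = j + 1
    return ", ".join(parts)


release_years = [2013, 2015, 2018, 2019, 2020, 2021, 2023]
print(f"Copyright (c) {format_years(release_years)}.")
```

With the release years from the example, this prints the same statement as the 2023 line above.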

Conclusion

Licensing clarity is needed for sustainable collaboration in open source. The REUSE specification doesn’t try to replace legal frameworks or licensing decisions, but it makes the messy practicalities of license management predictable, explicit, and automatable.

Adopting REUSE can feel like extra effort at first, especially for existing codebases. But once in place, it pays off by making your project easier to understand, maintain, package, and … reuse 🙂. REUSE helps you express the legal structure of your project in a way that machines and humans can agree on. And that’s worth a lot.

  1. Using the LICENSES/ directory is also recommended by the CII best practices, and implemented by the Linux kernel.
  2. Source: https://download.fsfe.org/videos/reuse/screencasts/reuse-tool.gif, Copyright © 2001-2025 Free Software Foundation Europe, Verbatim copying and distribution is permitted in any medium, provided this notice is preserved.
