A Python script for checking links and resources used in local static webpages (.htm, .html). With optional dependencies, it can also work with OpenDocument files (.odt, .odp, .ods), single OpenDocument XML files (.fodt, .fodp, .fods), and user-defined XML files.
linkmedic starts a test web server, requests an entry page from the server, and crawls all local pages. It checks all links within specific HTML tags (by default: <a>, <img>, <script>, <link>, <iframe>, and <event-listener>) and reports any "dead" links found. If a link appears on multiple pages, it is tested only once. By default, links to external websites are ignored. If there is a .linkignore file in the website's root, links matching the regular expressions listed in this file (one pattern per line; see below for examples) are also ignored during testing. After checking all the links, if any dead links are discovered, linkmedic exits with a non-zero status code.
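The link collection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not linkmedic's actual implementation: the `LINK_ATTRS` table, the `LinkCollector` class, and the `links_to_check` function are hypothetical names, the pages are passed in as strings instead of being requested from a test server, and ignore patterns are assumed to match from the start of a link.

```python
import re
from html.parser import HTMLParser

# Tag -> attribute that holds the link. An illustrative subset only,
# not linkmedic's real table of checked tags.
LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src", "iframe": "src"}


class LinkCollector(HTMLParser):
    """Collect link targets from the tags listed in LINK_ATTRS."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def links_to_check(pages, ignore_patterns=(), check_external=False):
    """Return each distinct link once, skipping ignored and (by default) external links."""
    ignored = [re.compile(pattern) for pattern in ignore_patterns]
    seen = set()
    result = []
    for html in pages.values():
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            if link in seen:
                continue  # a link that appears on several pages is tested only once
            seen.add(link)
            is_external = link.startswith(("http://", "https://"))
            if is_external and not check_external:
                continue
            # Assumption: a pattern applies when it matches from the start of the link.
            if any(pattern.match(link) for pattern in ignored):
                continue
            result.append(link)
    return result


pages = {
    "/index.html": '<a href="/about.html">About</a><img src="/logo.png">',
    "/about.html": '<a href="/index.html">Home</a><img src="/logo.png">'
                   '<a href="https://example.com">Ext</a>'
                   '<a href="/will_add/later.html">Soon</a>',
}
print(links_to_check(pages, ignore_patterns=[r"/will_add/later\.html"]))
# → ['/about.html', '/logo.png', '/index.html']
```

The shared `/logo.png` is listed once, the external link is skipped by default, and the link matching the ignore pattern is dropped.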
For testing links in dynamic HTML content (e.g., using JavaScript template engines) or other document formats, you must first convert your files (using a third-party tool) to static HTML and then run linkmedic.
Depending on your operating system, you may have multiple options for installing the prerequisites. For a typical installation you will need:
You can install linkmedic using your favorite Python package installer. For example, using pipx, you can install it from PyPI:
pipx install linkmedic
To start a test web server that serves the files in /var/www, crawl the pages starting from /var/www/index.html, and test all the links, run:
linkmedic --root=/var/www
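Because linkmedic exits with a non-zero status code when dead links are found, a calling script can act on the result. The following is a minimal sketch, assuming you drive the command from Python; `run_link_check` is a hypothetical helper, and stand-in commands are used so the example runs even where linkmedic is not installed.

```python
import subprocess
import sys


def run_link_check(command):
    """Run a link-check command and return True when it exits with status 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Stand-in commands so the sketch runs without linkmedic installed.
# In practice, pass something like ["linkmedic", "--root=/var/www"].
failing = [sys.executable, "-c", "raise SystemExit(1)"]
passing = [sys.executable, "-c", "raise SystemExit(0)"]

print(run_link_check(failing))  # → False
print(run_link_check(passing))  # → True
```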
As linkmedic is a Python script, it requires a working Python interpreter to be executed. Open-source implementations like CPython and PyPy support multiple operating systems and hardware architectures. Below are the available options for using this package, along with their requirements and details, sorted by size.
podman pull quay.io/meisam/linkmedic:latest
podman pull gitlab-registry.mpcdf.mpg.de/tbz/linkmedic:latest
Mount website files when using containers:
podman run --volume /www/public:/test quay.io/meisam/linkmedic:latest linkmedic --root=/test
The --volume flag maps /www/public to /test inside the container.
pipx install git+https://gitlab.mpcdf.mpg.de/tbz/linkmedic.git
pipx install linkmedic
pipx install linkmedic --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/5763/packages/pypi/simple
To test OpenDocument files (.odt, .odp, .ods) or single XML files (.fodt, .fodp, .fods), install with:
pipx install "linkmedic[odf]"
You can also use the container image in your CI/CD pipelines. For example, for GitLab CI, in the .gitlab-ci.yml file:
test_internal_links:
  image: quay.io/meisam/linkmedic:latest
  script:
    - linkmedic --root=/var/www/ --entry=index.html --warn-http --with-badge
  after_script:
    - gitlab_badge_sticker.sh
or for Woodpecker CI in the .woodpecker.yml file:
test_internal_links:
  image: quay.io/meisam/linkmedic:latest
  commands:
    - linkmedic --root=/var/www/ --entry=index.html --warn-http
If you want to check the external links of your website in your CI pipeline, you must avoid running multiple tests in a short period of time, e.g., on each commit to the development branches. Otherwise, the IP address of your CI runners may get banned by external web servers. For example, in GitLab CI, you can limit the external link checks to only the default branch of your Git repository:
test_external_links:
  image: quay.io/meisam/linkmedic:latest
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - linkmedic --root=/var/www/ --ignore-local --with-badge
  after_script:
    - gitlab_badge_sticker.sh
  allow_failure: true
Please note that the gitlab_badge_sticker.sh script used in these examples requires an API access token CI_API_TOKEN with maintainer permission to modify the GitLab repository badges. See the linkmedkit documentation for more details.
linkmedic -h
linkmedic
linkmedic --root=./tests/public1/
linkmedic --root=./tests/public1/ --entry=index2.html
linkmedic --no-local-redirect
Check links to external websites.
⚠️ IMPORTANT: You must avoid running the link checker on external links multiple times in a short period, e.g., on each commit to the development branch. Otherwise, the IP address of your machine (or CI runners) may get banned by the CDN or the DoS mitigation solution of the external web servers. See the CI/CD section for a possible solution.
linkmedic --check-external
linkmedic --no-external-redirects
linkmedic --ignore-local
linkmedic --ignore-status 403 503
linkmedic --entry=./presentation.odp
linkmedic --warn-http
linkmedic --domain=mydomain.com
linkmedic --port=3000
linkmedic --with-badge
linkmedic --exit-zero
linkmedic --guidelines-override-file=linkmedic.guides.ini
linkmedic --guidelines-dump-file=log.linkmedic.ini
linkmedic --dump-links
You can override the internal guidelines (configuration) of linkmedic by adding a .linkmedic.ini file with your desired values. This file is parsed using Python's internal configparser module. You can choose a different name for this file using the --guidelines-override-file flag. The default values will be used for any options that are missing in the override file.
The guidelines are logged to the output while running in verbose mode and can be saved to a file using the --guidelines-dump-file flag.
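The override behavior described above (defaults kept for any missing option) can be reproduced with Python's `configparser` module. This is a sketch under stated assumptions: the `crawler` section and the `timeout` and `max_redirects` options are hypothetical placeholders, not linkmedic's real guideline names, and `load_guidelines` is an illustrative helper.

```python
import configparser

# Hypothetical defaults; the real section and option names may differ.
DEFAULTS = {"crawler": {"timeout": "10", "max_redirects": "5"}}

# Contents of a hypothetical .linkmedic.ini that overrides only one option.
OVERRIDE_TEXT = """
[crawler]
timeout = 30
"""


def load_guidelines(override_text):
    """Load the defaults first, then apply the overrides on top of them."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)          # start from the default values
    parser.read_string(override_text)   # later reads replace earlier values
    return parser


guidelines = load_guidelines(OVERRIDE_TEXT)
print(guidelines["crawler"].getint("timeout"))        # → 30 (overridden)
print(guidelines["crawler"].getint("max_redirects"))  # → 5 (default kept)
```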
Each line in the .linkignore file specifies a regex pattern for addresses that should be ignored during link checks. Note that in a regex, . matches any character (use \. to match only a literal .), and the leading / is taken into account when matching local links.
/ignore/.*/this
/invalidfile\.tar\.gz
/will_add/later\.html
https://not\.accessible\.com
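To see how such patterns behave, you can try them with Python's `re` module. This sketch assumes that a pattern applies when it matches from the start of the link (`re.match`); linkmedic's exact matching rule may differ, and `is_ignored` is a hypothetical helper.

```python
import re

# Patterns as they would appear in a .linkignore file, one per line.
LINKIGNORE_TEXT = r"""
/ignore/.*/this
/invalidfile\.tar\.gz
/will_add/later\.html
https://not\.accessible\.com
"""

patterns = [re.compile(line) for line in LINKIGNORE_TEXT.splitlines() if line.strip()]


def is_ignored(link):
    # Assumption: a pattern applies when it matches from the start of the link.
    return any(pattern.match(link) for pattern in patterns)


print(is_ignored("/ignore/some/dir/this"))  # → True
print(is_ignored("/invalidfile.tar.gz"))    # → True
print(is_ignored("/invalidfileXtarXgz"))    # → False: \. matches only a literal dot
print(is_ignored("ignore/a/this"))          # → False: the leading / is part of the pattern

# An unescaped dot matches any character, so this looser pattern also matches:
print(bool(re.match(r"/invalidfile.tar.gz", "/invalidfileXtarXgz")))  # → True
```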
Please report bugs and code-related issues here. If you have an MPCDF account, use the upstream repository instead.
This repository is frequently used as a template for configuring Python development environments and CI/CD pipelines. It is intentionally designed with strict boundaries while prioritizing scalability and maintainability. Third-party dependencies are minimized to support this goal.
Code coverage is intentionally not 100%. While a few testing approaches are demonstrated, the focus is on showcasing practical methods rather than exhaustive coverage.
The design goal is to show that the entire development toolchain can live in a PDM-managed virtual environment as well as in a CI container, and to demonstrate multiple ways developers can run the CI pipelines locally. PDM tracks exact Python dependency versions, which are recorded in its PEP 751 lock file, pylock.toml.
Versioning is dynamic, based on Git tags. Project documentation is versioned, and its HTML output is automatically built and deployed here.
CI and release container recipes are versioned, with OS packages sourced from the latest minor version of their base OS image at build time. The development toolchain includes:
Refer to the developer's guide for code development details. See the maintainer's guide for maintenance and release checklists.
The original idea for this project came from Dr. Klaus Reuter (MPCDF). Fruitful discussions with Dr. Sebastian Kehl (MPCDF) facilitated the packaging and release of this project.
Accompanying tools for linkmedic have been moved to a separate repository (linkmedkit) starting with version 0.7.
All rights reserved.
This software may be modified and distributed under the terms of the 3-Clause BSD License. See the LICENSE file for details.