htmlq

jq for HTML — a command-line HTML element selector.

htmlq parses HTML and lets you select elements with CSS selectors. For each match it can print the element's markup, the value of one of its attributes, or its text content, and it can pretty-print the result. The query language is the familiar CSS selector syntax; command-line flags such as --attribute and --text give direct access to attribute values and text.
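A minimal sketch of these output modes, assuming htmlq is installed and on the PATH. The sample HTML is made up for illustration; the flags used are --attribute, --text, and --pretty.

```shell
# Sample document, made up for this example.
html='<html><body><h1 id="title">Hello</h1><p><a href="/docs">Docs</a> <a href="/blog">Blog</a></p></body></html>'

# Default mode: print the markup of every element matching the selector.
printf '%s\n' "$html" | htmlq 'a'

# --attribute: print only the value of the named attribute for each match.
printf '%s\n' "$html" | htmlq --attribute href 'a'

# --text: print only the text content of each match.
printf '%s\n' "$html" | htmlq --text '#title'

# --pretty: indent the matched markup so it is easier to read.
printf '%s\n' "$html" | htmlq --pretty 'body'
```

Each command reads the document from standard input, so the same invocations work on the output of any program that emits HTML.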

Michael Maclean wrote htmlq in Rust. It pairs especially well with curl in shell pipelines: fetch a URL with curl, pipe the response to htmlq, and get back the elements that match. For one-off web-scraping tasks this is usually far less work than writing a Python script with BeautifulSoup.
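The curl pipeline described above might look like the following sketch. The function name page_links and the URL in the usage comment are placeholders chosen for illustration, not part of htmlq; the example assumes both curl and htmlq are installed.

```shell
# Hypothetical helper: fetch a page and print the href of every link on it.
# --silent hides curl's progress meter; --location follows redirects.
page_links() {
  curl --silent --location "$1" | htmlq --attribute href 'a'
}

# Example call (needs network access):
#   page_links https://www.rust-lang.org/
```

Calling page_links with a URL prints one href value per line, so the result can be piped on to sort, grep, or similar tools.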

htmlq is often counted among the "modern Unix toolbox" of small Rust tools that take over tasks which previously required short scripts. It is packaged for a number of distributions and package managers, including Homebrew; where no package is available, it can be installed with cargo install htmlq.

License: MIT

Category: CLI tools

Website: https://github.com/mgdm/htmlq

Install

cargo install htmlq
Or: brew install htmlq (macOS)

Authors

  • mgdm (creator)