Papeer

Name: Papeer
Author: Evan Mael

Papeer is a command-line web scraper built in Go that converts websites and RSS feeds into clean, readable ebooks and markdown files. Designed specifically for e-readers, it strips away ads and navigation to preserve only the essential content.

27 March 2026 · 12 min read

Introduction

Overview

What is Papeer?

Papeer is a specialized web scraping tool designed for the e-ink era, built by developer lapwat and first released in September 2021. Written in Go, this command-line utility addresses a specific need: converting web content into formats optimized for e-readers like Kindle, reMarkable tablets, and other digital reading devices.

The tool's primary strength lies in its ability to extract clean, readable content from websites while preserving essential formatting like bold text, italics, links, and images. Unlike general-purpose scrapers, Papeer focuses on creating distraction-free reading experiences by automatically removing ads, navigation menus, and other web clutter using the go-readability library.

With 346 stars and 25 forks on GitHub as of March 2026, Papeer has carved out a niche among developers and digital reading enthusiasts who want to convert web articles, documentation, and entire websites into portable ebook formats.

Getting Started

Installation is straightforward with multiple options available. For Go developers, the simplest method is using go install:

go install github.com/lapwat/papeer@latest

Alternatively, you can download pre-compiled binaries from the GitHub releases page for your specific platform. The tool requires no additional dependencies for basic functionality.

MOBI output is optional, since current Kindle devices accept EPUB. If you still need it, install kindlegen; on Linux:

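# Work in a throwaway directory (a dedicated variable avoids overriding the system TMPDIR)
KINDLEGEN_TMP=$(mktemp -d -t papeer-XXXXX)
# Download and unpack the kindlegen archive
curl -L https://github.com/lapwat/papeer/releases/download/kindlegen/kindlegen_linux_2.6_i386_v2_9.tar.gz > "$KINDLEGEN_TMP/kindlegen.tar.gz"
tar xzvf "$KINDLEGEN_TMP/kindlegen.tar.gz" -C "$KINDLEGEN_TMP"
# Install the binary, then clean up
chmod +x "$KINDLEGEN_TMP/kindlegen"
sudo mv "$KINDLEGEN_TMP/kindlegen" /usr/local/bin
rm -rf "$KINDLEGEN_TMP"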

Once installed, verify the installation by running:

papeer --help
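
If the command is not found, the binary is usually installed but not on PATH. The sketch below is one way to check for both tools mentioned above; `check_tool` is a hypothetical helper name, not part of Papeer. It relies on the fact that `go install` places binaries in GOBIN, or in the bin directory under GOPATH when GOBIN is unset.

```shell
#!/usr/bin/env bash
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  local tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
    return 1
  fi
}

# papeer is required; kindlegen only matters for MOBI output.
check_tool papeer || echo 'hint: if you used go install, add "$(go env GOPATH)/bin" to PATH'
check_tool kindlegen || echo 'note: without kindlegen, skip --format=mobi'
```

The hint is printed literally so the script does not fail on machines where Go itself is missing.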

Usage & Practical Examples

Papeer's command-line interface is built around two main commands: list for previewing content structure and get for actual scraping.

Single Page Scraping

The simplest use case involves scraping a single web page:

papeer get https://example.com/article

This creates a Markdown file with the cleaned content. To choose a different output format or file name:

papeer get https://example.com/article --format=epub
papeer get https://example.com/article --format=html --output="my-article"
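
To convert several individual articles in one go, the single-page command can be wrapped in a small loop. This is a minimal sketch, not a Papeer feature: `convert_urls` and the `DRY_RUN` switch are hypothetical names introduced here, and only the `get` command and `--format` flag shown above are used.

```shell
#!/usr/bin/env bash
# convert_urls FORMAT URL... runs one `papeer get` per URL.
# With DRY_RUN=1 it prints the commands instead of running them.
convert_urls() {
  local format="$1"; shift
  local url
  for url in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "papeer get $url --format=$format"
    else
      papeer get "$url" --format="$format"
    fi
  done
}

# Preview the commands for two placeholder URLs without touching the network.
DRY_RUN=1 convert_urls epub https://example.com/article https://example.com/other
```

Running it with DRY_RUN=1 first makes it easy to review the URL list before any request is sent.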

Website Documentation Scraping

One of Papeer's most powerful features is scraping entire documentation sites. First, use the list command to preview the table of contents:

papeer list https://12factor.net/ --selector='section.concrete>article>h2>a'

This displays a numbered list of all pages that would be scraped. Once satisfied with the structure, run the actual scraping:

papeer get https://12factor.net/ --selector='section.concrete>article>h2>a' --format=epub

The tool will create a complete ebook with all the documentation pages as chapters, complete with a table of contents.
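
The preview-then-scrape workflow can be scripted so that both steps always share the same selector. The following is a sketch under that assumption; `preview_then_get` is a hypothetical wrapper name, and it uses only the `list` and `get` commands and the `--selector` and `--format` flags shown above.

```shell
#!/usr/bin/env bash
# Preview the chapter list, then scrape only after confirmation.
# Both calls receive the same selector, so the preview matches what is scraped.
preview_then_get() {
  local url="$1" selector="$2" format="${3:-epub}" answer
  papeer list "$url" --selector="$selector" || return 1
  printf 'Scrape these pages as %s? [y/N] ' "$format"
  read -r answer
  case "$answer" in
    y|Y) papeer get "$url" --selector="$selector" --format="$format" ;;
    *)   echo "aborted"; return 1 ;;
  esac
}

# Example, using the values from the commands above:
# preview_then_get https://12factor.net/ 'section.concrete>article>h2>a' epub
```

Anything other than y or Y aborts, so a mistyped selector costs only the preview request.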

Advanced Options

Papeer provides fine-grained control over the scraping process:

# Limit to first 10 chapters
papeer get https://docs.example.com --selector='nav a' --limit=10

# Skip first 5 chapters, reverse order
papeer get https://blog.example.com --selector='.post-link' --offset=5 --reverse

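# Add a delay between requests (respectful scraping)
papeer get https://example.com --selector='a.chapter' --delay=1000

# Or limit concurrency instead (check `papeer get --help` before combining the two)
papeer get https://example.com --selector='a.chapter' --threads=2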

Performance & Benchmarks

Built in Go, Papeer performs well for this kind of workload. It can download multiple pages concurrently, with the number of simultaneous requests set via the --threads option. As a rough guide (no formal benchmark figures are given here), documentation sites with dozens of pages can finish in under a minute, depending on network conditions and target server response times.

Memory usage remains modest even when processing large websites, as the tool streams content rather than loading everything into memory simultaneously. The go-readability library adds minimal overhead while providing significant value in content cleaning.

The built-in delay option supports respectful scraping: it spaces out requests to avoid overloading the target server while keeping total run time reasonable.
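
A quick back-of-the-envelope calculation shows how these options affect run time. The sketch below is a simplified model, not a measurement: it assumes the delay value is in milliseconds and is applied between consecutive page downloads, and that concurrent downloads proceed in even batches. The function names and the 500 ms per-page figure are illustrative.

```shell
#!/usr/bin/env bash
# Minimum time spent waiting when a fixed delay separates sequential requests.
delay_floor_ms() {
  local pages="$1" delay_ms="$2"
  echo $(( (pages - 1) * delay_ms ))
}

# Rough wall time when pages are fetched in batches of `threads`,
# each batch taking about `per_page_ms`.
batched_ms() {
  local pages="$1" threads="$2" per_page_ms="$3"
  echo $(( (pages + threads - 1) / threads * per_page_ms ))
}

delay_floor_ms 40 1000   # prints 39000: 40 pages with a 1000 ms delay wait at least 39 s
batched_ms 40 4 500      # prints 5000: 10 batches of 4 pages at roughly 500 ms each
```

Under this model, a one-second delay alone accounts for most of a minute on a 40-page site, which is the trade-off between politeness and speed.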

Who Should Use Papeer?

Papeer targets several specific user groups:

Digital Reading Enthusiasts: Users who prefer reading web content on e-readers will find Papeer invaluable for converting articles, documentation, and blog posts into comfortable reading formats.

Developers and Technical Writers: Those who need to convert technical documentation, API references, or tutorial series into offline-readable formats for reference or distribution.

Researchers and Students: Academic users who want to compile web-based research materials into organized, searchable ebook formats for easier study and annotation.

Content Curators: Individuals who aggregate content from multiple sources and need to present it in a unified, professional format.

The tool is less suitable for users who need graphical interfaces, real-time content monitoring, or integration with content management systems.

Verdict

Papeer excels in its specific niche of converting web content for e-reader consumption. Its intelligent content extraction, multiple format support, and recursive scraping capabilities make it a powerful tool for anyone serious about digital reading. The command-line interface and CSS selector requirements create a learning curve, but the results justify the effort for users who regularly read web content on e-readers. Active development and the GPL-3.0 license bode well for long-term viability, making Papeer a solid choice for web-to-ebook conversion workflows.

Capabilities

Key Features

Multi-format Export: Convert websites to Markdown, HTML, EPUB, and MOBI formats
Intelligent Content Cleaning: Automatically removes ads, navigation, and clutter using go-readability
Recursive Website Scraping: Follow navigation menus and links to scrape entire websites
RSS Feed Processing: Convert RSS feeds into organized ebooks
CSS Selector Support: Target specific content areas with precise CSS selectors
Concurrent Downloads: Multi-threaded processing for faster large-site scraping
HTTP Proxy Mode: Function as a filtering proxy for real-time content processing
Progress Tracking: Visual progress bars for multi-page operations
Customizable Metadata: Set author, title, and other ebook properties
Cross-platform: Native support for Windows, macOS, and Linux

Setup

Installation

From Go (Recommended)

go install github.com/lapwat/papeer@latest

From Binary

Download the latest release for your platform:

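# Linux (amd64) shown; asset names can differ per release and platform,
# so confirm the exact file name on the releases page first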
wget https://github.com/lapwat/papeer/releases/latest/download/papeer-linux-amd64
chmod +x papeer-linux-amd64
sudo mv papeer-linux-amd64 /usr/local/bin/papeer

How to Use

Usage Guide

Advanced Options

# Limit chapters and add delays
papeer get https://docs.example.com --selector='nav a' --limit=10 --delay=1000

# Skip first chapters, reverse order
papeer get https://blog.example.com --selector='.post-link' --offset=5 --reverse

# Set custom metadata
papeer get https://example.com --author="John Doe" --name="My Book" --format=epub

Evaluation

Pros & Cons

Pros

Purpose-built for e-reader content with excellent format support
Intelligent content extraction removes ads automatically
Recursive scraping handles complex website structures
Cross-platform with simple installation process
Active development with recent updates
Respectful scraping with delay and concurrency controls
Preview mode for content structure verification

Cons

Command-line only interface
Requires CSS selector knowledge for complex scenarios
Limited documentation for advanced use cases
No built-in scheduling or automation features
MOBI support requires additional Linux setup
Smaller community than general-purpose tools

Other Options

Alternatives

Calibre

Comprehensive ebook management suite with web scraping through news sources, more complex but feature-rich

Pandoc

Universal document converter that can handle web content but requires manual content extraction

Scrapy

Python web scraping framework offering more flexibility but requiring significant development work

Mercury Parser

Web content extraction API focused on article parsing but lacks ebook generation

Frequently Asked Questions

Is Papeer free to use?

Yes, Papeer is completely free and open source under the GPL-3.0 license. You can use, modify, and distribute it freely.

How does Papeer compare to Calibre for web scraping?

Papeer is more focused and easier to use for web-to-ebook conversion, while Calibre offers broader ebook management features but requires more complex configuration for web scraping.

What e-reader formats does Papeer support?

Papeer supports EPUB (recommended for modern e-readers), MOBI (for older Kindles), Markdown, HTML, and JSON formats. EPUB works on most current e-readers including Kindle.

Can I use Papeer to scrape any website?

Papeer can scrape most websites, but effectiveness depends on the site's structure. It works best with content-focused sites and may struggle with heavily JavaScript-dependent pages.

How active is Papeer's development?

Very active: the project was last updated in March 2026, and version 0.8.7 was released in December 2025. The developer regularly maintains and improves the tool.

References

Official Resources

GitHub Repository: source code, issues, releases, and documentation. https://github.com/lapwat/papeer

Official Website: project homepage with overview and examples. https://papeer.tech

Latest Releases: pre-compiled binaries for all platforms. https://github.com/lapwat/papeer/releases
