Papeer

Papeer is a command-line web scraper built in Go that converts websites and RSS feeds into clean, readable ebooks and markdown files. Designed specifically for e-readers, it strips away ads and navigation to preserve only the essential content.

By Evan Mael
27 March 2026 · 12 min read · 346 stars · Go · Open Source
Introduction

Overview

What is Papeer?

Papeer is a specialized web scraping tool designed for the e-ink era, built by developer lapwat and first released in September 2021. Written in Go, this command-line utility addresses a specific need: converting web content into formats optimized for e-readers like Kindle, reMarkable tablets, and other digital reading devices.

The tool's primary strength lies in its ability to extract clean, readable content from websites while preserving essential formatting like bold text, italics, links, and images. Unlike general-purpose scrapers, Papeer focuses on creating distraction-free reading experiences by automatically removing ads, navigation menus, and other web clutter using the go-readability library.

With 346 stars and 25 forks on GitHub as of March 2026, Papeer has carved out a niche among developers and digital reading enthusiasts who want to convert web articles, documentation, and entire websites into portable ebook formats.

Getting Started

Installation is straightforward with multiple options available. For Go developers, the simplest method is using go install:

go install github.com/lapwat/papeer@latest

Alternatively, you can download pre-compiled binaries from the GitHub releases page for your specific platform. The tool requires no additional dependencies for basic functionality.

MOBI output requires kindlegen. Recent Kindles accept EPUB through Send to Kindle, so most users can skip this step; if you do need MOBI, install kindlegen on Linux as follows:

# Work in a throwaway directory (named tmp_dir so the TMPDIR environment variable is left alone)
tmp_dir=$(mktemp -d -t papeer-XXXXX)
curl -L -o "$tmp_dir/kindlegen.tar.gz" https://github.com/lapwat/papeer/releases/download/kindlegen/kindlegen_linux_2.6_i386_v2_9.tar.gz
tar xzvf "$tmp_dir/kindlegen.tar.gz" -C "$tmp_dir"
chmod +x "$tmp_dir/kindlegen"
sudo mv "$tmp_dir/kindlegen" /usr/local/bin
rm -rf "$tmp_dir"

Once installed, verify the installation by running:

papeer --help
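
Before running longer jobs, it can help to confirm that the binaries are actually on your PATH. The following is a minimal sketch that relies only on the standard `command -v` builtin; the function names `have_tool` and `report_tools` are made up for this example and are not part of Papeer.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools NAME...: print one status line per tool.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# kindlegen only matters if you plan to export MOBI.
report_tools papeer kindlegen
```

If `papeer` is reported as missing after `go install`, the usual cause is that Go's bin directory is not on your PATH.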

Usage & Practical Examples

Papeer's command-line interface is built around two main commands: list for previewing content structure and get for actual scraping.

Single Page Scraping

The simplest use case involves scraping a single web page:

papeer get https://example.com/article

This creates a Markdown file with the cleaned content. To specify different output formats:

papeer get https://example.com/article --format=epub
papeer get https://example.com/article --format=html --output="my-article"
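
To convert several articles in one go, the single-page command can be wrapped in a loop. This is a sketch rather than a Papeer feature: the `convert_list` function, the URL list file, and the `PAPEER` override variable (handy for a dry run with `PAPEER=echo`) are all assumptions made for illustration. Only the `get` command and `--format` flag come from the examples above.

```shell
#!/bin/sh
# Hypothetical override: set PAPEER=echo to print commands instead of running them.
PAPEER="${PAPEER:-papeer}"

# convert_list FILE [FORMAT]: run `papeer get` once per URL listed in FILE.
# Blank lines and lines starting with '#' are skipped. FORMAT defaults to epub.
convert_list() {
    list_file=$1
    format=${2:-epub}
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the command cannot swallow the rest of the list.
        "$PAPEER" get "$url" --format="$format" </dev/null || echo "failed: $url" >&2
    done < "$list_file"
}
```

With a file `reading-list.txt` containing one URL per line, `convert_list reading-list.txt epub` would produce one ebook per article and report any URL that failed on stderr.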

Website Documentation Scraping

One of Papeer's most powerful features is scraping entire documentation sites. First, use the list command to preview the table of contents:

papeer list https://12factor.net/ --selector='section.concrete>article>h2>a'

This displays a numbered list of all pages that would be scraped. Once satisfied with the structure, run the actual scraping:

papeer get https://12factor.net/ --selector='section.concrete>article>h2>a' --format=epub

The tool will create a complete ebook with all the documentation pages as chapters, complete with a table of contents.
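
The preview-then-scrape workflow described above can be combined into one step. The sketch below is an illustration under stated assumptions: `preview_then_get` and the `PAPEER` override variable are hypothetical names, and the script uses only the `list` and `get` commands with the `--selector` and `--format` flags shown in this section.

```shell
#!/bin/sh
# Hypothetical override: set PAPEER=echo to print commands instead of running them.
PAPEER="${PAPEER:-papeer}"

# preview_then_get URL SELECTOR: show the chapter list, ask for confirmation
# on stdin, and only then build the EPUB.
preview_then_get() {
    url=$1
    selector=$2
    "$PAPEER" list "$url" --selector="$selector" </dev/null || return 1
    printf 'Scrape these pages into an EPUB? [y/N] ' >&2
    read -r answer || true
    case "$answer" in
        y|Y) "$PAPEER" get "$url" --selector="$selector" --format=epub </dev/null ;;
        *)   echo "aborted" >&2; return 1 ;;
    esac
}
```

Calling `preview_then_get https://12factor.net/ 'section.concrete>article>h2>a'` prints the table of contents first, so a wrong selector is caught before any pages are downloaded.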

Advanced Options

Papeer provides fine-grained control over the scraping process:

# Limit to first 10 chapters
papeer get https://docs.example.com --selector='nav a' --limit=10

# Skip first 5 chapters, reverse order
papeer get https://blog.example.com --selector='.post-link' --offset=5 --reverse

# Add delays between requests (respectful scraping)
papeer get https://example.com --selector='a.chapter' --delay=1000 --threads=2
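
To reason about how an offset and a limit carve a window out of a chapter list, it can help to try the same idea on plain text lines. The sketch below uses only `tail` and `head`; it does not call Papeer, the `window` function is a made-up name, and it assumes the offset is applied before the limit, which this article does not confirm for Papeer itself.

```shell
#!/bin/sh
# window OFFSET LIMIT: read lines on stdin, drop the first OFFSET,
# then keep at most LIMIT of what remains.
# Assumption: offset is applied before limit; Papeer's own order is not confirmed here.
window() {
    offset=$1
    limit=$2
    tail -n "+$((offset + 1))" | head -n "$limit"
}

# Toy table of contents with eight chapters: skip 5, keep 2.
printf 'chapter-%s\n' 1 2 3 4 5 6 7 8 | window 5 2
# prints chapter-6 and chapter-7, one per line
```

Running `papeer list` with the same selector remains the reliable way to see which chapters a given combination of options will actually select.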

Performance & Benchmarks

Built in Go, Papeer offers solid performance characteristics. The tool's concurrent downloading capability allows it to process multiple pages simultaneously, with the thread count configurable via the --threads option. In testing scenarios, Papeer can process documentation sites with dozens of pages in under a minute, depending on network conditions and target server response times.

Memory usage remains modest even when processing large websites, as the tool streams content rather than loading everything into memory simultaneously. The go-readability library adds minimal overhead while providing significant value in content cleaning.

The built-in delay option supports respectful scraping: it spaces out requests so the target server is not overloaded, at the cost of somewhat longer runs.
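
Since throughput depends heavily on the network and the target server, the most useful numbers are the ones measured on your own setup. The helper below is a rough sketch for that purpose: `time_run` is a hypothetical name, it measures wall-clock time in whole seconds via `date +%s` (available on GNU and BSD systems), and it is not a rigorous benchmark.

```shell
#!/bin/sh
# time_run LABEL CMD...: run CMD with its output discarded,
# then print "LABEL: Ns (exit CODE)" using whole-second wall-clock time.
time_run() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
}

# Example comparison (commented out so the helper can be sourced safely):
#   time_run threads=1 papeer get https://12factor.net/ --selector='section.concrete>article>h2>a' --format=epub --threads=1
#   time_run threads=4 papeer get https://12factor.net/ --selector='section.concrete>article>h2>a' --format=epub --threads=4
```

Repeating each run a few times and comparing the results gives a fairer picture than a single measurement, because server response times vary between requests.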

Who Should Use Papeer?

Papeer targets several specific user groups:

Digital Reading Enthusiasts: Users who prefer reading web content on e-readers will find Papeer invaluable for converting articles, documentation, and blog posts into comfortable reading formats.

Developers and Technical Writers: Those who need to convert technical documentation, API references, or tutorial series into offline-readable formats for reference or distribution.

Researchers and Students: Academic users who want to compile web-based research materials into organized, searchable ebook formats for easier study and annotation.

Content Curators: Individuals who aggregate content from multiple sources and need to present it in a unified, professional format.

The tool is less suitable for users who need graphical interfaces, real-time content monitoring, or integration with content management systems.

Verdict

Papeer excels in its specific niche of converting web content for e-reader consumption. Its intelligent content extraction, multiple output formats, and recursive scraping make it a capable tool for anyone serious about digital reading. The command-line interface and the need to write CSS selectors create a learning curve, but the results justify the effort for users who regularly read web content on e-readers. Active development and a GPL-3.0 license are good signs for long-term viability, making Papeer a solid choice for web-to-ebook conversion workflows.

Capabilities

Key Features

  • Multi-format Export: Convert websites to Markdown, HTML, JSON, EPUB, and MOBI formats
  • Intelligent Content Cleaning: Automatically removes ads, navigation, and clutter using go-readability
  • Recursive Website Scraping: Follow navigation menus and links to scrape entire websites
  • RSS Feed Processing: Convert RSS feeds into organized ebooks
  • CSS Selector Support: Target specific content areas with precise CSS selectors
  • Concurrent Downloads: Multi-threaded processing for faster large-site scraping
  • HTTP Proxy Mode: Function as a filtering proxy for real-time content processing
  • Progress Tracking: Visual progress bars for multi-page operations
  • Customizable Metadata: Set author, title, and other ebook properties
  • Cross-platform: Native support for Windows, macOS, and Linux

Setup

Installation

From Go (Recommended)

go install github.com/lapwat/papeer@latest

From Binary

Download the latest release for your platform:

# Linux (amd64) example; for macOS or other architectures, pick the matching asset on the releases page
wget https://github.com/lapwat/papeer/releases/latest/download/papeer-linux-amd64
chmod +x papeer-linux-amd64
sudo mv papeer-linux-amd64 /usr/local/bin/papeer

MOBI Support (Linux only)

# Work in a throwaway directory (named tmp_dir so the TMPDIR environment variable is left alone)
tmp_dir=$(mktemp -d -t papeer-XXXXX)
curl -L -o "$tmp_dir/kindlegen.tar.gz" https://github.com/lapwat/papeer/releases/download/kindlegen/kindlegen_linux_2.6_i386_v2_9.tar.gz
tar xzvf "$tmp_dir/kindlegen.tar.gz" -C "$tmp_dir"
chmod +x "$tmp_dir/kindlegen"
sudo mv "$tmp_dir/kindlegen" /usr/local/bin
rm -rf "$tmp_dir"

How to Use

Usage Guide

Basic Single Page Scraping

# Scrape single page to Markdown
papeer get https://example.com/article

# Export to EPUB format
papeer get https://example.com/article --format=epub

# Custom output filename
papeer get https://example.com/article --output="my-article" --format=html

Website Documentation Scraping

# Preview table of contents first
papeer list https://12factor.net/ --selector='section.concrete>article>h2>a'

# Scrape entire documentation site
papeer get https://12factor.net/ --selector='section.concrete>article>h2>a' --format=epub

Advanced Options

# Limit chapters and add delays
papeer get https://docs.example.com --selector='nav a' --limit=10 --delay=1000

# Skip first chapters, reverse order
papeer get https://blog.example.com --selector='.post-link' --offset=5 --reverse

# Set custom metadata
papeer get https://example.com --author="John Doe" --name="My Book" --format=epub

Evaluation

Pros & Cons

Pros
  • Purpose-built for e-reader content with excellent format support
  • Intelligent content extraction removes ads automatically
  • Recursive scraping handles complex website structures
  • Cross-platform with simple installation process
  • Active development with recent updates
  • Respectful scraping with delay and concurrency controls
  • Preview mode for content structure verification

Cons
  • Command-line only interface
  • Requires CSS selector knowledge for complex scenarios
  • Limited documentation for advanced use cases
  • No built-in scheduling or automation features
  • MOBI support requires additional Linux setup
  • Smaller community than general-purpose tools

Other Options

Alternatives

Calibre

Comprehensive ebook management suite with web scraping through news sources, more complex but feature-rich

Pandoc

Universal document converter that can handle web content but requires manual content extraction

Scrapy

Python web scraping framework offering more flexibility but requiring significant development work

Mercury Parser

Web content extraction API focused on article parsing but lacks ebook generation

Frequently Asked Questions

Is Papeer free to use?
Yes, Papeer is completely free and open source under the GPL-3.0 license. You can use, modify, and distribute it under the terms of that license.

How does Papeer compare to Calibre for web scraping?
Papeer is more focused and easier to use for web-to-ebook conversion, while Calibre offers broader ebook management features but requires more complex configuration for web scraping.

What e-reader formats does Papeer support?
Papeer exports EPUB (recommended for modern e-readers) and MOBI (for older Kindles), plus Markdown, HTML, and JSON. EPUB works on most current e-readers, and Kindle accepts it through Send to Kindle.

Can I use Papeer to scrape any website?
Papeer can scrape most websites, but effectiveness depends on the site's structure. It works best with content-focused sites and may struggle with heavily JavaScript-dependent pages.

How active is Papeer's development?
Very active: the project was last updated in March 2026, and version 0.8.7 was released in December 2025. The developer regularly maintains and improves the tool.

Written by Evan Mael

Microsoft MCSA-certified Cloud Architect | Fortinet-focused. I modernize cloud, hybrid & on-prem infrastructure for reliability, security, performance and cost control - sharing field-tested ops & troubleshooting.