Poppler is a PDF rendering library that ships with a set of command-line utilities for a variety of PDF manipulations. Thanks to Termux, we can install and use these utilities right on our phones.
# update Termux packages:
pkg update
# install poppler:
pkg install poppler
On Termux, after installing the poppler package, just type pdf and press the Tab key (twice, if nothing shows up the first time); the shell will list all available utilities whose names start with pdf.
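If Tab completion is not handy, a small helper can print the same list. This is only a sketch: list_pdf_tools is a made-up name, and it assumes the binaries live in $PREFIX/bin, the directory Termux normally uses for installed programs.

```shell
list_pdf_tools() {
    # $1 = directory holding the binaries (assumed default: Termux's $PREFIX/bin)
    # Print every entry whose name starts with "pdf"
    ls "${1:-$PREFIX/bin}" | grep '^pdf'
}

# Example:
# list_pdf_tools
```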
Keyboard tips: One of the best keyboards to work with Termux on Android is Hacker’s Keyboard, available from the Google Play store.
If your device doesn’t have the Google Play store, you can download the latest Hacker’s Keyboard APK from https://f-droid.org.
It is free and open source, so you can also build your own APK.
Here we will try three of these commands: pdftotext, pdftohtml and pdfimages.
1. pdftotext and pdftohtml
# to see available options, run:
pdftotext --help
pdftohtml --help
Usage: pdftotext [options] <PDF-file> [<text-file>]
Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
To convert a PDF file to plain text or HTML, let’s say the file is named file.pdf and sits in your phone’s Download folder (the ~/storage directory exists only after you have run termux-setup-storage once):
# cd to the Download dir:
cd ~/storage/shared/Download/
# convert file.pdf to txt:
pdftotext file.pdf file.txt
# convert file.pdf to html:
pdftohtml file.pdf file.html
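When the Download folder holds many PDF files, the same command can be wrapped in a loop. A minimal sketch, assuming a POSIX-style shell such as the bash that Termux starts by default; pdfs_to_text is a hypothetical helper name, not part of poppler.

```shell
pdfs_to_text() {
    # $1 = directory to scan (defaults to the current directory)
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDF files
        [ -e "$pdf" ] || continue
        # Write the text next to the PDF: file.pdf -> file.txt
        pdftotext "$pdf" "${pdf%.pdf}.txt"
    done
}

# Example:
# pdfs_to_text ~/storage/shared/Download
```

The quotes around "$pdf" matter here, because file names in the Download folder often contain spaces.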
One helpful option is -layout, which tries to maintain the original physical layout of the text. -f sets the first page to convert. When no output file is given, pdftotext writes to a .txt file named after the PDF (file.txt here).
pdftotext -f 48 -layout file.pdf
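The -f option pairs with -l, which sets the last page to convert, and a - in place of the output file sends the text to the terminal instead of a file. A small sketch combining them; pdf_pages_to_text is a hypothetical name chosen for this example.

```shell
pdf_pages_to_text() {
    # $1 = PDF file, $2 = first page, $3 = last page
    # "-" as the output file prints the text to the terminal
    pdftotext -f "$2" -l "$3" -layout "$1" -
}

# Example: read pages 48 to 50 of file.pdf on screen
# pdf_pages_to_text file.pdf 48 50
```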
2. pdfimages
This utility extracts the images embedded in a PDF file.
Usage: pdfimages [options] <PDF-file> <image-root>
# to see available options, run:
pdfimages --help
For example, we will perform these tasks:
- extract all images in a PDF file, for example, file_illustrated.pdf
- change the default output format to PNG
- prepend a prefix named pnote to output file names
- add page numbers to output file names
- save them to a new folder named imgpdf
One-line command:
mkdir -p imgpdf && pdfimages -png -p file_illustrated.pdf ./imgpdf/pnote
Note: there must be no ‘/’ after pnote; otherwise pnote is treated as a directory in the output path rather than as a file-name prefix.
Here are the explanations:
mkdir -p imgpdf: make a new dir named imgpdf (no error if it already exists)
&& : chain 2 commands in one line; the second runs only if the first succeeds
pdfimages: the main command (see more options with pdfimages --help)
-png: change the default output format to PNG
-p : include page numbers in output file names
file_illustrated.pdf : the PDF file
./imgpdf/ : output dir
pnote: a prefix for output file names
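The pieces above can be bundled into a reusable helper. This is a sketch under assumptions: extract_images and its three arguments are invented for illustration, and the helper simply strips one trailing ‘/’ so the prefix cannot turn into a directory path by accident.

```shell
extract_images() {
    # $1 = PDF file, $2 = output directory, $3 = file-name prefix
    pdf=$1
    outdir=${2%/}    # drop one trailing slash from the directory, if any
    prefix=${3%/}    # a trailing slash here would turn the prefix into a dir path
    # Create the directory first; run pdfimages only if that worked
    mkdir -p "$outdir" && pdfimages -png -p "$pdf" "$outdir/$prefix"
}

# Example:
# extract_images file_illustrated.pdf imgpdf pnote
```

Before extracting, pdfimages -list file_illustrated.pdf prints a table of the embedded images without writing any files, which helps to judge how many images to expect.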
Tips:
The pdfimages command above extracts only the images embedded in a PDF file; it does not turn pages into images.
If you want to convert PDF pages to images instead, 1 page to 1 image or all pages to 1 image, you can use imagemagick, which is also available on Termux. See this related post.
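For the 1 page to 1 image case, poppler's own pdftoppm (listed in section 3) is an alternative to imagemagick. The sketch below is not the method from the related post; pdf_pages_to_png is a hypothetical wrapper, and 150 is passed explicitly only because it is pdftoppm's documented default resolution.

```shell
pdf_pages_to_png() {
    # $1 = PDF file, $2 = output prefix, $3 = resolution in DPI (optional)
    # -png selects PNG output; -r sets the resolution for both axes
    pdftoppm -png -r "${3:-150}" "$1" "$2"
}

# Example: one PNG per page, named after the prefix "page"
# pdf_pages_to_png file.pdf page 200
```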
3. Other commands/utilities
You can simply run each command with the --help option to view its instructions.
pdfattach
pdfinfo
pdftoppm
pdfdetach
pdfseparate
pdftops
pdffonts
pdftocairo
pdfunite
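As a taste of two of these utilities, here is a short sketch using pdfinfo and pdfseparate. The helper names pdf_page_count and split_pages are made up for this example, and the page-%d.pdf pattern is just one possible naming choice.

```shell
pdf_page_count() {
    # pdfinfo prints one "Key: value" pair per line; keep the value of "Pages:"
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

split_pages() {
    # $1 = PDF file, $2 = first page, $3 = last page
    # %d in the pattern is replaced by the page number: page-2.pdf, page-3.pdf, ...
    pdfseparate -f "$2" -l "$3" "$1" page-%d.pdf
}

# Example:
# pdf_page_count file.pdf
# split_pages file.pdf 2 4
```

The single-page files can then be joined again with pdfunite, which takes the source files first and the destination file last, for example pdfunite page-2.pdf page-3.pdf merged.pdf.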
PS: On iOS or iPadOS, to convert a PDF file to text, you can simply use Apple’s Shortcuts app.