The help text printed by the command pyglossary --help does not document all of pyglossary's features. Many useful options are not mentioned there.
I cloned the pyglossary project from GitHub and used ripgrep (the rg command, similar to grep but faster) to search for a few keywords and read some of the source code.
Here are some useful options when processing Stardict data.
1. Disable dictzip
By default, as of PyGlossary 4.0.11 (run pyglossary --version to check yours), it automatically compresses the .dict file to .dict.dz:
# See the code file: pyglossary/pyglossary/plugins/stardict.py
...
class Writer(object):
    _dictzip: bool = True
    ...
In case your dictionary app cannot read the .dz file, you can disable the compression with the following option (I had to read the source file above to find it):
--write-options=dictzip=False
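If a dictionary has already been written with compression on, re-running the conversion is not the only route. dictzip files follow the gzip format, so a standard gzip reader should be able to unpack them. The following is a minimal Python sketch under that assumption. The helper name decompress_dz is made up for illustration, and the demo uses a plain gzip stand-in file rather than real pyglossary output.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_dz(dz_path):
    """Decompress 'name.dict.dz' to 'name.dict' next to it and return the new path."""
    dz_path = Path(dz_path)
    out_path = dz_path.with_suffix("")  # drops the trailing '.dz'
    with gzip.open(dz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# Demo with a stand-in file: a plain gzip file, not real dictzip output.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.dict.dz"
    with gzip.open(sample, "wb") as f:
        f.write(b"hello")
    result = decompress_dz(sample)
    restored = result.read_bytes()
    restored_name = result.name

print(restored_name, restored)
```

The original .dz file is left in place, so you may want to move it away before testing the dictionary in your app.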
2. Passing multiple options
Put the options in one quoted string and separate them with ;
For example, to set the sametypesequence to HTML (h) and disable dictzip:
--write-options='sametypesequence=h;dictzip=False'
pyglossary pi-en-PTSPED-2021-CSS PTSPED-2021-CSS.ifo -v5 --read-format=Tabfile --write-format=Stardict --write-options='sametypesequence=h;dictzip=False'
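When the conversion is scripted, the key=value;key=value string can be assembled from a dictionary instead of being typed by hand. This is a minimal Python sketch, not part of PyGlossary. The helper names build_write_options and build_command are hypothetical, and the code only builds the argument list without running anything.

```python
def build_write_options(options):
    """Join a dict of options into the 'key=value;key=value' form."""
    return ";".join(f"{key}={value}" for key, value in options.items())


def build_command(input_file, output_file, read_format, write_format, write_options):
    """Return the pyglossary argument list for one conversion."""
    return [
        "pyglossary",
        input_file,
        output_file,
        f"--read-format={read_format}",
        f"--write-format={write_format}",
        f"--write-options={build_write_options(write_options)}",
    ]


# Same file names and options as the command above.
cmd = build_command(
    "pi-en-PTSPED-2021-CSS",
    "PTSPED-2021-CSS.ifo",
    "Tabfile",
    "Stardict",
    {"sametypesequence": "h", "dictzip": False},
)
print(" ".join(cmd))
```

Passing a list like this to subprocess.run(cmd, check=True) bypasses the shell, so the ; needs no quoting in that case.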
3. Language code for the dictionary
Open the file pyglossary/pyglossary/langs/langs.json and find the code for the language you need, for example:
{
"codes": ["pi", "pli"],
"name": "Pali",
"alt_names": ["Magadhan"],
"script": ["Latin"],
"wiki": "https://en.wikipedia.org/wiki/Pali",
"title_tag": "b"
}
Then prepend the language code pair to the name of your tabfile. For example, for a Pali-English dictionary (pi-en): pi-en-myTabfile. pyglossary will recognize the prefix and write the language info to the output files.
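The lookup and the file naming can also be scripted. The sketch below is only an illustration. It assumes langs.json holds a list of objects shaped like the Pali entry above, and the English entry in the sample data is my assumption, not copied from the file. The function names find_code and prefixed_name are hypothetical.

```python
import json

# Sample data shaped like the entry shown above. The real file is assumed
# to be a list of such objects; the English entry here is an assumption.
LANGS_JSON = """
[
  {"codes": ["pi", "pli"], "name": "Pali", "alt_names": ["Magadhan"]},
  {"codes": ["en", "eng"], "name": "English", "alt_names": []}
]
"""


def find_code(langs, name):
    """Return the first code of the language whose name or alt name matches."""
    wanted = name.lower()
    for lang in langs:
        names = [lang["name"]] + lang.get("alt_names", [])
        if wanted in (n.lower() for n in names):
            return lang["codes"][0]
    raise ValueError(f"language not found: {name}")


def prefixed_name(langs, source, target, filename):
    """Build '<source code>-<target code>-<filename>' from two language names."""
    return f"{find_code(langs, source)}-{find_code(langs, target)}-{filename}"


langs = json.loads(LANGS_JSON)
print(prefixed_name(langs, "Pali", "English", "myTabfile"))
```

To run it against the real file, replace json.loads(LANGS_JSON) with a json.load call on the opened langs.json, after checking that its top-level structure matches the assumption.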
4. Show more options
The help text printed by the command:
pyglossary --help
does not show all options, and some of the values mentioned there are out of date.
➜ TEST pyglossary --help
PyGlossary is a tool for working with dictionary databases (glossaries)
Basic Usage:
PyGI (Gtk3) Interface:
To open PyGlossary window:
pyglossary
PyGI is the default interface (so you never need to use "--ui=gtk" or --gtk option)
If PyGI was not found (not installed), then PyGlossary will fallback to Tkinter interface.
Tkinter Interface:
To open PyGlossary window:
pyglossary --tk
Or
pyglossary --ui=tk
Usually good for Windows and Mac OS X
Command-line interface:
To show this help:
pyglossary --help
To show program version:
pyglossary --version
To Convert:
pyglossary INPUT_FILE OUTPUT_FILE
To Reverse:
pyglossary INPUT_FILE OUTPUT_FILE.txt --reverse
Input and output formats will be detected from extensions if possible.
If not, you need to specify input or output format, for example:
pyglossary test.utf8 test.ifo --read-format=tabfile
pyglossary test.utf8 test.ifo --read-format tabfile
pyglossary test.ifo test.utf8 --write-format=tabfile
pyglossary test.ifo test.utf8 --write-format tabfile
Interactive command-line interface:
Minimal command:
pyglossary --cmd
Or
pyglossary --ui=cmd
Additionally you can pass any flag to act to act as default
General Options:
Verbosity:
-v0 or '--verbosity 0' for critical errors only
-v1 or '--verbosity 1' for errors only
-v2 or '--verbosity 2' for errors and warnings
-v3 or '--verbosity 3' for errors, warnings and info
-v4 or '--verbosity 4' for debug mode
-v5 or '--verbosity 5' for trace mode
Appearance:
--no-progress-bar and --no-color, useful for scripts
Full Convert Usage:
pyglossary INPUT_FILE OUTPUT_FILE [-vN] [--read-format=FORMAT] [--write-format=FORMAT]
[--sort|--no-sort] [--direct|--indirect] [--no-alts] [--sort-cache-size=2000] [--utf8-check|--no-utf8-check]
[--lower|--no-lower] [--read-options=READ_OPTIONS] [--write-options=WRITE_OPTIONS]
Command line arguments and options (and arguments for options) is parsed with GNU getopt method
You can also just type extension of output file instead of full path, if you want to create with the same input
file name with another extension. For example:
pyglossary mydic.ifo txt
instead of:
pyglossary mydic.ifo mydic.txt
Compressing with gz, bz2 and zip is supported, just append these extension to the file name, for example:
pyglossary mydic.ifo mydic.txt.gz
or
pyglossary mydic.ifo txt.gz
And if the input file has these extensions (gz, bz2, zip), it will be extracted before loading
Supported input formats:
Name | Description | Extensions
----------------+-------------------------------+------------------
Aard2Slob | Aard 2 (.slob) | .slob
ABCMedicalNotes | ABC Medical Notes (SQLite3) |
Almaany | Almaany.com (SQLite3) |
AppleDictBin | AppleDict Binary | .dictionary .data
BabylonBgl | Babylon (.BGL) | .bgl
CC-CEDICT | CC-CEDICT | .u8
cc-kedict | cc-kedict |
CrawlerDir | Crawler Directory | .crawler
Csv | CSV (.csv) | .csv
Dicformids | DictionaryForMIDs | .mids
Dictcc | Dict.cc (SQLite3) |
Dictcc_split | Dict.cc (SQLite3) - Split |
DictOrg | DICT.org file format (.index) | .index
Dictunformat | dictunformat output file | .dictunformat
DigitalNK | DigitalNK (SQLite3, N-Korean) |
ABBYYLingvoDSL | ABBYY Lingvo DSL (.dsl) | .dsl
Dictfile | Kobo E-Reader Dictfile (.df) | .df
Edlin | EDLIN | .edlin
FreeDict | FreeDict (.tei) | .tei
GettextPo | Gettext Source (.po) | .po
Info | Glossary Info (.info) | .info
JMDict | JMDict |
LingoesLDF | Lingoes Source (.ldf) | .ldf
OctopusMdict | Octopus MDict (.mdx) | .mdx
Sdict | Sdictionary Binary(dct) | .dct
Stardict | StarDict (.ifo) | .ifo
Tabfile | Tabfile (.txt, .dic) | .txt .tab .tsv
WiktionaryDump | Wiktionary Dump (.xml) | .xml
Wordset | Wordset.org JSON directory |
Xdxf | XDXF (.xdxf) | .xdxf
Zim | Zim (.zim, for Kiwix) | .zim
Supported output formats:
Name | Description | Extensions
--------------+-------------------------------+---------------
Aard2Slob | Aard 2 (.slob) | .slob
AppleDict | AppleDict Source | .apple
CrawlerDir | Crawler Directory | .crawler
Csv | CSV (.csv) | .csv
Dicformids | DictionaryForMIDs | .mids
DictOrg | DICT.org file format (.index) | .index
DictOrgSource | DICT.org dictfmt source file | .dtxt
Epub2 | EPUB-2 E-Book | .epub
Kobo | Kobo E-Reader Dictionary | .kobo
Dictfile | Kobo E-Reader Dictfile (.df) | .df
Mobi | MOBI E-Book | .mobi
Edlin | EDLIN | .edlin
FreeDict | FreeDict (.tei) | .tei
GettextPo | Gettext Source (.po) | .po
HtmlDir | HTML Directory | .hdir
Info | Glossary Info (.info) | .info
Json | JSON (.json) | .json
LingoesLDF | Lingoes Source (.ldf) | .ldf
SdictSource | Sdictionary Source (.sdct) | .sdct
Sql | SQL (.sql) | .sql
Stardict | StarDict (.ifo) | .ifo
Tabfile | Tabfile (.txt, .dic) | .txt .tab .tsv
The trick is simply to pass an unknown option: pyglossary will then print all available options.
For example, append the unknown option --helloworld:
➜ TEST pyglossary PTSPED-2021-CSS PTSPED-2021-CSS.ifo -v5 --read-format=Tabfile --write-format=Stardict --helloworld
usage: pyglossary [-v {0,1,2,3,4,5}]
[--version] [-h]
[-u {cmd,gtk,tk,auto,none}]
[--cmd] [--gtk] [--tk]
[--interactive]
[--no-interactive]
[-r READOPTIONS]
[-w WRITEOPTIONS]
[--json-read-options JSONREADOPTIONS]
[--json-write-options JSONWRITEOPTIONS]
[--read-format INPUTFORMAT] [--write-format OUTPUTFORMAT]
[--direct] [--indirect]
[--no-progress-bar]
[--no-color] [--sort]
[--no-sort]
[--sort-cache-size SORTCACHESIZE]
[--reverse] [--log-time]
[--no-log-time]
[--cleanup] [--no-cleanup]
[--lower] [--no-lower]
[--utf8-check]
[--no-utf8-check]
[--no-alts]
[--skip-resources] [--rtl]
[--remove-html REMOVE_HTML]
[--remove-html-all]
[--normalize-html]
[--info]
[inputFilename]
[outputFilename]
pyglossary: error: unrecognized arguments: --helloworld
➜ TEST
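To compare this list with what --help mentions, the long option names can be pulled out of the usage text with a small script. This is a rough Python sketch that works on a shortened copy of the output above. The function name long_options is hypothetical, and the regular expression assumes each option appears as [--name ...] like in the usage text shown.

```python
import re

# A shortened copy of the usage text shown above.
USAGE = """\
usage: pyglossary [-v {0,1,2,3,4,5}]
[--version] [-h]
[--no-alts]
[--skip-resources] [--rtl]
[--remove-html REMOVE_HTML]
[--remove-html-all]
[inputFilename]
pyglossary: error: unrecognized arguments: --helloworld
"""


def long_options(usage_text):
    """Return the sorted, de-duplicated --long-option names that open a [...] group."""
    return sorted(set(re.findall(r"\[(--[a-z0-9-]+)", usage_text)))


for option in long_options(USAGE):
    print(option)
```

The bogus --helloworld in the error line is skipped because it does not follow an opening bracket.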