Using Apps
- Download offline speech recognition language for Speech to Text
Mozilla DeepSpeech

There are two main options: ready-to-use Android apps, or the Mozilla DeepSpeech library.

Mozilla DeepSpeech is better suited for transcribing hundreds of WAV files to text automatically and programmatically.

Using Apps

To use the free apps below, your Android phone must have the Google app and Speech Services by Google installed, since these apps rely on Google's offline Speech to Text and Text to Speech under the hood.

Download offline speech recognition language for Speech to Text

Depending on your device and OS version, the path to the Google app settings can differ slightly. Go there and download the offline languages you need.

Open the Google app, tap your profile icon in the top-right corner, and go to Settings > Voices > Offline Speech Recognition.

If you do not see Offline Speech Recognition there, try this:

Phone Settings > Google > Settings for Google apps > Search, Assistant, Voice > Voice > Offline Speech Recognition

Tip: if the download doesn't start, try turning off Airplane mode ✈️.

Here are some free apps that worked for me at the time of this writing.

Mozilla DeepSpeech

Another option is the open-source Mozilla DeepSpeech, which we can run offline on Termux.

Currently, Termux is not powerful enough to run the desktop model (deepspeech-*-models.pbmm), so we need to use the .tflite model instead.

On Termux:

# install wget first if it is missing: pkg install wget
cd ~/
mkdir -p s2t
cd s2t
# update the links if you use a version other than 0.9.3
wget -c https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.tflite

wget -c https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

# replace arm64 with your device's architecture if it differs (check with: uname -m)

wget -c https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/native_client.arm64.cpu.android.tar.xz

# Extract the archive
tar xf native_client.arm64.cpu.android.tar.xz
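If you are unsure which archive matches your phone, the helper below is a minimal sketch that maps the output of uname -m to an archive name and downloads it. The function names and the script name fetch_client.sh are hypothetical, and the mapping (aarch64 to arm64, 32-bit ARM to armv7, x86_64 for emulators) assumes the v0.9.3 release ships those three Android builds, so check the release page before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the DeepSpeech Android native_client archive
# that matches this device's CPU, as reported by `uname -m`.
set -euo pipefail

DS_VERSION="${DS_VERSION:-0.9.3}"  # assumption: same version as the model files

# Map a `uname -m` value to the architecture tag used in the archive name.
android_arch_for() {
  case "$1" in
    aarch64)       echo arm64 ;;
    armv7l|armv8l) echo armv7 ;;   # 32-bit ARM userland
    x86_64)        echo x86_64 ;;  # e.g. Android emulators
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the release URL for the matching archive (assumed naming scheme).
native_client_url() {
  local arch
  arch="$(android_arch_for "$1")" || return 1
  echo "https://github.com/mozilla/DeepSpeech/releases/download/v$DS_VERSION/native_client.$arch.cpu.android.tar.xz"
}

# Download and extract only when called with an architecture argument.
if [[ $# -ge 1 ]]; then
  url="$(native_client_url "$1")"
  wget -c "$url"
  tar xf "${url##*/}"
fi
```

Run it from inside ~/s2t with: bash fetch_client.sh "$(uname -m)"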

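So the s2t folder tree will look something like this (the .pbmm model and s2t.sh in this listing are not produced by the steps above and are not required):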

➜  s2t tree
.
├── GRAPH_VERSION -> training/deepspeech_training/GRAPH_VERSION
├── LICENSE
├── README.mozilla
├── VERSION -> training/deepspeech_training/VERSION
├── deepspeech
├── deepspeech-0.9.3-models.pbmm
├── deepspeech-0.9.3-models.scorer
├── deepspeech-0.9.3-models.tflite
├── deepspeech.h
├── generate_scorer_package
├── libc++_shared.so
├── libdeepspeech.so
├── native_client.arm64.cpu.android.tar.xz
└── s2t.sh

To run it, we have to temporarily set LD_LIBRARY_PATH=~/s2t so that the deepspeech binary can find libdeepspeech.so and libc++_shared.so.

This export can cause other programs in the same session to stop working. But don't worry: simply close and restart Termux (or your terminal) to clear the temporary export.

Below is a one-line command to get the text from file.wav.

export LD_LIBRARY_PATH=~/s2t/ && ~/s2t/deepspeech --model ~/s2t/deepspeech-0.9.3-models.tflite --scorer ~/s2t/deepspeech-0.9.3-models.scorer --audio file.wav
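The tree above lists an s2t.sh script whose contents are not shown. As a hypothetical sketch of what such a wrapper could look like, the version below sets LD_LIBRARY_PATH only for the deepspeech process (a `VAR=value command` prefix instead of `export`), so the rest of the shell session keeps its normal library path and no restart is needed. The S2T_DIR and DS_VERSION variables are assumptions, defaulting to the folder and version used above.

```bash
#!/usr/bin/env bash
# s2t.sh: hypothetical wrapper around the DeepSpeech native client.
# LD_LIBRARY_PATH is set only in the environment of the deepspeech
# process, so the current shell and other programs are not affected.
set -euo pipefail

S2T_DIR="${S2T_DIR:-$HOME/s2t}"    # folder created in the steps above
DS_VERSION="${DS_VERSION:-0.9.3}"  # must match the downloaded model files

# Transcribe one WAV file and print the text to stdout.
s2t() {
  local audio="$1"
  LD_LIBRARY_PATH="$S2T_DIR" "$S2T_DIR/deepspeech" \
    --model "$S2T_DIR/deepspeech-$DS_VERSION-models.tflite" \
    --scorer "$S2T_DIR/deepspeech-$DS_VERSION-models.scorer" \
    --audio "$audio"
}

# Run only when given an audio file, e.g. bash s2t.sh file.wav
if [[ $# -ge 1 ]]; then
  s2t "$1"
fi
```

Usage: bash ~/s2t/s2t.sh file.wav. Invoking it through bash explicitly sidesteps any shebang-path differences on Termux.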

Note that each audio clip should be only around 8 seconds long. This is a current drawback of DeepSpeech. You can use ffmpeg to split longer audio files into segments.
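A minimal sketch of that approach, assuming ffmpeg is installed in Termux (pkg install ffmpeg) and the files are in ~/s2t as above: it splits a recording into 8-second chunks with ffmpeg's segment muxer, converting them to 16 kHz, mono, 16-bit WAV (the format the released DeepSpeech models were trained on), then transcribes the chunks in order. The function names, the S2T_DIR and SEGMENT_SECONDS variables, and the chunk_NNN.wav naming are hypothetical. transcribe_dir works on any folder of WAV files, which also covers the batch use case mentioned earlier.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split a long recording into ~8 s chunks and
# transcribe each chunk with the DeepSpeech native client.
set -euo pipefail

S2T_DIR="${S2T_DIR:-$HOME/s2t}"
SEGMENT_SECONDS="${SEGMENT_SECONDS:-8}"

# Convert to 16 kHz mono 16-bit PCM and cut into chunk_000.wav, chunk_001.wav, ...
split_audio() {
  local input="$1" outdir="$2"
  mkdir -p "$outdir"
  ffmpeg -hide_banner -loglevel error -i "$input" -vn \
    -ar 16000 -ac 1 -c:a pcm_s16le \
    -f segment -segment_time "$SEGMENT_SECONDS" \
    "$outdir/chunk_%03d.wav"
}

# Transcribe every WAV file in a folder, in name order, one result per file.
transcribe_dir() {
  local dir="$1" f
  for f in "$dir"/*.wav; do
    [[ -e "$f" ]] || continue  # folder without WAV files
    LD_LIBRARY_PATH="$S2T_DIR" "$S2T_DIR/deepspeech" \
      --model "$S2T_DIR/deepspeech-0.9.3-models.tflite" \
      --scorer "$S2T_DIR/deepspeech-0.9.3-models.scorer" \
      --audio "$f"
  done
}

# Run only when given an input file, e.g. bash split_s2t.sh long.mp3
if [[ $# -ge 1 ]]; then
  workdir="$(mktemp -d)"
  trap 'rm -rf "$workdir"' EXIT
  split_audio "$1" "$workdir"
  transcribe_dir "$workdir"
fi
```

Fixed-length cuts can split a word across two chunks; splitting on silence would be more accurate, but this sketch keeps to the simple fixed-length approach.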
