Core API¶
The core package holds the voice-control engine: wake-word detection,
transcription, command routing, speech output, the panel indicator, and the
GNOME Shell extension lifecycle. Plugins receive an EasySpeak
instance and use its small public API (speak, host_run, transcribe,
wait_for_speech, record_until_silence, deactivate, ...) to act on commands.
core.main¶
core.main
¶
EasySpeak Core - Voice Control for Linux.
Loads plugins from the plugins/ folder automatically. Uses pyopen-wakeword for fast wake detection.
EasySpeak
¶
The voice-control daemon: wake detection, transcription, and routing.
Owns the audio pipeline (wake word -> Whisper), the loaded plugins, the
text-to-speech pipeline, and the panel indicator, and exposes the small
plugin-facing API (speak,
host_run,
transcribe, ...) that plugins use to act on
commands.
Initialise daemon state; models and audio are loaded later in run().
Source code in src/core/main.py
host_run
¶
Run a shell command.
With clean_env, EasySpeak's injected LD_LIBRARY_PATH and GI_TYPELIB_PATH are stripped from the child's environment. The dev flake prepends its own libraries (glib among them) to those paths for EasySpeak's own dependencies; left in place they leak into spawned desktop apps, which then load EasySpeak's flake-pinned libraries instead of their own rpath ones. A glib/GIO build mismatch there wedges GTK apps such as the file manager. Pass clean_env=True when launching external GUI programs so they run in the plain host environment.
Source code in src/core/main.py
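The effect of clean_env can be sketched as follows. This is a minimal illustration, not the code in src/core/main.py: cleaned_environment and run_clean are hypothetical names, and the sketch drops the two variables entirely, whereas the real method may remove only the entries EasySpeak injected.

```python
import os
import subprocess

# The two variables the description above says are stripped with clean_env.
INJECTED_VARS = ("LD_LIBRARY_PATH", "GI_TYPELIB_PATH")


def cleaned_environment(env=None):
    """Return a copy of env without the library-path variables.

    Simplification: the whole variable is dropped, not just injected entries.
    """
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key not in INJECTED_VARS}


def run_clean(command):
    """Hypothetical stand-in for host_run(command, clean_env=True)."""
    return subprocess.run(
        command,
        shell=True,
        env=cleaned_environment(),
        capture_output=True,
        text=True,
        check=False,
    )
```

The input mapping is copied, never mutated, so the daemon keeps its own library paths while the child starts in the plain host environment.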
speak
¶
tap_key
¶
Replay a multimedia key so the desktop renders its native feedback.
Returns True if the key was injected, or False if injection is unavailable (e.g. a non-GNOME session) so the caller can fall back to a silent change. jeepney is imported lazily, so the dependency is needed only when a key is actually replayed, not to load the module.
Source code in src/core/main.py
deactivate
¶
Request the assistant go to sleep (plugin-facing).
Releases the mic and stops wake detection until reactivated from the tray. The actual release happens at the next main-loop iteration (handled by the tray controller) so the triggering command can finish, and speak, first.
Source code in src/core/main.py
register_push_to_talk
¶
Register the dictation session the hotkey runs while its combo is held.
Plugin-facing: the dictation plugin registers here in its setup() so core
can drive keyboard (silent) activation without importing a plugin
directly. handler takes one should_continue predicate and runs
until it returns False (the keys are released).
Source code in src/core/main.py
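The handler contract described above (one should_continue predicate, run until it returns False) can be illustrated with a small stand-in. Both function names here are hypothetical, and the predicate is faked with a counter instead of real key state.

```python
import itertools


def make_release_after(polls):
    """Fake 'keys still held' predicate that turns False after `polls` checks."""
    remaining = itertools.count(polls, -1)
    return lambda: next(remaining) > 0


def dictation_handler(should_continue):
    """Hypothetical handler: do one unit of work per poll while the combo is held."""
    segments = []
    while should_continue():
        # A real handler would record and transcribe here.
        segments.append(f"segment-{len(segments)}")
    return segments
```

A plugin would pass such a handler to register_push_to_talk from its setup() hook; core then supplies the real predicate when the hotkey is pressed.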
load_plugins
¶
Discover and import every plugin module from the plugins/ dir.
Files are loaded in sorted order (numeric prefixes set load order); names
starting with _ are skipped. A module is registered only if it exposes NAME
and handle; its optional setup hook runs once. Import or setup failures are
logged and skipped, never fatal.
Source code in src/core/main.py
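The discovery rules above can be sketched with importlib. This is an illustration under assumptions, not the actual loader: discover_plugins is a hypothetical name, the setup hook is assumed to receive the app instance, and a module whose setup fails is assumed to be left unregistered.

```python
import importlib.util
import logging
from pathlib import Path

log = logging.getLogger("plugins.loader_sketch")


def discover_plugins(plugin_dir, app=None):
    """Load plugin modules from plugin_dir following the rules described above."""
    loaded = []
    for path in sorted(Path(plugin_dir).glob("*.py")):  # numeric prefixes order the load
        if path.name.startswith("_"):
            continue  # names starting with _ are skipped
        try:
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            if not (hasattr(module, "NAME") and hasattr(module, "handle")):
                continue  # not a plugin: NAME and handle are both required
            setup = getattr(module, "setup", None)
            if callable(setup):
                setup(app)  # assumption: setup receives the app instance
        except Exception:
            # Import or setup failures are logged and skipped, never fatal.
            log.exception("Skipping plugin %s", path.name)
            continue
        loaded.append(module)
    return loaded
```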
get_all_commands
¶
Get all commands from all plugins for help text.
route_command
¶
Route a command to the appropriate plugin.
Returns False when the daemon should exit.
Source code in src/core/main.py
flush_stream
¶
Flush any remaining audio data from the stream buffer.
Source code in src/core/main.py
is_silence
¶
record_until_silence
¶
Record mic audio until a short silence, capped at five seconds.
Returns the captured PCM bytes. Plugin-facing. should_continue (used by
push-to-talk) lets a key release cut the recording short instead of waiting out
the silence window.
Source code in src/core/main.py
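A silence-terminated recording loop of this kind might look like the sketch below. Only the five-second cap and the should_continue behaviour come from the description above. The function names, the 80 ms chunk length, the 800 ms silence window, the RMS threshold and the 16-bit PCM format are all assumptions made for illustration.

```python
import array
import math


def rms(chunk):
    """Root-mean-square amplitude of a chunk of 16-bit native-endian PCM."""
    samples = array.array("h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_until_quiet(read_chunk, chunk_ms=80, silence_ms=800, max_ms=5000,
                       threshold=500.0, should_continue=lambda: True):
    """Collect chunks until a run of quiet chunks, the time cap, or a key release."""
    captured = bytearray()
    elapsed_ms = quiet_ms = 0
    while elapsed_ms < max_ms and should_continue():
        chunk = read_chunk()
        captured += chunk
        elapsed_ms += chunk_ms
        # Any loud chunk resets the silence window.
        quiet_ms = quiet_ms + chunk_ms if rms(chunk) < threshold else 0
        if quiet_ms >= silence_ms:
            break
    return bytes(captured)
```

Time is tracked in whole milliseconds rather than float seconds so the window comparisons stay exact.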
wait_for_speech
¶
Block until speech is heard, returning its first PCM chunk.
Returns None if nothing is heard within timeout seconds. Plugin-facing.
should_continue (used by push-to-talk) returns None early once it goes False,
so a key release ends the wait.
Source code in src/core/main.py
transcribe
¶
Transcribe raw PCM audio to text with Whisper.
prompt biases recognition (defaults to the command vocabulary). Plugin-facing.
Source code in src/core/main.py
run
¶
Load models and plugins, then run the wake-word listen loop forever.
Blocks until the user quits (voice command, tray, or Ctrl-C), always releasing the microphone and draining speech on the way out.
Source code in src/core/main.py
core.cli¶
core.cli
¶
Command-line entry point: parse arguments, set up logging, run the app.
Verbosity comes from the mutually exclusive -v/--verbose and -q/--quiet
flags, or — with neither — the EASYSPEAK_LOG_LEVEL environment variable, which
the flags override (see resolve_level). The one-shot
--configure and --preview subcommands set up or print the desktop-integration
files and exit. All EASYSPEAK_* variables are listed in core.config.
parse_args
¶
Parse EasySpeak's command-line arguments.
Source code in src/core/cli.py
run
¶
Start the application, or run a one-shot subcommand and exit.
EasySpeak is imported lazily so the one-shot subcommands work without the
audio stack (pyopen-wakeword, pyaudio, …) that core.main pulls in.
Source code in src/core/cli.py
core.config¶
core.config
¶
Central tuning constants and the speech-model factory for EasySpeak.
Holds the wake-word, audio, silent-hotkey, speech-model, and desktop-sound
settings the daemon reads at import, alongside load_whisper_model(), which
builds the faster-whisper model from them. Most are plain constants; these
honor an EASYSPEAK_* environment variable:
- EASYSPEAK_HOTKEY
- EASYSPEAK_OFFLINE
- EASYSPEAK_PIPER_BIN
- EASYSPEAK_PIPER_MODEL
- EASYSPEAK_SOUNDS_DIR
- EASYSPEAK_WHISPER_COMPUTE_TYPE
- EASYSPEAK_WHISPER_CPU_THREADS
- EASYSPEAK_WHISPER_MODEL
See the Configuration guide for what each does, its
default, and EasySpeak's other EASYSPEAK_* variables.
A plugin that needs host-environment setup does it in its own setup() hook —
see Writing Plugins.
load_whisper_model
¶
load_whisper_model(model_name=WHISPER_MODEL, compute_type=WHISPER_COMPUTE_TYPE, cpu_threads=WHISPER_CPU_THREADS)
Build a faster-whisper model from the configured (or given) settings.
A model already on disk — bundled in a language pack or cached from an earlier
run — loads without any network access. When it is missing, EasySpeak stays
offline by default and raises an actionable message; setting
EASYSPEAK_OFFLINE=relaxed (see NETWORK_ALLOWED) lets it fetch a bare name
like base.en from Hugging Face instead.
Source code in src/core/config.py
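The offline-by-default behaviour can be approximated with faster-whisper's local_files_only option. This is a sketch under assumptions rather than the code in src/core/config.py: network_allowed and build_model are hypothetical names, the default argument values are placeholders, and the exact parsing of EASYSPEAK_OFFLINE is a guess. It does not reproduce the actionable error message.

```python
import os


def network_allowed(env=None):
    """True only when EASYSPEAK_OFFLINE is set to 'relaxed' (assumed exact match)."""
    env = os.environ if env is None else env
    return env.get("EASYSPEAK_OFFLINE", "") == "relaxed"


def build_model(model_name="base.en", compute_type="int8", cpu_threads=4, env=None):
    """Build a faster-whisper model, refusing downloads unless explicitly relaxed.

    The defaults here are placeholders, not EasySpeak's configured values.
    """
    # Imported lazily so this module loads without the speech stack installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_name,
        compute_type=compute_type,
        cpu_threads=cpu_threads,
        # With local_files_only=True a missing model raises instead of downloading.
        local_files_only=not network_allowed(env),
    )
```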
core.log¶
core.log
¶
Logging setup for EasySpeak's terminal output.
Output is message-only (no level or timestamp prefixes) so the CLI reads nicely for
humans; the level only gates which lines appear. Only the easyspeak and plugins
logger hierarchies are configured, so importing the package as a library leaves the root
logger untouched.
resolve_level
¶
Pick the log level: CLI flags win, then EASYSPEAK_LOG_LEVEL, else INFO.
Source code in src/core/log.py
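The precedence above (flags, then EASYSPEAK_LOG_LEVEL, then INFO) can be sketched as below. build_parser and pick_level are hypothetical names, and mapping -v to DEBUG and -q to WARNING is an assumption; the text does not say which levels the flags select.

```python
import argparse
import logging
import os

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def build_parser():
    """Parser with the mutually exclusive verbosity flags."""
    parser = argparse.ArgumentParser(prog="easyspeak")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", "--verbose", action="store_true")
    group.add_argument("-q", "--quiet", action="store_true")
    return parser


def pick_level(verbose=False, quiet=False, env=None):
    """CLI flags win, then EASYSPEAK_LOG_LEVEL, else INFO."""
    env = os.environ if env is None else env
    if verbose:
        return logging.DEBUG  # assumption: -v selects DEBUG
    if quiet:
        return logging.WARNING  # assumption: -q selects WARNING
    name = env.get("EASYSPEAK_LOG_LEVEL", "").strip().upper()
    return LEVELS.get(name, logging.INFO)  # unknown or unset falls back to INFO
```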
configure
¶
Attach a message-only stderr handler to the package loggers.
Source code in src/core/log.py
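A message-only handler scoped to the two package hierarchies could be set up roughly like this. configure_package_logging is a hypothetical name, and disabling propagation is an assumption made here to keep the root logger out of the picture.

```python
import logging
import sys


def configure_package_logging(level=logging.INFO, names=("easyspeak", "plugins"),
                              stream=None):
    """Attach one message-only handler to the named logger hierarchies."""
    handler = logging.StreamHandler(sys.stderr if stream is None else stream)
    handler.setFormatter(logging.Formatter("%(message)s"))  # no level or timestamp
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(level)  # the level only gates which lines appear
        logger.handlers = [handler]
        logger.propagate = False  # assumption: keep records away from the root logger
    return handler
```

Child loggers such as easyspeak.core inherit both the level and the handler, so nothing else needs configuring.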
core.speech¶
core.speech
¶
Text-to-speech playback pipeline and audio-device noise suppression.
Extracted from core.main so the orchestration loop isn't tangled up with the subprocess
plumbing that keeps a piper voice model warm. SpeechPipeline owns a persistent piper
-> player pair; suppressed_c_stderr hides the unrelated ALSA/JACK probe spew
PortAudio emits when the input side (PyAudio) starts up.
SpeechPipeline
¶
A persistent piper -> audio player pipeline for low-latency speech.
Keeping piper alive avoids reloading its voice model (~2s) on every phrase, and streaming raw PCM straight to the player means playback starts as soon as synthesis does instead of after a temp WAV is fully written.
Create an idle pipeline; the subprocesses are spawned on first use.
Source code in src/core/speech.py
ensure
¶
Spawn the persistent piper -> player pipeline if not already running.
Raises OSError if the pipeline can't be brought up (missing binary, bad model), so callers can warn once rather than silently failing per phrase.
Source code in src/core/speech.py
speak
¶
Text-to-speech output.
Non-blocking: the phrase is handed to a warm piper process that streams audio to the player, so this returns immediately and the speech plays concurrently with whatever the caller does next.
Source code in src/core/speech.py
drain
¶
Flush pending speech, wait for playback, then stop the pipeline.
Used on shutdown so a final phrase (e.g. "Goodbye.") is heard in full before the process exits.
Source code in src/core/speech.py
suppressed_c_stderr
¶
Silence the device-probe spew PortAudio dumps when PyAudio starts up.
Initializing PyAudio makes PortAudio enumerate every ALSA PCM named in the system alsa.conf (surround*, hdmi, iec958, ...) and ping JACK. Devices the machine doesn't have each print a harmless error, and libasound/libjack write them straight to file descriptor 2 from C — Python's sys.stderr never sees them, so only redirecting the fd itself can hide them.
Suppression is best-effort: if stderr has no real fd (e.g. captured in tests), the fd can't be snapshotted (e.g. fd exhaustion), or the fd swap itself fails (e.g. fd 2 closed/invalid), there's nothing safe to redirect, so pass through rather than abort the caller.
Source code in src/core/speech.py
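The fd-level redirection described above follows a well-known pattern with os.dup and os.dup2. The sketch below is an illustration of that pattern, not the function in src/core/speech.py. quiet_stderr_fd is a hypothetical name, and which exceptions count as "nothing safe to redirect" is an assumption.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def quiet_stderr_fd():
    """Point the stderr file descriptor at the null device for the block."""
    saved_fd = None
    target_fd = None
    try:
        target_fd = sys.stderr.fileno()  # fails when stderr has no real fd
        sys.stderr.flush()
        saved_fd = os.dup(target_fd)  # snapshot the original destination
        devnull = os.open(os.devnull, os.O_WRONLY)
        try:
            os.dup2(devnull, target_fd)  # C-level writes now go to the null device
        finally:
            os.close(devnull)
    except (AttributeError, OSError, ValueError):
        # Best-effort: undo any partial setup and run the block unredirected.
        if saved_fd is not None:
            os.close(saved_fd)
            saved_fd = None
    try:
        yield
    finally:
        if saved_fd is not None:
            os.dup2(saved_fd, target_fd)  # restore the original stderr
            os.close(saved_fd)
```

Wrapping only the PyAudio initialisation in such a block keeps later, genuine error output visible.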
core.tray¶
core.tray
¶
GNOME panel-indicator bridge and asleep-state lifecycle for the daemon.
The daemon is voice-first and otherwise headless; this module gives it a
top-panel microphone icon (served by the bundled gnome@easyspeak.dev GNOME
Shell extension) and owns everything about it so the audio loop in core.main
stays about audio. It runs without a D-Bus server of its own via two
one-directional channels:
- daemon -> icon: state is pushed for display via a one-shot gdbus call (the same path the mouse-grid plugin uses to drive the extension).
- icon -> daemon: the extension's menu writes a single command to a control file; the controller consumes it. The audio loop already wakes every ~80ms on a read, so a cheap file probe per iteration is enough and no GLib loop is needed.
The indicator is shown only while the assistant is asleep (deactivated); while running it stays hidden because GNOME's own microphone privacy icon already signals the open mic.
TrayAction
¶
Bases: Enum
What Tray.poll is telling the audio loop to do next.
Tray
¶
Owns the EasySpeak panel indicator and the asleep (deactivated) lifecycle.
The audio loop calls poll once per iteration; the
controller pushes display state, consumes menu commands, and — when deactivated —
runs the idle loop itself, using caller-supplied callbacks to release and reacquire
the microphone. It never touches audio directly.
Best-effort throughout: on a non-GNOME desktop (no gdbus/extension) the state
pushes simply fail and the daemon carries on, mirroring how the mouse-grid plugin
tolerates a missing extension.
Set up the controller, optionally with a spoken-feedback callback.
speak (core.speak) voices the asleep/awake lifecycle: the deactivation
confirmation for a button mute, the reactivation and startup greetings,
and the explanation when sleep can't engage. A voice "go to sleep" is
already spoken by the sleep plugin, hence the announce flag on
sleep. Defaults to a no-op so the tray works
headless and in tests.
Source code in src/core/tray.py
started
¶
Daemon is up and listening; ensure the indicator is hidden.
Also drop any command left in the control file from before startup. It's a one-shot channel for live menu clicks, so a stale 'about'/'help'/'quit' written during a previous session (or before the daemon was running to consume it) must not fire the moment we begin polling; otherwise the About window pops open on launch.
Greets the user once startup reaches a listening state, the spoken counterpart of the wake chime — and the same greeting reactivation gives.
Source code in src/core/tray.py
stopped
¶
request_sleep
¶
Queue a deactivate (e.g. the "go to sleep" voice command).
The mic is released at the audio loop's next poll,
after the current command finishes.
poll
¶
Act on any pending menu command or queued sleep request.
release_mic / acquire_mic are zero-arg callbacks that close and reopen the
input stream; the controller calls them around the asleep idle loop so this
module stays out of the audio internals. Returns a
TrayAction for the audio loop to dispatch on.
Source code in src/core/tray.py
sleep
¶
Release the mic and idle until reactivated, then greet and resume.
Pushes the muted state first and refuses to sleep unless it lands:
reactivation arrives via the extension (tray menu or Quick Settings
toggle), so releasing the mic while the indicator is missing (no
GNOME/extension) would strand the daemon with no way back. While asleep
the muted state is re-asserted every MUTED_REPUSH_INTERVAL seconds so
the icon recovers if GNOME Shell or the extension reloads (it would
otherwise come back hidden).
announce says whether to speak the deactivation confirmation: True for
a button mute, False for a voice "go to sleep" (the sleep plugin already
spoke, so speaking again would just repeat it). Reactivation always
greets, matching the startup greeting.
Freeing the stream also clears GNOME's privacy microphone indicator, an OS-level 'not listening' cue alongside our own muted glyph.
Source code in src/core/tray.py
set_state
¶
Push the current display state to the panel indicator (best-effort).
Source code in src/core/tray.py
take_command
¶
Return and clear the latest menu command, or None if none is pending.
Reads the one-shot control file the extension writes, then deletes it so each menu click fires exactly once. A missing file is the common case (every idle iteration), so it's checked cheaply first. The delete is best-effort and kept separate from the read: a command that was read successfully is returned even if the unlink fails (e.g. the file vanished in a race), rather than being silently dropped.
Source code in src/core/tray.py
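The read-then-delete behaviour can be sketched in a few lines with pathlib. This is an illustrative stand-in: the function name and the control_file argument are hypothetical, and treating an unreadable or empty file as "no command" is an assumption.

```python
from pathlib import Path


def take_command_sketch(control_file):
    """Return and clear the pending command, or None if there is none."""
    path = Path(control_file)
    if not path.exists():
        return None  # the common case on every idle iteration
    try:
        command = path.read_text(encoding="utf-8").strip()
    except OSError:
        return None  # assumption: an unreadable file counts as no command
    try:
        path.unlink()  # one-shot: each menu click fires exactly once
    except OSError:
        pass  # best-effort delete; the command that was read is still returned
    return command or None
```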
core.hotkey¶
core.hotkey
¶
Hold-to-dictate keyboard activation via evdev (the silent-activation path).
On Wayland an ordinary process can't grab global keys, and only raw evdev events from
/dev/input expose key release — which "hold the keys to dictate, release to stop"
needs. This listener reads every keyboard device in a background thread and tracks
whether a configured modifier combo (default Ctrl+Shift) is fully held, so the audio
loop can start dictation on press and end it the moment the keys come up.
It is best-effort: if python-evdev isn't installed or /dev/input isn't readable (the
user isn't in the input group), the listener logs how to enable it and stays inert, so
the daemon runs normally without the hotkey.
HotkeyListener
¶
Track whether a key combo is held, off a background evdev reader.
The audio loop calls take_activation
once per iteration to learn when the combo was just pressed, then drives a dictation
session gated on is_held so releasing the
keys ends it. All evdev/threading state is kept behind a lock; the event-processing
logic in _process is pure and the I/O lives in _grab_devices/_drain, so the
behaviour is unit-testable without a real keyboard.
Set up the listener for combo (e.g. "ctrl+shift").
An empty, unparseable, or unknown-key combo, or enabled=False, leaves the
listener disabled so start is a no-op.
Source code in src/core/hotkey.py
is_held
¶
take_activation
¶
Return True once per press of the combo, clearing the edge.
The audio loop polls this each iteration; a True means "the user just pressed the combo — start dictation now".
Source code in src/core/hotkey.py
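The "True once per press" semantics amount to an edge-triggered flag guarded by a lock. The class below is a hypothetical stand-in for that part of the listener; in the real class the reader thread computes the held state from evdev events, while here it is fed in through update.

```python
import threading


class ComboEdge:
    """Tracks held state and reports each press exactly once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._held = False
        self._activated = False

    def update(self, held):
        """Called by the reader whenever the combo state is recomputed."""
        with self._lock:
            if held and not self._held:
                self._activated = True  # rising edge: the combo was just pressed
            self._held = held

    def is_held(self):
        with self._lock:
            return self._held

    def take_activation(self):
        """Return True once per press, clearing the edge."""
        with self._lock:
            activated, self._activated = self._activated, False
            return activated
```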
start
¶
Begin watching the keyboard, unless disabled or no device is readable.
Logs and stays inert (no thread) when the feature is turned off, evdev is
missing, or /dev/input can't be read, so the daemon runs normally either way.
Source code in src/core/hotkey.py
stop
¶
Stop the reader thread, then release the keyboard devices.
Joins the thread first so it leaves its select() loop before the fds close
under it — closing a fd mid-select can raise in the thread. The loop wakes
within one POLL_TIMEOUT, so the join returns promptly.
Source code in src/core/hotkey.py
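The join-before-close ordering can be shown with a pipe standing in for a keyboard device. The Reader class and the 0.1 s POLL_TIMEOUT value are assumptions for illustration; the text does not give the real constant's value.

```python
import os
import select
import threading

POLL_TIMEOUT = 0.1  # seconds; placeholder value


class Reader:
    """Background select() loop over one fd, stopped safely."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()  # stand-in for /dev/input devices
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.events = []

    def _run(self):
        while not self._stop.is_set():
            # The timeout bounds how long stop() waits for the loop to notice.
            ready, _, _ = select.select([self._read_fd], [], [], POLL_TIMEOUT)
            if ready:
                self.events.append(os.read(self._read_fd, 64))

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()  # let the thread leave select() first
        os.close(self._read_fd)  # only then close the fds under it
        os.close(self._write_fd)
```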
parse_combo
¶
Parse a '+'-separated combo string into groups of accepted key names.
Each element becomes a group of evdev key names of which any one satisfies
that part of the combo, so "ctrl+shift" matches either Ctrl together
with either Shift. Names that aren't known aliases fall through as a single
literal evdev key name ("rightctrl" -> "KEY_RIGHTCTRL") so less
common combos still work.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | str | A '+'-separated combo string. | required |
Returns:
| Type | Description |
|---|---|
| tuple[frozenset[str], ...] | A tuple of frozensets, one per combo element; the combo is held when every group has at least one of its keys down. |
Source code in src/core/hotkey.py
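The grouping idea can be sketched as follows. The alias table, both function names and the handling of empty tokens are assumptions; only the "ctrl+shift" and "rightctrl" behaviour is taken from the description above.

```python
# Assumed alias table: each modifier maps to the evdev key names that satisfy it.
ALIASES = {
    "ctrl": frozenset({"KEY_LEFTCTRL", "KEY_RIGHTCTRL"}),
    "shift": frozenset({"KEY_LEFTSHIFT", "KEY_RIGHTSHIFT"}),
}


def parse_combo_sketch(spec):
    """Turn 'ctrl+shift' into a tuple of frozensets, one per combo element."""
    groups = []
    for token in spec.lower().split("+"):
        token = token.strip()
        if not token:
            continue  # assumption: empty elements are ignored
        # Unknown names fall through as one literal evdev key name.
        groups.append(ALIASES.get(token, frozenset({"KEY_" + token.upper()})))
    return tuple(groups)


def combo_held(groups, keys_down):
    """The combo is held when every group has at least one of its keys down."""
    return bool(groups) and all(group & keys_down for group in groups)
```

Representing each element as a set is what lets either Ctrl pair with either Shift without enumerating the four combinations.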
unknown_keys
¶
Return combo tokens that are neither a modifier alias nor a real evdev key.
Validates against evdev.ecodes, so a typo (ctrl+spcae) is caught instead of
silently becoming a dead key. Returns [] when python-evdev can't be imported,
since the listener is inert without it anyway. Only evdev key names are checked;
the portal-based activation in issue #92 would use a different trigger grammar.
Source code in src/core/hotkey.py
core.mediakeys¶
core.mediakeys
¶
Replay multimedia keys through GNOME (Mutter) for native desktop feedback.
This lets the desktop render its own volume OSD and chime rather than imitating them. Pressing a volume key does not run any command: gnome-settings-daemon grabs the raw evdev key and handles the volume change, OSD and chime itself. The only way to reproduce that exactly is to replay the key. We inject it through Mutter's RemoteDesktop interface, which needs no special privileges. A RemoteDesktop session lives only as long as the D-Bus connection that created it, so the whole CreateSession -> Start -> NotifyKeyboardKeycode -> Stop sequence runs on one connection.
tap_key
¶
Press and release one evdev keycode via Mutter RemoteDesktop.
Raises if RemoteDesktop is unavailable (e.g. a non-GNOME session) so the caller can fall back to a silent change.
Source code in src/core/mediakeys.py
core.gnome_extension¶
core.gnome_extension
¶
Install, refresh, and enable the bundled GNOME Shell extension.
Core owns the extension's lifecycle because both the mousegrid plugin and core.tray
drive it over D-Bus; ensure_extension runs
once at startup. The extension ships as package data — extension.js, the
extension-helpers.js it imports, and metadata.json — copied into the user's
extensions dir as a set. On Wayland GNOME only loads extension code at login, so the
startup copy is always one login behind; a oneshot systemd user unit
ordered before the shell re-copies it at the start of every login to close that gap. The
unit runs as the user, writes only to $HOME, and needs no privileges.
RefreshResult
¶
Bases: Enum
Outcome of a single refresh attempt.
extension_source_dir
¶
Return the directory holding the bundled extension assets.
The extension's sources live in the src/gnome@easyspeak.dev/ folder,
shipped in the wheel as the easyspeak.gnome package. Resolving by package
name works in both editable and wheel installs, where that folder sits in
different places relative to this module.
Source code in src/core/gnome_extension.py
extensions_root
¶
extension_dest_dir
¶
refresh_extension_files
¶
Copy the bundled assets into dest_dir when they differ from it.
Best-effort: returns REFRESHED if written, UNCHANGED if already
current, or ERROR (noted on stderr) when a source is missing or a copy
fails. Assets are staged then moved into place (extension.js last), so a
failure leaves any working install untouched and never installs extension.js
without the helper it imports.
Source code in src/core/gnome_extension.py
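A stage-then-move refresh along these lines might look like the sketch below. Result, ASSETS and refresh are hypothetical names (the real enum is RefreshResult, whose member values are not shown here), and staging inside dest_dir is an assumption chosen so that os.replace stays on one filesystem.

```python
import enum
import filecmp
import os
import shutil
import sys
import tempfile
from pathlib import Path


class Result(enum.Enum):
    REFRESHED = "refreshed"
    UNCHANGED = "unchanged"
    ERROR = "error"


# extension.js goes last so it never lands without the helper it imports.
ASSETS = ("metadata.json", "extension-helpers.js", "extension.js")


def refresh(source_dir, dest_dir):
    """Copy the assets into dest_dir when they differ from it (best-effort)."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    try:
        if all((dest_dir / name).is_file()
               and filecmp.cmp(source_dir / name, dest_dir / name, shallow=False)
               for name in ASSETS):
            return Result.UNCHANGED
        dest_dir.mkdir(parents=True, exist_ok=True)
        with tempfile.TemporaryDirectory(dir=dest_dir) as staging:
            for name in ASSETS:  # stage everything first
                shutil.copyfile(source_dir / name, Path(staging) / name)
            for name in ASSETS:  # then move into place, extension.js last
                os.replace(Path(staging) / name, dest_dir / name)
        return Result.REFRESHED
    except OSError as exc:
        print(f"extension refresh failed: {exc}", file=sys.stderr)
        return Result.ERROR
```

Because nothing in dest_dir is replaced until every asset has been staged, a missing source or failed copy leaves the installed files as they were.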
refresh_installed_extension
¶
Refresh the installed extension from the bundled copy.
Run by the systemd user unit at each login.
unit_path
¶
unit_text
¶
Render the systemd user unit, baking in the interpreter and module path.
The unit body lives in a template shipped as package data (easyspeak.data)
so it reads as a service file, not a Python string. A missing template is a
broken install, so let the read error surface rather than masking it.
Two non-obvious choices in the rendered unit: it's ordered before
gnome-session-pre.target and the org.gnome.Shell@* services (missing ones
are ignored) so a changed extension takes effect on this login, not the next
— on Wayland the shell only loads extensions at login. And both ExecStart
arguments are quoted, since systemd splits ExecStart on whitespace.
Source code in src/core/gnome_extension.py
install_refresh_unit
¶
Install and enable the pre-shell refresh unit, idempotently.
Returns a short status string, or None when nothing changed or it isn't applicable (no systemd, a packaged system unit already present, write/enable failure).
Source code in src/core/gnome_extension.py
migrate_legacy_extensions
¶
Remove any pre-rename extension install so it can't double-load.
The extension's UUID changed over time — easyspeak-grid@local (when it was
just a mouse grid), then easyspeak@local, now gnome@easyspeak.dev. A
leftover copy under an old UUID would stay enabled and add a second panel
indicator and grid that the daemon no longer drives, so clean each up:
disable it (so GNOME drops it from the enabled set) then delete its
directory. Best-effort throughout — returns True if any legacy install was
found and removed.
One-off migration shim: once users have had a release or two to upgrade past the old UUIDs this can be removed (call site and tests included).
Source code in src/core/gnome_extension.py
ensure_extension
¶
Startup hook: install the refresh service and activate the extension.
Composes the two independently-runnable activities core.desktop_integration
also exposes. Skipped on non-GNOME desktops.
Source code in src/core/gnome_extension.py
activate_extension
¶
Install, refresh, and enable the bundled extension, reporting what it did.
Installs it if missing, keeps the installed copy current, and enables it. Skipped on non-GNOME desktops; non-fatal on missing sources or write failures.
Source code in src/core/gnome_extension.py