Core API¶
The core package holds the voice-control engine: wake-word detection,
transcription, command routing, speech output, the panel indicator, and the
GNOME Shell extension lifecycle. Plugins receive an EasySpeak
instance and use its small public API (speak, host_run, transcribe,
wait_for_speech, record_until_silence, deactivate, ...) to act on commands.
core.main¶
EasySpeak Core - Voice Control for Linux.
Loads plugins from plugins/ folder automatically. Uses OpenWakeWord for fast wake detection.
EasySpeak
¶
The voice-control daemon: wake detection, transcription, and routing.
Owns the audio pipeline (wake word -> Whisper), the loaded plugins, the
text-to-speech pipeline, and the panel indicator, and exposes the small
plugin-facing API (speak, host_run, transcribe, ...) that plugins use to act on
commands.
Initialise daemon state; models and audio are loaded later in run().
Source code in src/core/main.py
host_run
¶
Run a shell command.
speak
¶
tap_key
¶
Replay a multimedia key so the desktop renders its native feedback.
Returns True if the key was injected, False if unavailable (e.g. a non-GNOME session) so the caller can fall back to a silent change. jeepney is imported lazily, so the dependency is only needed when a key is actually replayed.
Source code in src/core/main.py
deactivate
¶
Request the assistant go to sleep (plugin-facing).
Releases the mic and stops wake detection until reactivated from the tray. The actual release happens at the next main-loop iteration (handled by the tray controller) so the triggering command can finish, and speak, first.
Source code in src/core/main.py
register_push_to_talk
¶
Register the dictation session the hotkey runs while its combo is held.
Plugin-facing: the dictation plugin registers here in its setup() so core
can drive keyboard (silent) activation without importing a plugin
directly. handler takes one should_continue predicate and runs
until it returns False (the keys are released).
Source code in src/core/main.py
load_plugins
¶
Discover and import every plugin module from the plugins/ dir.
Files are loaded in sorted order (numeric prefixes set load order); names
starting with _ are skipped. A module is registered only if it exposes NAME
and handle; its optional setup hook runs once. Import or setup failures are
logged and skipped, never fatal.
Source code in src/core/main.py
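The discovery rules above can be sketched with importlib. This is an illustrative approximation under stated assumptions: load_plugins_sketch and the logger name are hypothetical, and passing the core instance to setup is an assumption rather than a confirmed signature.

```python
import importlib.util
import logging
from pathlib import Path

log = logging.getLogger("easyspeak.sketch")  # hypothetical logger name


def load_plugins_sketch(plugins_dir, core=None):
    """Import every eligible plugin file and return the registered modules."""
    loaded = []
    # Sorted order means numeric prefixes such as 10_foo.py set the load order.
    for path in sorted(Path(plugins_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # names starting with _ are skipped
        try:
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            # Register only modules that expose both NAME and handle.
            if not (hasattr(module, "NAME") and hasattr(module, "handle")):
                continue
            setup = getattr(module, "setup", None)
            if callable(setup):
                setup(core)  # assumption: setup receives the core instance
            loaded.append(module)
        except Exception:
            # Import or setup failures are logged and skipped, never fatal.
            log.exception("Skipping plugin %s", path.name)
    return loaded
```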
get_all_commands
¶
Get all commands from all plugins for help text.
route_command
¶
Route command to appropriate plugin.
Returns False to exit.
Source code in src/core/main.py
flush_stream
¶
Flush any remaining audio data from the stream buffer.
Source code in src/core/main.py
is_silence
¶
record_until_silence
¶
Record mic audio until a short silence, capped at five seconds.
Returns the captured PCM bytes. Plugin-facing. should_continue (used by
push-to-talk) lets a key release cut the recording short instead of waiting out
the silence window.
Source code in src/core/main.py
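One way to picture the behaviour described above is an RMS-threshold loop. The sketch below is an assumption-heavy illustration: read_chunk, threshold, silence_chunks, and clock are hypothetical parameters, and the real method may detect silence differently. Only the five-second cap and the should_continue short-circuit come from the description.

```python
import array
import math
import time


def rms(chunk):
    """Root-mean-square level of 16-bit native-endian PCM bytes."""
    samples = array.array("h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_until_silence_sketch(read_chunk, should_continue=None,
                                threshold=500.0, silence_chunks=5,
                                max_seconds=5.0, clock=time.monotonic):
    """Collect chunks until a run of quiet chunks, a key release, or the cap."""
    frames = []
    quiet = 0
    deadline = clock() + max_seconds
    while clock() < deadline:
        # Push-to-talk: a key release cuts the recording short.
        if should_continue is not None and not should_continue():
            break
        chunk = read_chunk()
        frames.append(chunk)
        # Count consecutive quiet chunks; any loud chunk resets the run.
        quiet = quiet + 1 if rms(chunk) < threshold else 0
        if quiet >= silence_chunks:
            break
    return b"".join(frames)
```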
wait_for_speech
¶
Block until speech is heard, returning its first PCM chunk.
Returns None if nothing is heard within timeout seconds. Plugin-facing.
should_continue (used by push-to-talk) returns None early once it goes False,
so a key release ends the wait.
Source code in src/core/main.py
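The three outcomes described above (speech heard, timeout, early release) can be sketched as a single loop. The names read_chunk, threshold, and clock are hypothetical, and the level test is an assumed stand-in for whatever speech detection the real method uses.

```python
import array
import math
import time


def wait_for_speech_sketch(read_chunk, timeout=5.0, should_continue=None,
                           threshold=500.0, clock=time.monotonic):
    """Return the first loud chunk, or None on timeout or key release."""
    deadline = clock() + timeout
    while clock() < deadline:
        if should_continue is not None and not should_continue():
            return None  # key released: end the wait early
        chunk = read_chunk()
        samples = array.array("h", chunk)  # 16-bit native-endian PCM
        level = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        if level >= threshold:
            return chunk
    return None  # nothing heard within the timeout
```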
transcribe
¶
Transcribe raw PCM audio to text with Whisper.
prompt biases recognition (defaults to the command vocabulary). Plugin-facing.
Source code in src/core/main.py
run
¶
Load models and plugins, then run the wake-word listen loop forever.
Blocks until the user quits (voice command, tray, or Ctrl-C), always releasing the microphone and draining speech on the way out.
Source code in src/core/main.py
core.cli¶
Command-line entry point: parse arguments, set up logging, run the app.
parse_args
¶
Parse EasySpeak's command-line arguments.
Source code in src/core/cli.py
core.config¶
Tuning constants and model factory for EasySpeak.
Override the EASYSPEAK_* environment variables to customise behaviour without editing source. Plugin-specific host-environment setup lives in each plugin's own setup() hook, not here.
load_whisper_model
¶
load_whisper_model(model_name=WHISPER_MODEL, compute_type=WHISPER_COMPUTE_TYPE, cpu_threads=WHISPER_CPU_THREADS)
Build a faster-whisper model from the configured (or given) settings.
Source code in src/core/config.py
core.log¶
Logging setup for EasySpeak's terminal output.
Output is message-only (no level or timestamp prefixes) so the CLI reads nicely for
humans; the level only gates which lines appear. Only the easyspeak and plugins
logger hierarchies are configured, so importing the package as a library leaves the root
logger untouched.
resolve_level
¶
Pick the log level: CLI flags win, then EASYSPEAK_LOG_LEVEL, else INFO.
Source code in src/core/log.py
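The precedence rule above (CLI flags, then EASYSPEAK_LOG_LEVEL, else INFO) might look roughly like this. The verbose and quiet flags are hypothetical, since the actual CLI options are not listed here, and falling back to INFO on an unrecognised value is an assumption.

```python
import logging
import os


def resolve_level_sketch(verbose=False, quiet=False, environ=None):
    """Pick a log level: flags first, then the environment, else INFO."""
    environ = os.environ if environ is None else environ
    # Hypothetical flags: the real CLI options may be named differently.
    if verbose:
        return logging.DEBUG
    if quiet:
        return logging.WARNING
    name = environ.get("EASYSPEAK_LOG_LEVEL", "").strip().upper()
    # getLevelName maps a known name to its int and anything else to a string.
    level = logging.getLevelName(name) if name else None
    return level if isinstance(level, int) else logging.INFO
```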
configure
¶
Attach a message-only stderr handler to the package loggers.
Source code in src/core/log.py
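A message-only handler scoped to the two package logger hierarchies could be set up as below. This is a sketch, not the module's code: clearing existing handlers and disabling propagation are assumptions made so the root logger stays untouched, and the stream parameter exists only to make the sketch testable.

```python
import logging
import sys


def configure_sketch(level=logging.INFO, stream=None):
    """Attach one message-only handler to the easyspeak and plugins loggers."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(message)s"))  # no level or timestamp
    for name in ("easyspeak", "plugins"):
        logger = logging.getLogger(name)
        logger.handlers.clear()   # assumption: keep repeat calls idempotent
        logger.addHandler(handler)
        logger.setLevel(level)    # the level only gates which lines appear
        logger.propagate = False  # assumption: never reach the root logger
    return handler
```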
core.speech¶
Text-to-speech playback pipeline and audio-device noise suppression.
Extracted from core.main so the orchestration loop isn't tangled up with the subprocess
plumbing that keeps a piper voice model warm. SpeechPipeline owns a persistent piper
-> player pair; suppressed_c_stderr hides the unrelated ALSA/JACK probe spew
PortAudio emits when the input side (PyAudio) starts up.
SpeechPipeline
¶
A persistent piper -> audio player pipeline for low-latency speech.
Keeping piper alive avoids reloading its voice model (~2s) on every phrase, and streaming raw PCM straight to the player means playback starts as soon as synthesis does instead of after a temp WAV is fully written.
Create an idle pipeline; the subprocesses are spawned on first use.
Source code in src/core/speech.py
ensure
¶
Spawn the persistent piper -> player pipeline if not already running.
Raises OSError if the pipeline can't be brought up (missing binary, bad model), so callers can warn once rather than silently failing per phrase.
Source code in src/core/speech.py
speak
¶
Text-to-speech output.
Non-blocking: the phrase is handed to a warm piper process that streams audio to the player, so this returns immediately and the speech plays concurrently with whatever the caller does next.
Source code in src/core/speech.py
drain
¶
Flush pending speech, wait for playback, then stop the pipeline.
Used on shutdown so a final phrase (e.g. "Goodbye.") is heard in full before the process exits.
Source code in src/core/speech.py
suppressed_c_stderr
¶
Silence the device-probe spew PortAudio dumps when PyAudio starts up.
Initializing PyAudio makes PortAudio enumerate every ALSA PCM named in the system alsa.conf (surround*, hdmi, iec958, ...) and ping JACK. Devices the machine doesn't have each print a harmless error, and libasound/libjack write them straight to file descriptor 2 from C — Python's sys.stderr never sees them, so only redirecting the fd itself can hide them.
Suppression is best-effort: if stderr has no real fd (e.g. captured in tests), the fd can't be snapshotted (e.g. fd exhaustion), or the fd swap itself fails (e.g. fd 2 closed/invalid), there's nothing safe to redirect, so pass through rather than abort the caller.
Source code in src/core/speech.py
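The fd-level redirect described above can be sketched as a context manager built on os.dup and os.dup2. It is an approximation of the technique, not the module's implementation; the exception types caught in each best-effort branch are assumptions.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppressed_c_stderr_sketch():
    """Point stderr's file descriptor at /dev/null for the duration, best-effort."""
    try:
        fd = sys.stderr.fileno()
        saved = os.dup(fd)  # snapshot so the original target can be restored
    except (AttributeError, OSError, ValueError):
        yield  # no real fd or no spare fd: pass through
        return
    try:
        sys.stderr.flush()
        with open(os.devnull, "wb") as devnull:
            os.dup2(devnull.fileno(), fd)  # C libraries writing to the fd go quiet
    except (OSError, ValueError):
        os.close(saved)
        yield  # the swap failed: pass through
        return
    try:
        yield
    finally:
        os.dup2(saved, fd)  # restore the original stderr target
        os.close(saved)
```

Wrapping only the PyAudio initialisation in such a block keeps later, legitimate stderr output visible.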
core.tray¶
GNOME panel-indicator bridge and asleep-state lifecycle for the daemon.
The daemon is voice-first and otherwise headless; this module gives it a
top-panel microphone icon (served by the bundled easyspeak@local GNOME
Shell extension) and owns everything about it so the audio loop in core.main
stays about audio. It runs without a D-Bus server of its own via two
one-directional channels:
- daemon -> icon: state is pushed for display via a one-shot gdbus call (the same path the mouse-grid plugin uses to drive the extension).
- icon -> daemon: the extension's menu writes a single command to a control file; the controller consumes it. The audio loop already wakes every ~80ms on a read, so a cheap file probe per iteration is enough, with no GLib loop needed.
The indicator is shown only while the assistant is asleep (deactivated); while running it stays hidden because GNOME's own microphone privacy icon already signals the open mic.
TrayAction
¶
Bases: Enum
What Tray.poll is telling the audio loop to do next.
Tray
¶
Owns the EasySpeak panel indicator and the asleep (deactivated) lifecycle.
The audio loop calls poll once per iteration; the
controller pushes display state, consumes menu commands, and — when deactivated —
runs the idle loop itself, using caller-supplied callbacks to release and reacquire
the microphone. It never touches audio directly.
Best-effort throughout: on a non-GNOME desktop (no gdbus/extension) the state
pushes simply fail and the daemon carries on, mirroring how the mouse-grid plugin
tolerates a missing extension.
Set up the controller, optionally with a spoken-feedback callback.
speak is the spoken-feedback callback (core.speak). The plugin only announces the attempt ("Going to sleep."); the tray confirms or, when it can't actually sleep, explains, since only it knows whether sleep engaged. It defaults to a no-op so the tray works headless and in tests.
Source code in src/core/tray.py
started
¶
Daemon is up and listening; ensure the indicator is hidden.
Also drop any command left in the control file from before startup. It's a one-shot channel for live menu clicks, so a stale 'about'/'help'/'quit' written during a previous session (or before the daemon was running to consume it) must not fire the moment we begin polling; otherwise the About window would pop open on launch.
Source code in src/core/tray.py
stopped
¶
request_sleep
¶
Queue a deactivate (e.g. the "go to sleep" voice command).
The mic is released at the audio loop's next poll,
after the current command finishes.
poll
¶
Act on any pending menu command or queued sleep request.
release_mic / acquire_mic are zero-arg callbacks that close and reopen the
input stream; the controller calls them around the asleep idle loop so this
module stays out of the audio internals. Returns a
TrayAction for the audio loop to dispatch on.
Source code in src/core/tray.py
set_state
¶
Push the current display state to the panel indicator (best-effort).
Source code in src/core/tray.py
take_command
¶
Return and clear the latest menu command, or None if none is pending.
Reads the one-shot control file the extension writes, then deletes it so each menu click fires exactly once. A missing file is the common case (every idle iteration), so it's checked cheaply first. The delete is best-effort and kept separate from the read: a command that was read successfully is returned even if the unlink fails (e.g. the file vanished in a race), rather than being silently dropped.
Source code in src/core/tray.py
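The read-then-delete behaviour above can be sketched in a few lines. The function name and control_path parameter are hypothetical, and so are stripping whitespace and treating an empty file as "no command".

```python
import os


def take_command_sketch(control_path):
    """Return and clear the pending menu command, or None if there is none."""
    # A missing file is the common case, so probe cheaply before opening.
    if not os.path.exists(control_path):
        return None
    try:
        with open(control_path, encoding="utf-8") as fh:
            command = fh.read().strip()
    except OSError:
        return None
    # The delete is separate and best-effort: a command that was read is still
    # returned even if the unlink fails (for example, the file vanished in a race).
    try:
        os.unlink(control_path)
    except OSError:
        pass
    return command or None
```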
core.hotkey¶
Hold-to-dictate keyboard activation via evdev (the silent-activation path).
On Wayland an ordinary process can't grab global keys, and only raw evdev events from
/dev/input expose key release — which "hold the keys to dictate, release to stop"
needs. This listener reads every keyboard device in a background thread and tracks
whether a configured modifier combo (default Ctrl+Shift) is fully held, so the audio
loop can start dictation on press and end it the moment the keys come up.
It is best-effort: if python-evdev isn't installed or /dev/input isn't readable (the
user isn't in the input group), the listener logs how to enable it and stays inert, so
the daemon runs normally without the hotkey.
HotkeyListener
¶
Track whether a key combo is held, off a background evdev reader.
The audio loop calls take_activation
once per iteration to learn when the combo was just pressed, then drives a dictation
session gated on is_held so releasing the
keys ends it. All evdev/threading state is kept behind a lock; the event-processing
logic in _process is pure and the I/O lives in _grab_devices/_drain, so the
behaviour is unit-testable without a real keyboard.
Set up the listener for combo (e.g. "ctrl+shift").
An unparseable or empty combo, or enabled=False, leaves the listener disabled
so start is a no-op.
Source code in src/core/hotkey.py
is_held
¶
take_activation
¶
Return True once per press of the combo, clearing the edge.
The audio loop polls this each iteration; a True means "the user just pressed the combo — start dictation now".
Source code in src/core/hotkey.py
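The "once per press" edge can be modelled as a flag that is set on the not-held to held transition and cleared when read. ComboStateSketch below is a hypothetical, I/O-free model of that idea; the real listener's internals (and whether is_held is a method or a property) may differ.

```python
import threading


class ComboStateSketch:
    """Track held state plus a one-shot activation edge, behind a lock."""

    def __init__(self, groups):
        self._groups = tuple(groups)  # one frozenset of key names per combo element
        self._down = set()
        self._held = False
        self._activated = False
        self._lock = threading.Lock()

    def process(self, key, pressed):
        """Feed one key event (normally from the background reader thread)."""
        with self._lock:
            if pressed:
                self._down.add(key)
            else:
                self._down.discard(key)
            held = all(group & self._down for group in self._groups)
            if held and not self._held:
                self._activated = True  # rising edge: the combo was just completed
            self._held = held

    def is_held(self):
        with self._lock:
            return self._held

    def take_activation(self):
        """Return True once per press, clearing the edge."""
        with self._lock:
            activated, self._activated = self._activated, False
            return activated
```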
start
¶
Begin watching the keyboard, unless disabled or no device is readable.
Logs and stays inert (no thread) when the feature is turned off, evdev is
missing, or /dev/input can't be read, so the daemon runs normally either way.
Source code in src/core/hotkey.py
stop
¶
Stop the reader thread, then release the keyboard devices.
Joins the thread first so it leaves its select() loop before the fds close
under it — closing a fd mid-select can raise in the thread. The loop wakes
within one POLL_TIMEOUT, so the join returns promptly.
Source code in src/core/hotkey.py
parse_combo
¶
Parse a '+'-separated combo string into groups of accepted key names.
Each element becomes a group of evdev key names of which any one satisfies
that part of the combo, so "ctrl+shift" matches either Ctrl together
with either Shift. Names that aren't known aliases fall through as a single
literal evdev key name ("rightctrl" -> "KEY_RIGHTCTRL") so less
common combos still work.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | str | A '+'-separated combo string. | required |
Returns:
| Type | Description |
|---|---|
| tuple[frozenset[str], ...] | A tuple of frozensets, one per combo element; the combo is held when every group has at least one of its keys down. |
Source code in src/core/hotkey.py
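The grouping rule can be illustrated with a small alias table. The ALIASES contents and the combo_held helper are assumptions for illustration (the real alias list is not shown here); the fall-through to a literal KEY_ name follows the "rightctrl" example above.

```python
# Hypothetical alias table: each alias accepts either physical key.
ALIASES = {
    "ctrl": frozenset({"KEY_LEFTCTRL", "KEY_RIGHTCTRL"}),
    "shift": frozenset({"KEY_LEFTSHIFT", "KEY_RIGHTSHIFT"}),
    "alt": frozenset({"KEY_LEFTALT", "KEY_RIGHTALT"}),
}


def parse_combo_sketch(spec):
    """Turn 'ctrl+shift' into a tuple of frozensets of evdev key names."""
    groups = []
    for part in spec.lower().split("+"):
        part = part.strip()
        if not part:
            continue  # ignore empty elements such as a trailing '+'
        # Unknown names fall through as one literal evdev key name.
        groups.append(ALIASES.get(part, frozenset({"KEY_" + part.upper()})))
    return tuple(groups)


def combo_held(groups, keys_down):
    """True when every group has at least one of its keys down."""
    return bool(groups) and all(group & keys_down for group in groups)
```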
core.mediakeys¶
Replay multimedia keys through GNOME (Mutter) for native desktop feedback.
This lets the desktop render its own volume OSD and chime rather than imitating them. Pressing a volume key does not run any command: gnome-settings-daemon grabs the raw evdev key and handles the volume change, OSD and chime itself. The only way to reproduce that exactly is to replay the key. We inject it through Mutter's RemoteDesktop interface, which needs no special privileges. A RemoteDesktop session lives only as long as the D-Bus connection that created it, so the whole CreateSession -> Start -> NotifyKeyboardKeycode -> Stop sequence runs on one connection.
tap_key
¶
Press and release one evdev keycode via Mutter RemoteDesktop.
Raises if RemoteDesktop is unavailable (e.g. a non-GNOME session) so the caller can fall back to a silent change.
Source code in src/core/mediakeys.py
core.gnome_extension¶
Install, refresh, and enable the bundled GNOME Shell extension.
Core owns the extension's lifecycle because both the mousegrid plugin and core.tray
drive it over D-Bus; ensure_extension runs
once at startup. The extension ships as package data — extension.js, the
extension-helpers.js it imports, and metadata.json — copied into the user's
extensions dir as a set. On Wayland GNOME only loads extension code at login, so the
startup copy is always one login behind; a oneshot systemd user unit
ordered before the shell re-copies it at the start of every login to close that gap. The
unit runs as the user, writes only to $HOME, and needs no privileges.
RefreshResult
¶
Bases: Enum
Outcome of a single refresh attempt.
extension_source_dir
¶
Return the package dir holding the bundled assets.
Resolved relative to this module (src/) so it works in both editable and wheel
installs.
extensions_root
¶
extension_dest_dir
¶
refresh_extension_files
¶
Copy the bundled assets into dest_dir when they differ from it.
Best-effort: returns REFRESHED if written, UNCHANGED if already
current, or ERROR (noted on stderr) when a source is missing or a copy
fails. Assets are staged then moved into place (extension.js last), so a
failure leaves any working install untouched and never installs extension.js
without the helper it imports.
Source code in src/core/gnome_extension.py
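The stage-then-move strategy can be sketched with the standard library. This is an illustration under assumptions: the enum values, the function name, and staging inside dest_dir (so each move is a same-filesystem rename) are choices made for the sketch, not confirmed details of the real function.

```python
import enum
import filecmp
import os
import shutil
import sys
import tempfile
from pathlib import Path

# extension.js is listed last so it is never installed without its helper.
ASSETS = ("metadata.json", "extension-helpers.js", "extension.js")


class RefreshResultSketch(enum.Enum):
    REFRESHED = "refreshed"  # values are hypothetical
    UNCHANGED = "unchanged"
    ERROR = "error"


def refresh_sketch(source_dir, dest_dir):
    """Copy the assets into dest_dir when they differ from it, best-effort."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    try:
        if any(not (source_dir / name).is_file() for name in ASSETS):
            raise FileNotFoundError("bundled asset missing")
        if all((dest_dir / name).is_file()
               and filecmp.cmp(source_dir / name, dest_dir / name, shallow=False)
               for name in ASSETS):
            return RefreshResultSketch.UNCHANGED
        dest_dir.mkdir(parents=True, exist_ok=True)
        # Stage every file first; a failed copy leaves the install untouched.
        with tempfile.TemporaryDirectory(dir=dest_dir) as stage:
            for name in ASSETS:
                shutil.copyfile(source_dir / name, Path(stage) / name)
            for name in ASSETS:
                os.replace(Path(stage) / name, dest_dir / name)
        return RefreshResultSketch.REFRESHED
    except OSError as exc:
        print(f"extension refresh failed: {exc}", file=sys.stderr)
        return RefreshResultSketch.ERROR
```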
refresh_installed_extension
¶
Refresh the installed extension from the bundled copy.
Run by the systemd user unit at each login.
install_refresh_unit
¶
Install and enable the pre-shell refresh unit, idempotently.
Returns a short status string, or None when nothing changed or it isn't applicable (no systemd, write/enable failure).
Source code in src/core/gnome_extension.py
migrate_legacy_extension
¶
Remove the pre-rename extension install so it can't double-load.
The extension's UUID changed from easyspeak-grid@local to
easyspeak@local (it long outgrew being just a mouse grid). A leftover
copy under the old UUID would stay enabled and add a second panel indicator
and grid that the daemon no longer drives, so clean it up: disable it (so
GNOME drops it from the enabled set) then delete its directory. Best-effort
throughout — returns True if a legacy install was found and removed.
One-off migration shim: once users have had a release or two to upgrade past
easyspeak-grid@local this can be removed (call site and tests included).
Source code in src/core/gnome_extension.py
ensure_extension
¶
Install, refresh, and enable the bundled extension, reporting what it did.
Installs it if missing, keeps the installed copy current, and enables it. Skipped on non-GNOME desktops; non-fatal on missing sources or write failures.
Source code in src/core/gnome_extension.py