diff --git a/.env.example b/.env.example index 4bef9cc..88594d0 100644 --- a/.env.example +++ b/.env.example @@ -2,6 +2,7 @@ PERPLEXITY_API_KEY=your_perplexity_api_key_here PERPLEXITY_MODEL=llama-3.1-sonar-small-128k-chat DEEPGRAM_API_KEY=your_deepgram_api_key_here PORCUPINE_ACCESS_KEY=your_porcupine_access_key_here +PORCUPINE_SENSITIVITY=0.8 TTS_EN_SPEAKER=en_0 WEATHER_LAT=63.56 WEATHER_LON=53.69 diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..b81a8fe --- /dev/null +++ b/Makefile @@ -0,0 +1,10 @@ +.PHONY: run check qwen-context + +run: + python run.py + +check: + ./scripts/qwen-check.sh + +qwen-context: + ./scripts/qwen-context.sh diff --git a/QWEN.md b/QWEN.md new file mode 100644 index 0000000..cd1f0f8 --- /dev/null +++ b/QWEN.md @@ -0,0 +1,32 @@ +# Qwen Context: alexander_smart-speaker + +## Goal +Voice assistant for Linux with wake word, STT/TTS, AI dialogue, weather, timer/alarm/stopwatch and volume control. + +## Architecture +- Entry: `run.py` -> `app/main.py` +- Audio layer: `app/audio/` (`wakeword.py`, `stt.py`, `tts.py`, `sound_level.py`) +- Core logic: `app/core/` (`commands.py`, `ai.py`, `config.py`, `cleaner.py`) +- Features: `app/features/` (weather, timer, stopwatch, alarm, music, cities game) +- State: `data/*.json` + +## High-Value Files +- `app/core/commands.py` for intent routing +- `app/main.py` for event loop and orchestration +- `app/core/config.py` for env configuration + +## How To Work In This Repo +1. Keep edits minimal and local. +2. Prefer fixes with clear fallback behavior (microphone/API failures). +3. Do not hardcode secrets; use `.env` and `.env.example`. +4. Update README when behavior/commands change. + +## Quick Checks +```bash +./scripts/qwen-check.sh +``` + +## Notes For Agent +- If touching audio code, keep Linux compatibility first. +- For command parsing, add/adjust tests when test infra exists. +- Preserve Russian command phrases compatibility. 
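The `.env.example` change above introduces `PORCUPINE_SENSITIVITY`, and QWEN.md asks for fixes with clear fallback behavior. A minimal sketch of how such a value could be read defensively, clamping it to the 0..1 range that the setting is documented to use. The helper name `read_sensitivity` and the clamping policy are assumptions for illustration, not code from this PR.

```python
import math
import os


def read_sensitivity(name: str = "PORCUPINE_SENSITIVITY", default: float = 0.8) -> float:
    """Read a 0..1 float from the environment, falling back to `default`.

    Hypothetical helper: the name and the clamping policy are assumptions.
    """
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        # Variable is unset or blank: use the documented default.
        return default
    try:
        value = float(raw)
    except ValueError:
        # Malformed input such as "high": keep the default instead of crashing.
        return default
    if math.isnan(value):
        return default
    # Clamp out-of-range values into [0, 1].
    return min(1.0, max(0.0, value))
```

With this shape a typo in `.env` degrades to the default sensitivity rather than raising at import time, which is one way to honor the fallback guideline.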
diff --git a/README.md b/README.md index 3bc8899..4ebee0a 100644 --- a/README.md +++ b/README.md @@ -1,146 +1,122 @@ -# 🎙️ Alexander Smart Speaker +# Alexander Smart Speaker -
+Voice assistant for Linux with a wake word, STT/TTS, and a set of voice skills. -![Python](https://img.shields.io/badge/Python-3.9%2B-3776AB?logo=python&logoColor=white&style=for-the-badge) -![Platform](https://img.shields.io/badge/Platform-Linux-FCC624?logo=linux&logoColor=black&style=for-the-badge) -![License](https://img.shields.io/badge/License-MIT-45a163?style=for-the-badge) +## What it can do +- Activation by the wake word `Alexandr`. +- Dialogue with AI (Perplexity) with context retention. +- RU ↔ EN translation with spoken output. +- Weather for the default location and by city name. +- Alarms, timers, and stopwatches. +- System volume control. +- Spotify control (play/pause/next/current track). +- Cities game. -**Alexander** is a personal voice assistant for Linux that leverages modern AI technologies to create natural conversations. It listens, understands context, translates languages, checks the weather, and manages your time. +## Technologies +- Wake word: `pvporcupine` +- STT: `deepgram-sdk` +- TTS: `Silero` (`torch`, `torchaudio`) +- AI: Perplexity API +- Weather: Open-Meteo +- Music: Spotify Web API (`spotipy`) -[Features](#-features) • [Installation](#-installation) • [Usage](#-usage) • [Architecture](#-architecture) +## Requirements +- Linux +- Python 3.9+ +- System packages: + +```bash +sudo apt-get update +sudo apt-get install -y portaudio19-dev libasound2-dev mpg123 +``` + +Volume control requires `pactl` or `amixer` (usually from `pulseaudio-utils`/`alsa-utils`). + +## Installation -
- ---- - -## ✨ Features - -### 🧠 Artificial Intelligence -* **Smart Dialogue**: Context-aware conversations powered by **Perplexity AI** (Llama 3.1). -* **Translator**: Instant bidirectional translation (RU ↔ EN) with native pronunciation. - -### 🗣️ Voice Interface -* **Wake Word**: Activates on the phrase **"Alexander"** (powered by Porcupine). -* **Speech Recognition**: Fast and accurate Speech-to-Text via **Deepgram**. -* **Text-to-Speech**: Natural sounding offline voice synthesis using **Silero TTS**. - -### 🛠️ Tools -* **⛅ Weather**: Detailed forecasts (current, daily range, hourly) via Open-Meteo. -* **⏰ Alarm & Timer**: Voice-controlled alarms and timers. -* **🔊 System Control**: Adjust system volume via voice commands. - ---- - -## ⚙️ Installation - -### 1. Prerequisites -* **OS**: Linux -* **Python**: 3.9+ -* **System Libraries**: - ```bash - sudo apt-get install portaudio19-dev libasound2-dev mpg123 - ``` - -### 2. Setup ```bash -# Clone the repository git clone https://github.com/your-username/alexander_smart-speaker.git cd alexander_smart-speaker -# Create virtual environment python -m venv venv source venv/bin/activate - -# Install dependencies pip install -r requirements.txt ``` -### 3. Configuration -Create a `.env` file based on the example: +## Configuring `.env` + ```bash cp .env.example .env ``` -Fill in your API keys in `.env`: +Minimum required variables: + ```ini -# AI & Speech APIs -PERPLEXITY_API_KEY=pplx-... +PERPLEXITY_API_KEY=... DEEPGRAM_API_KEY=... PORCUPINE_ACCESS_KEY=...
- -# TTS Settings -TTS_EN_SPEAKER=en_0 -TTS_RU_SPEAKER=eugene - -# Weather Location (Your City Coordinates) -WEATHER_LAT=63.56 -WEATHER_LON=53.69 -WEATHER_CITY=Ukhta ``` ---- +Full example (as in `.env.example`): -## 🚀 Usage +```ini +PERPLEXITY_API_KEY=your_perplexity_api_key_here +PERPLEXITY_MODEL=llama-3.1-sonar-small-128k-chat +DEEPGRAM_API_KEY=your_deepgram_api_key_here +PORCUPINE_ACCESS_KEY=your_porcupine_access_key_here +PORCUPINE_SENSITIVITY=0.8 +TTS_EN_SPEAKER=en_0 +WEATHER_LAT=63.56 +WEATHER_LON=53.69 +WEATHER_CITY=Ухта +SPOTIFY_CLIENT_ID=your_spotify_client_id +SPOTIFY_CLIENT_SECRET=your_spotify_client_secret +SPOTIFY_REDIRECT_URI=http://localhost:8888/callback +``` + +## Running -Start the assistant: ```bash +make run +# or python run.py ``` -### Command Examples +## Voice command examples +- Activation: `Alexandr` +- Dialogue: `Почему небо голубое?` +- Weather: `Какая сейчас погода?`, `Погода в Москве` +- Translation: `Переведи на английский: как дела` +- Timer: `Поставь таймер на 5 минут` +- Alarm: `Поставь будильник на 7:30`, `Будильник по будням в 8:00` +- Stopwatch: `Запусти секундомер`, `Покажи активные секундомеры` +- Volume: `Громкость 5` +- Spotify: `Включи музыку`, `Пауза`, `Что сейчас играет` +- Game: `Давай сыграем в города` +- Stop/interrupt: `Стоп`, `Хватит`, `Повтори` -| Category | User Command (RU) | Action | -|----------|-------------------|--------| -| **Activation** | "Alexander" | Assistant starts listening | -| **Dialogue** | "Почему небо голубое?" | Ask AI with context retention | -| **Weather** | "Какая сейчас погода?", "Нужен ли зонт?"
| Get weather forecast | -| **Translation** | "Переведи на английский: привет, как дела?" | Translate and speak in EN | -| **Alarm** | "Разбуди меня в 7:30", "Поставь таймер на 5 минут" | Set alarm or timer | -| **Volume** | "Громкость 5", "Громкость 8" | Set system volume level | -| **Control** | "Стоп", "Хватит", "Повтори" | Stop speech or repeat last phrase | +## Useful commands ---- - -## 🏗️ Architecture - -```mermaid -graph TD - Mic[🎤 Microphone] --> Wake[Wake Word
Porcupine] - Wake -->|Activated| STT[STT
Deepgram] - - STT --> Router{Command Router} - - Router -->|Forecast| Weather[⛅ Weather
Open-Meteo] - Router -->|Time| Alarm[⏰ Alarm/Timer] - Router -->|Settings| Vol[🔊 Volume] - Router -->|Translate| Translator[A↔B Translator] - Router -->|Query| AI[🧠 Perplexity AI] - - Weather --> TTS - Alarm --> TTS - Vol --> TTS - Translator --> TTS - AI --> Cleaner[Text Cleaner] - Cleaner --> TTS[🗣️ TTS
Silero] - - TTS --> Speaker[🔊 Speaker] +```bash +make run # start the assistant +make check # basic project check +make qwen-context # collect project context ``` -## 📂 Project Structure -* `app/main.py` — Entry point, main event loop. -* `app/audio/` — Audio processing modules (STT, TTS, Wake Word). -* `app/core/` — AI logic, configuration, text cleaning. -* `app/features/` — Skills (Weather, Alarm, Timer). -* `assets/` — Models (Porcupine) and sound effects. -* `data/` — Persistent state (alarms). +## Project structure +- `run.py` - entry point. +- `app/main.py` - main assistant loop. +- `app/audio/` - wake word, STT, TTS, volume. +- `app/core/` - config, AI, command routing, utilities. +- `app/features/` - weather, alarm, timer, stopwatch, music, cities game. +- `assets/` - models and sounds. +- `data/` - saved alarms/timers/stopwatches. ---- +## Troubleshooting +- STT/AI errors: check the keys in `.env`. +- No sound: check the system output device and the `pactl`/`amixer` utilities. +- Alarm/timer does not play: make sure `mpg123` is installed. +- Spotify cannot be controlled: check `SPOTIFY_*`, authorization, and that an active device is available. -## 🛠️ Troubleshooting -* **Deepgram Error 400**: Check your API key balance and validity in `.env`. -* **No Sound**: Ensure `amixer` is installed and the default audio output is correctly configured in your OS. -* **Alarm not playing**: Verify that `mpg123` is installed (`sudo apt install mpg123`). - -## 📄 License -MIT License. See `LICENSE.txt` for details. +## License +MIT, see `LICENSE.txt`.
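The troubleshooting notes in the README diff above name `mpg123` and `pactl`/`amixer` as the system tools to check. A minimal sketch of a preflight check for them, assuming `shutil.which` is an acceptable way to detect the binaries; the function name `missing_tools` is hypothetical and not part of this PR.

```python
import shutil
from typing import Callable, List, Optional


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the names of required system tools that are not on PATH.

    Hypothetical helper; `which` is injectable so the check can be tested
    without touching the real system.
    """
    missing = []
    # mpg123 is what the README says plays the alarm/timer sound.
    if which("mpg123") is None:
        missing.append("mpg123")
    # Volume control needs at least one of these two utilities.
    if which("pactl") is None and which("amixer") is None:
        missing.append("pactl or amixer")
    return missing


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"Missing system tool: {tool}")
```

Running such a check at startup would turn a silent alarm or a failed volume command into an explicit message.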
diff --git a/app/audio/sound_level.py b/app/audio/sound_level.py index 9939d49..096d385 100644 --- a/app/audio/sound_level.py +++ b/app/audio/sound_level.py @@ -9,6 +9,7 @@ Regulates system volume on a scale from 1 to 10. import subprocess import re import platform +from ..core.roman import replace_roman_numerals # Map for converting words to digits ("пять" -> 5) NUMBER_MAP = { @@ -148,7 +149,7 @@ def parse_volume_text(text: str) -> int | None: Tries to find the volume number in the text. Understands both digits ("5") and words ("пять"). """ - text = text.lower() + text = replace_roman_numerals(text.lower()) # 1. Look for digits (1-10) num_match = re.search(r"\b(10|[1-9])\b", text) diff --git a/app/audio/stt.py b/app/audio/stt.py index 7523f42..f245282 100644 --- a/app/audio/stt.py +++ b/app/audio/stt.py @@ -8,6 +8,7 @@ Supports Russian (default) and English. # Uses the Deepgram API over WebSockets for real-time streaming recognition. import asyncio +import re import time import pyaudio import logging @@ -24,16 +25,19 @@ import websockets.sync.client from ..core.audio_manager import get_audio_manager # --- Patch (fix) for the websockets library --- -# By default the Deepgram SDK uses a connection timeout that is too short. -# This often causes errors during a slow SSL handshake. -# We replace the connect function to raise the timeout to 30 seconds. +# Set connection timeouts explicitly so we do not hang on a long handshake.
_original_connect = websockets.sync.client.connect +DEEPGRAM_CONNECT_TIMEOUT_SECONDS = 3.0 +DEEPGRAM_CONNECT_WAIT_SECONDS = 1.5 +DEEPGRAM_CONNECT_POLL_SECONDS = 0.001 + def _patched_connect(*args, **kwargs): - kwargs.setdefault("open_timeout", 30) - kwargs.setdefault("ping_timeout", 30) - kwargs.setdefault("close_timeout", 30) + # Force short timeouts even if the SDK passed its own (for example, 30 s). + kwargs["open_timeout"] = DEEPGRAM_CONNECT_TIMEOUT_SECONDS + kwargs["ping_timeout"] = DEEPGRAM_CONNECT_TIMEOUT_SECONDS + kwargs["close_timeout"] = DEEPGRAM_CONNECT_TIMEOUT_SECONDS print(f"DEBUG: Connecting to Deepgram with timeout={kwargs.get('open_timeout')}s") return _original_connect(*args, **kwargs) @@ -44,6 +48,34 @@ sdk_ws.connect = _patched_connect # Silence the extra noise in the logs logging.getLogger("deepgram").setLevel(logging.WARNING) +# Base thresholds for stopping STT +INITIAL_SILENCE_TIMEOUT_SECONDS = 5.0 +POST_SPEECH_SILENCE_TIMEOUT_SECONDS = 3.0 +# Long safety limit so an ordinary long phrase is not cut off. +# Actual completion happens after 3 s of silence following speech. +MAX_ACTIVE_SPEECH_SECONDS = 300.0 + +_FAST_STOP_UTTERANCE_RE = re.compile( + r"^(?:(?:александр|алесандр|alexander|alexandr)\s+)?"
+ r"(?:стоп|хватит|перестань|прекрати|замолчи|тихо|пауза)" + r"(?:\s+(?:пожалуйста|please))?$", + flags=re.IGNORECASE, +) + + +def _normalize_command_text(text: str) -> str: + normalized = text.lower().replace("ё", "е") + normalized = re.sub(r"[^\w\s]+", " ", normalized, flags=re.UNICODE) + normalized = re.sub(r"\s+", " ", normalized, flags=re.UNICODE).strip() + return normalized + + +def _is_fast_stop_utterance(text: str) -> bool: + normalized = _normalize_command_text(text) + if not normalized: + return False + return _FAST_STOP_UTTERANCE_RE.fullmatch(normalized) is not None + class SpeechRecognizer: """Speech recognition class backed by Deepgram.""" @@ -105,24 +137,42 @@ class SpeechRecognizer: ) return self.stream - async def _process_audio(self, dg_connection, timeout_seconds, detection_timeout): + async def _process_audio( + self, dg_connection, timeout_seconds, detection_timeout, fast_stop + ): """ Asynchronous function for sending audio and receiving text. Args: dg_connection: Active connection to Deepgram. - timeout_seconds: Total listening time. + timeout_seconds: Emergency limit on the duration of active speech. detection_timeout: Time to wait for speech to start. + fast_stop: If True, a short stop phrase ends STT after 1 s of silence.
""" self.transcript = "" transcript_parts = [] loop = asyncio.get_running_loop() stream = self._get_stream() + effective_detection_timeout = ( + detection_timeout + if detection_timeout is not None + else INITIAL_SILENCE_TIMEOUT_SECONDS + ) # Events for synchronization stop_event = asyncio.Event() # Time to stop speech_started_event = asyncio.Event() # Speech detected (VAD) + last_speech_activity = time.monotonic() + first_speech_activity_at = None + + def mark_speech_activity(): + nonlocal last_speech_activity, first_speech_activity_at + now = time.monotonic() + last_speech_activity = now + if first_speech_activity_at is None: + first_speech_activity_at = now + speech_started_event.set() # --- Deepgram event handlers --- def on_transcript(unused_self, result, **kwargs): @@ -130,6 +180,20 @@ class SpeechRecognizer: sentence = result.channel.alternatives[0].transcript if len(sentence) == 0: return + try: + loop.call_soon_threadsafe(mark_speech_activity) + except RuntimeError: + pass + + if fast_stop: + if _is_fast_stop_utterance(sentence): + self.transcript = sentence.strip() + try: + loop.call_soon_threadsafe(stop_event.set) + except RuntimeError: + pass + return + if result.is_final: # Collect only final (confirmed) phrases transcript_parts.append(sentence) @@ -138,18 +202,16 @@ class SpeechRecognizer: def on_speech_started(unused_self, speech_started, **kwargs): """Called when VAD (Voice Activity Detection) hears a voice.""" try: - loop.call_soon_threadsafe(speech_started_event.set) + loop.call_soon_threadsafe(mark_speech_activity) except RuntimeError: # Event loop might be closed, ignore pass def on_utterance_end(unused_self, utterance_end, **kwargs): """Called when Deepgram decides the phrase has ended (pause).""" - try: - loop.call_soon_threadsafe(stop_event.set) - except RuntimeError: - # Event
loop might be closed, ignore - pass + # Do not stop immediately on the Deepgram event. + # Stopping is governed by the local silence threshold POST_SPEECH_SILENCE_TIMEOUT_SECONDS. + return def on_error(unused_self, error, **kwargs): print(f"Deepgram Error: {error}") @@ -174,10 +236,10 @@ class SpeechRecognizer: channels=1, sample_rate=SAMPLE_RATE, interim_results=True, - utterance_end_ms=1000, # A 1.0 s pause counts as the end of the phrase (was 1.2) + utterance_end_ms=int(POST_SPEECH_SILENCE_TIMEOUT_SECONDS * 1000), vad_events=True, - # Add timeout parameters for long-running operation - endpointing=300, # Timeout in milliseconds for automatic completion + # Smoothed endpointing threshold so speech is not cut on short pauses. + endpointing=int(POST_SPEECH_SILENCE_TIMEOUT_SECONDS * 1000), ) # --- Audio sending task with buffering --- @@ -198,24 +260,29 @@ class SpeechRecognizer: None, lambda: dg_connection.start(options) ) - # While connecting, accumulate data - timeout_count = 0 - max_timeout = 5000 # Maximum number of wait iterations (about 2.5 seconds at a 0.0005 delay) - - while not connect_future.done() and timeout_count < max_timeout: + # While connecting, accumulate data. + # Wait briefly: if the network hangs, retry the attempt sooner.
+ connect_deadline = time.monotonic() + DEEPGRAM_CONNECT_WAIT_SECONDS + while ( + not connect_future.done() + and time.monotonic() < connect_deadline + ): if stream.is_active(): data = stream.read(4096, exception_on_overflow=False) audio_buffer.append(data) - await asyncio.sleep(0.0005) # Reduce the delay for faster processing - timeout_count += 1 + await asyncio.sleep(DEEPGRAM_CONNECT_POLL_SECONDS) - if timeout_count >= max_timeout: - print("⏰ Timeout connecting to Deepgram") + if not connect_future.done(): + print( + f"⏰ Timeout connecting to Deepgram ({DEEPGRAM_CONNECT_WAIT_SECONDS:.1f}s)" + ) + stop_event.set() return # Check the connection result if connect_future.result() is False: print("Failed to start Deepgram connection") + stop_event.set() return print(f"🚀 Connected! Sending buffer ({len(audio_buffer)} chunks)...") @@ -227,11 +294,8 @@ class SpeechRecognizer: audio_buffer = None # Free memory - # 4. Continue streaming in real time - stream_timeout = 0 - max_stream_timeout = int(timeout_seconds / 0.002) # Approximate timeout depending on timeout_seconds - - while not stop_event.is_set() and stream_timeout < max_stream_timeout: + # 4. Continue streaming in real time until the stop event. + while not stop_event.is_set(): if stream.is_active(): data = stream.read(4096, exception_on_overflow=False) dg_connection.send(data) @@ -239,7 +303,6 @@ class SpeechRecognizer: if chunks_sent % 50 == 0: print(".", end="", flush=True) await asyncio.sleep(0.002) # Reduce the delay for faster response - stream_timeout += 1 except Exception as e: print(f"Audio send error: {e}") @@ -255,19 +318,60 @@ class SpeechRecognizer: try: # 1.
Wait for speech to start (if detection_timeout is set) - if detection_timeout: + if ( + effective_detection_timeout + and effective_detection_timeout > 0 + and not stop_event.is_set() + ): + speech_wait_task = asyncio.create_task(speech_started_event.wait()) + stop_wait_task = asyncio.create_task(stop_event.wait()) try: - await asyncio.wait_for( - speech_started_event.wait(), timeout=detection_timeout + done, pending = await asyncio.wait( + {speech_wait_task, stop_wait_task}, + timeout=effective_detection_timeout, + return_when=asyncio.FIRST_COMPLETED, ) - except asyncio.TimeoutError: + finally: + for task in (speech_wait_task, stop_wait_task): + if not task.done(): + task.cancel() + await asyncio.gather( + speech_wait_task, stop_wait_task, return_exceptions=True + ) + + if not done: # If nobody started speaking within detection_timeout, exit stop_event.set() - # 2. If speech started (or there is no timeout), wait for completion (stop_event) - # stop_event fires either on UtteranceEnd (pause) or on the overall timeout + # 2. After speech starts, finish only on POST_SPEECH_SILENCE_TIMEOUT_SECONDS of silence. + # Add a long safety limit so the session cannot hang forever.
if not stop_event.is_set(): - await asyncio.wait_for(stop_event.wait(), timeout=timeout_seconds) + max_active_speech_seconds = max( + timeout_seconds if timeout_seconds else 0.0, + MAX_ACTIVE_SPEECH_SECONDS, + ) + + while not stop_event.is_set(): + now = time.monotonic() + + if speech_started_event.is_set(): + if ( + now - last_speech_activity + >= POST_SPEECH_SILENCE_TIMEOUT_SECONDS + ): + stop_event.set() + break + + if ( + first_speech_activity_at is not None + and now - first_speech_activity_at + >= max_active_speech_seconds + ): + print("⏱️ Достигнут защитный лимит активного прослушивания.") + stop_event.set() + break + + await asyncio.sleep(0.05) except asyncio.TimeoutError: pass # Overall timeout expired @@ -291,16 +395,18 @@ class SpeechRecognizer: def listen( self, timeout_seconds: float = 7.0, - detection_timeout: float = None, + detection_timeout: float = INITIAL_SILENCE_TIMEOUT_SECONDS, lang: str = "ru", + fast_stop: bool = False, ) -> str: """ Main method: listens to the microphone and returns text. Args: - timeout_seconds: Maximum phrase duration. + timeout_seconds: Safety limit on the duration of active speech. detection_timeout: How long to wait for speech to start before giving up. lang: Language ("ru" or "en"). + fast_stop: Fast completion for short stop commands.
""" if not self.dg_client: self.initialize() @@ -323,7 +429,7 @@ class SpeechRecognizer: # Start the asynchronous processing routine transcript = asyncio.run( self._process_audio( - dg_connection, timeout_seconds, detection_timeout + dg_connection, timeout_seconds, detection_timeout, fast_stop ) ) final_text = transcript.strip() if transcript else "" @@ -389,10 +495,13 @@ def get_recognizer() -> SpeechRecognizer: def listen( - timeout_seconds: float = 7.0, detection_timeout: float = None, lang: str = "ru" + timeout_seconds: float = 7.0, + detection_timeout: float = INITIAL_SILENCE_TIMEOUT_SECONDS, + lang: str = "ru", + fast_stop: bool = False, ) -> str: """External function for listening.""" - return get_recognizer().listen(timeout_seconds, detection_timeout, lang) + return get_recognizer().listen(timeout_seconds, detection_timeout, lang, fast_stop) def cleanup(): diff --git a/app/audio/wakeword.py b/app/audio/wakeword.py index 8efbcea..fc12ce6 100644 --- a/app/audio/wakeword.py +++ b/app/audio/wakeword.py @@ -9,7 +9,11 @@ Listens for the "Alexandr" wake word.
import pvporcupine import pyaudio import struct -from ..core.config import PORCUPINE_ACCESS_KEY, PORCUPINE_KEYWORD_PATH +from ..core.config import ( + PORCUPINE_ACCESS_KEY, + PORCUPINE_KEYWORD_PATH, + PORCUPINE_SENSITIVITY, +) from ..core.audio_manager import get_audio_manager @@ -27,13 +31,15 @@ class WakeWordDetector: """Initialize Porcupine and PyAudio.""" # Create a Porcupine instance with our access key and the model file (.ppn) self.porcupine = pvporcupine.create( - access_key=PORCUPINE_ACCESS_KEY, keyword_paths=[str(PORCUPINE_KEYWORD_PATH)] + access_key=PORCUPINE_ACCESS_KEY, + keyword_paths=[str(PORCUPINE_KEYWORD_PATH)], + sensitivities=[PORCUPINE_SENSITIVITY], ) # Use the shared PyAudio instance self.pa = get_audio_manager().get_pyaudio() self._open_stream() - print("🎤 Ожидание wake word 'Alexandr'...") + print(f"🎤 Ожидание wake word 'Alexandr' (sens={PORCUPINE_SENSITIVITY:.2f})...") def _open_stream(self): """Open the microphone audio stream.""" diff --git a/app/core/cleaner.py b/app/core/cleaner.py index 58fb2e7..659be73 100644 --- a/app/core/cleaner.py +++ b/app/core/cleaner.py @@ -12,6 +12,7 @@ Handles complex number-to-text conversion for Russian language. import re import pymorphy3 from num2words import num2words +from .roman import roman_to_int # Initialize the morphological analyzer (for case detection) morph = pymorphy3.MorphAnalyzer() @@ -334,6 +335,50 @@ def numbers_to_words(text: str) -> str: return text +def roman_numerals_to_words(text: str) -> str: + """ + Converts Roman numerals into ordinal numerals, taking into account + the morphology of the preceding word. + Example: "Ивана III" -> "Ивана третьего".
+ """ + if not text: + return "" + + def replace_roman_match(match): + prev_word = match.group(1) + roman = match.group(2) + + number = roman_to_int(roman) + if number is None: + return match.group(0) + + case = "nominative" + gender = "m" + + try: + parsed = morph.parse(prev_word)[0] + case_tag = parsed.tag.case + gender_tag = parsed.tag.gender + + if case_tag: + case = PYMORPHY_TO_NUM2WORDS.get(case_tag, "nominative") + if gender_tag: + gender = PYMORPHY_TO_GENDER.get(gender_tag, "m") + except Exception: + pass + + ordinal = convert_number( + str(number), context_type="ordinal", case=case, gender=gender + ) + return f"{prev_word} {ordinal}" + + return re.sub( + r"(?i)\b([А-Яа-яЁё]+)\s+([IVXLCDM]+)\b", + replace_roman_match, + text, + ) + + def clean_response(text: str, language: str = "ru") -> str: """ Main cleaning function. @@ -408,9 +453,11 @@ def clean_response(text: str, language: str = "ru") -> str: flags=re.IGNORECASE | re.MULTILINE, ) - # Convert numbers to words only for Russian, and only if digits exist - if language == "ru" and re.search(r"\d", text): - text = numbers_to_words(text) + # Convert Roman numerals and Arabic digits to words for Russian. + if language == "ru": + text = roman_numerals_to_words(text) + if re.search(r"\d", text): + text = numbers_to_words(text) # Remove extra whitespace text = re.sub(r"\n{3,}", "\n\n", text) diff --git a/app/core/config.py b/app/core/config.py index e520126..5947576 100644 --- a/app/core/config.py +++ b/app/core/config.py @@ -33,6 +33,8 @@ DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY") PORCUPINE_ACCESS_KEY = os.getenv("PORCUPINE_ACCESS_KEY") # Path to the wake-word model file (.ppn), located in the assets/models folder PORCUPINE_KEYWORD_PATH = BASE_DIR / "assets" / "models" / "Alexandr_en_linux_v4_0_0.ppn" +# Wake word sensitivity (0..1). Higher = triggers more easily, but more false positives.
+PORCUPINE_SENSITIVITY = float(os.getenv("PORCUPINE_SENSITIVITY", "0.8")) # --- Audio parameters --- # Microphone sampling rate (standard for speech recognition) diff --git a/app/core/roman.py b/app/core/roman.py new file mode 100644 index 0000000..ade8155 --- /dev/null +++ b/app/core/roman.py @@ -0,0 +1,43 @@ +"""Roman numeral parsing helpers.""" + +import re + +_ROMAN_VALID_RE = re.compile( + r"^M{0,3}(CM|CD|D?C{0,3})" + r"(XC|XL|L?X{0,3})" + r"(IX|IV|V?I{0,3})$" +) +_ROMAN_TOKEN_RE = re.compile(r"(? int | None: + if not token: + return None + + roman = token.strip().upper() + if not roman or not _ROMAN_VALID_RE.fullmatch(roman): + return None + + total = 0 + prev = 0 + for char in reversed(roman): + value = _ROMAN_VALUES[char] + if value < prev: + total -= value + else: + total += value + prev = value + return total + + +def replace_roman_numerals(text: str) -> str: + if not text: + return text + + def _repl(match: re.Match) -> str: + token = match.group(0) + value = roman_to_int(token) + return str(value) if value is not None else token + + return _ROMAN_TOKEN_RE.sub(_repl, text) diff --git a/app/features/alarm.py b/app/features/alarm.py index 5fca5a3..b7cc29f 100644 --- a/app/features/alarm.py +++ b/app/features/alarm.py @@ -10,11 +10,13 @@ from datetime import datetime from ..core.config import BASE_DIR from ..audio.stt import listen from ..core.commands import is_stop_command +from ..core.roman import replace_roman_numerals # Alarm database file ALARM_FILE = BASE_DIR / "data" / "alarms.json" # Alarm sound file ALARM_SOUND = BASE_DIR / "assets" / "sounds" / "Apex-1.mp3" +ASK_ALARM_TIME_PROMPT = "На какое время мне поставить будильник?"
class AlarmClock: @@ -229,7 +231,7 @@ class AlarmClock: try: # Loop waiting for a stop command while True: - text = listen(timeout_seconds=3.0, detection_timeout=3.0) + text = listen(timeout_seconds=3.0, detection_timeout=3.0, fast_stop=True) if text: if is_stop_command(text, mode="lenient"): print(f"🛑 Будильник остановлен по команде: '{text}'") @@ -251,7 +253,7 @@ class AlarmClock: Parses an alarm-setting command from text. Examples: "разбуди в 7:30", "будильник на 8 утра". """ - text = text.lower() + text = replace_roman_numerals(text.lower()) if "будильник" not in text and "разбуди" not in text: return None @@ -299,6 +301,12 @@ class AlarmClock: suffix = f" {days_phrase}" if days_phrase else "" return f"Хорошо, разбужу вас в {h}:{m:02d}{suffix}." + if re.search(r"(постав|установ|запусти|включи|разбуди)", text) or text.strip() in { + "будильник", + "поставь будильник", + }: + return ASK_ALARM_TIME_PROMPT + return "Я не понял время для будильника. Пожалуйста, скажите точное время, например 'семь тридцать'."
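The alarm parser above now passes lowercased text through `replace_roman_numerals` before looking for a time. A minimal, self-contained sketch of that normalization step follows. The validation pattern mirrors the one visible in the `roman.py` hunk, but the token pattern `ROMAN_TOKEN_RE` is an assumption (a plain word-boundary match), because the PR's own token regex is not fully visible in this diff.

```python
import re
from typing import Optional

# Standard Roman symbol values.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Canonical numerals from 1 to 3999 (checked with fullmatch below).
ROMAN_VALID_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

# ASSUMPTION: a simple word-boundary token pattern, not the PR's actual regex.
ROMAN_TOKEN_RE = re.compile(r"\b[ivxlcdm]+\b", re.IGNORECASE)


def roman_to_int(token: str) -> Optional[int]:
    """Convert a canonical Roman numeral to an int, or return None."""
    roman = token.strip().upper()
    if not roman or not ROMAN_VALID_RE.fullmatch(roman):
        return None
    total = 0
    prev = 0
    # Walk right to left: a smaller value before a larger one is subtracted.
    for char in reversed(roman):
        value = ROMAN_VALUES[char]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def replace_roman_numerals(text: str) -> str:
    """Replace every valid Roman numeral token with its decimal digits."""
    def _repl(match: "re.Match[str]") -> str:
        value = roman_to_int(match.group(0))
        return str(value) if value is not None else match.group(0)

    return ROMAN_TOKEN_RE.sub(_repl, text)
```

For example, `replace_roman_numerals("будильник на vii")` returns `"будильник на 7"`, so the existing digit-based time parsing can handle it. With a token pattern this loose, an ordinary Latin-letter word that happens to be a valid numeral (such as `mix`) would also be converted, so a stricter pattern may be preferable in practice.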
diff --git a/app/features/music.py b/app/features/music.py index 254032a..54ff668 100644 --- a/app/features/music.py +++ b/app/features/music.py @@ -9,6 +9,7 @@ Spotify Music Controller - "следующий трек" / "next" - next track - "предыдущий трек" / "previous" - previous track - "что играет" / "какая песня" - information about the current track +- "угадай песню" / "распознай музыку" - recognize the current track """ import os @@ -287,6 +288,16 @@ class SpotifyMusicController: if re.search(pattern, text_lower) and ("трек" in text_lower or "песн" in text_lower or "previous" in text_lower or "back" in text_lower): return self.previous_track() + # Explicit music recognition commands (such as "угадай песню") + recognize_patterns = [ + r"((александр|александра|алесандр|alexander)\s+)?(угадай|распознай|определи)\s+(мелод|музык|песн|трек)", + r"((александр|александра|алесандр|alexander)\s+)?(что за|какая это)\s+(музык|песн|трек)", + r"(identify|recognize)\s+(song|music|track)", + ] + for pattern in recognize_patterns: + if re.search(pattern, text_lower): + return self.get_current_track() + # What is playing current_patterns = [ r"(что (сейчас )?играет|как(ая|ой) (песня|трек)|что за (песня|трек|музыка))", diff --git a/app/features/stopwatch.py b/app/features/stopwatch.py new file mode 100644 index 0000000..b8716f0 --- /dev/null +++ b/app/features/stopwatch.py @@ -0,0 +1,267 @@ +"""Stopwatch module.""" + +import json +import re +from datetime import datetime + +from ..core.config import BASE_DIR + +STOPWATCH_FILE = BASE_DIR / "data" / "stopwatches.json" + + +# Optional ordinal formatting for list numbering.
+try: + from num2words import num2words +except Exception: + num2words = None + + +def _format_ordinal_index(index: int) -> str: + if num2words is None: + return f"{index}-й" + try: + return num2words(index, lang="ru", to="ordinal", case="nominative", gender="m") + except Exception: + return f"{index}-й" + + +def _format_duration(seconds: float) -> str: + total = int(round(max(0, seconds))) + hours = total // 3600 + minutes = (total % 3600) // 60 + sec = total % 60 + + parts = [] + if hours: + parts.append(f"{hours} ч") + if minutes: + parts.append(f"{minutes} мин") + parts.append(f"{sec} сек") + return " ".join(parts) + + +class StopwatchManager: + def __init__(self): + self.stopwatches = [] + self.load_stopwatches() + + def load_stopwatches(self): + if not STOPWATCH_FILE.exists(): + return + try: + with open(STOPWATCH_FILE, "r", encoding="utf-8") as f: + raw = json.load(f) + except Exception as e: + print(f"❌ Ошибка загрузки секундомеров: {e}") + return + + items = [] + for item in raw: + try: + stopwatch_id = int(item["id"]) + except Exception: + continue + items.append( + { + "id": stopwatch_id, + "name": str(item.get("name", "")).strip(), + "elapsed": float(item.get("elapsed", 0)), + "running": bool(item.get("running", False)), + "started_at": item.get("started_at"), + } + ) + self.stopwatches = sorted(items, key=lambda x: x["id"]) + + def save_stopwatches(self): + payload = [ + { + "id": sw["id"], + "name": sw.get("name", ""), + "elapsed": sw.get("elapsed", 0), + "running": sw.get("running", False), + "started_at": sw.get("started_at"), + } + for sw in self.stopwatches + ] + try: + with open(STOPWATCH_FILE, "w", encoding="utf-8") as f: + json.dump(payload, f, indent=4) + except Exception as e: + print(f"❌ Ошибка сохранения секундомеров: {e}") + + def _next_id(self) -> int: + if not self.stopwatches: + return 1 + return max(sw["id"] for sw in self.stopwatches) + 1 + + def _now_iso(self) -> str: + return datetime.now().isoformat() + + def 
_elapsed_now(self, stopwatch: dict) -> float: + elapsed = float(stopwatch.get("elapsed", 0)) + if not stopwatch.get("running"): + return elapsed + + started_at = stopwatch.get("started_at") + if not started_at: + return elapsed + + try: + started_dt = datetime.fromisoformat(started_at) + except Exception: + return elapsed + + delta = (datetime.now() - started_dt).total_seconds() + return elapsed + max(0, delta) + + def _running(self): + return [sw for sw in self.stopwatches if sw.get("running")] + + def _paused(self): + return [sw for sw in self.stopwatches if not sw.get("running")] + + def has_running_stopwatches(self) -> bool: + return bool(self._running()) + + def describe_active_stopwatches(self) -> str: + running = self._running() + if not running: + return "Активных секундомеров нет." + + running.sort(key=lambda sw: sw["id"]) + items = [] + for idx, sw in enumerate(running, start=1): + ordinal = _format_ordinal_index(idx) + duration = _format_duration(self._elapsed_now(sw)) + name = sw.get("name", "") + if name: + items.append(f"{ordinal}) {name} — {duration}") + else: + items.append(f"{ordinal}) {duration}") + return "Активные секундомеры: " + "; ".join(items) + "." + + def start_stopwatch(self, name: str = "") -> str: + stopwatch = { + "id": self._next_id(), + "name": name.strip(), + "elapsed": 0.0, + "running": True, + "started_at": self._now_iso(), + } + self.stopwatches.append(stopwatch) + self.save_stopwatches() + if stopwatch["name"]: + return f"Запустил секундомер «{stopwatch['name']}»." + return "Запустил секундомер." + + def pause_stopwatches(self) -> str: + running = self._running() + if not running: + return "Сейчас нет активных секундомеров."
+ + elapsed_items = [] + for sw in running: + elapsed_now = self._elapsed_now(sw) + elapsed_items.append( + { + "id": sw["id"], + "name": sw.get("name", ""), + "elapsed": elapsed_now, + } + ) + sw["elapsed"] = elapsed_now + sw["running"] = False + sw["started_at"] = None + + self.save_stopwatches() + count = len(running) + if count == 1: + elapsed_text = _format_duration(elapsed_items[0]["elapsed"]) + return f"Остановил секундомер. Он работал {elapsed_text}." + + details = [] + for idx, item in enumerate(sorted(elapsed_items, key=lambda x: x["id"]), start=1): + ordinal = _format_ordinal_index(idx) + elapsed_text = _format_duration(item["elapsed"]) + name = item.get("name", "") + if name: + details.append(f"{ordinal} «{name}» — {elapsed_text}") + else: + details.append(f"{ordinal} — {elapsed_text}") + return f"Остановил секундомеры: {count} шт. Время: " + "; ".join(details) + "." + + def resume_stopwatches(self) -> str: + paused = self._paused() + if not paused: + return "Пауза не активна: секундомеры уже запущены или отсутствуют." + + for sw in paused: + sw["running"] = True + sw["started_at"] = self._now_iso() + + self.save_stopwatches() + count = len(paused) + if count == 1: + return "Продолжил секундомер." + return f"Продолжил секундомеры: {count} шт." + + def reset_stopwatches(self) -> str: + if not self.stopwatches: + return "Секундомеров для сброса нет." + + count = len(self.stopwatches) + self.stopwatches = [] + self.save_stopwatches() + if count == 1: + return "Секундомер сброшен." + return f"Сбросил секундомеры: {count} шт."
+ + def parse_command(self, text: str) -> str | None: + text = text.lower().strip() + + has_stopwatch_word = any( + word in text + for word in [ + "секундомер", + "секундомеры", + "секундомером", + "секундомера", + "секундомеру", + ] + ) + if not has_stopwatch_word: + return None + + if re.search(r"(какие|какой|список|активн|покажи|сколько|есть ли)", text): + return self.describe_active_stopwatches() + + if any(word in text for word in ["сброс", "удали", "отмени", "очист"]): + return self.reset_stopwatches() + + if any(word in text for word in ["продолж", "возобнов"]): + return self.resume_stopwatches() + + if any(word in text for word in ["стоп", "останов", "пауза"]): + return self.pause_stopwatches() + + if "постав" in text or "установ" in text: + return self.start_stopwatch() + + if any(word in text for word in ["запусти", "включи", "старт", "начни"]): + return self.start_stopwatch() + + # If the user just said "секундомер", treat it as a start. + if text in {"секундомер", "запусти секундомер", "включи секундомер"}: + return self.start_stopwatch() + + return "Я понял команду про секундомер, но не распознал действие. Скажите: запусти, стоп, продолжи, сбрось или покажи активные секундомеры." + + +_stopwatch_manager = None + + +def get_stopwatch_manager(): + global _stopwatch_manager + if _stopwatch_manager is None: + _stopwatch_manager = StopwatchManager() + return _stopwatch_manager diff --git a/app/features/timer.py b/app/features/timer.py index f5f2fd1..3d5c23f 100644 --- a/app/features/timer.py +++ b/app/features/timer.py @@ -10,6 +10,7 @@ from datetime import datetime, timedelta from ..core.config import BASE_DIR from ..audio.stt import listen from ..core.commands import is_stop_command +from ..core.roman import replace_roman_numerals # Morphological analysis for better recognition of number words.
try: @@ -22,6 +23,7 @@ except Exception: # Signal sound file (the same one used for the alarm) ALARM_SOUND = BASE_DIR / "assets" / "sounds" / "Apex-1.mp3" TIMER_FILE = BASE_DIR / "data" / "timers.json" +ASK_TIMER_TIME_PROMPT = "На какое время мне поставить таймер?" # --- Number words parsing helpers (ru) --- _NUMBER_UNITS = { @@ -162,11 +164,13 @@ def _parse_number_lemmas(lemmas): def _normalize_timer_text(text: str) -> str: # Split "полчаса/полминуты/полсекунды" into "пол часа" for easier parsing. - return re.sub( + text = re.sub( r"(?i)\bпол(?=(?:час|часа|минут|минуты|минуту|секунд|секунды|секунду|мин|сек)\b)", "пол ", text, ) + # Support commands like "таймер на X минут". + return replace_roman_numerals(text) def _find_word_number_before_unit(tokens, unit_index): @@ -371,7 +375,7 @@ class TimerManager: try: # Loop waiting for a stop command while True: - text = listen(timeout_seconds=3.0, detection_timeout=3.0) + text = listen(timeout_seconds=3.0, detection_timeout=3.0, fast_stop=True) if text: if is_stop_command(text, mode="lenient"): print(f"🛑 Таймер остановлен по команде: '{text}'") @@ -477,7 +481,14 @@ class TimerManager: self.add_timer(total_seconds, label) return f"Поставил таймер на {label}." - # If "таймер" was said but no time was found + # If asked to set a timer without naming a time, ask a clarifying question. + if re.search(r"(постав|установ|запусти|включи|засеки)", text) or text.strip() in { + "таймер", + "поставь таймер", + }: + return ASK_TIMER_TIME_PROMPT + + # If "таймер" was said but no time was found. return "Я не понял, на сколько поставить таймер. Скажите, например, 'таймер на 5 минут'."
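Reviewer note: the regex in `_normalize_timer_text` splits a fused "пол" prefix from a time unit using a lookahead, so the unit itself is not consumed. A small standalone sketch of that step, using the pattern from the diff; the function name `split_half_prefix` is invented for illustration, and the Roman-numeral step is left out because `replace_roman_numerals` is not shown in this change.

```python
import re

# Pattern copied from the diff: "пол" at a word boundary, followed (lookahead only)
# by a whole time-unit word, so only the prefix itself is replaced.
_HALF_PREFIX = re.compile(
    r"(?i)\bпол(?=(?:час|часа|минут|минуты|минуту|секунд|секунды|секунду|мин|сек)\b)"
)


def split_half_prefix(text: str) -> str:
    """Insert a space after a fused 'пол' prefix (hypothetical helper name)."""
    return _HALF_PREFIX.sub("пол ", text)


for phrase in ["таймер на полчаса", "полминуты", "полка", "пол часа"]:
    print(repr(phrase), "->", repr(split_half_prefix(phrase)))
```

Because the unit list is matched as whole words, ordinary words that merely start with "пол" (such as "полка") are left untouched, and already-split input passes through unchanged.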
diff --git a/app/main.py b/app/main.py index 08cfcc3..1800cf1 100644 --- a/app/main.py +++ b/app/main.py @@ -53,8 +53,9 @@ from .core.config import BASE_DIR from .core.cleaner import clean_response from .core.commands import is_stop_command from .core.smalltalk import get_smalltalk_response -from .features.alarm import get_alarm_clock -from .features.timer import get_timer_manager +from .features.alarm import ASK_ALARM_TIME_PROMPT, get_alarm_clock +from .features.stopwatch import get_stopwatch_manager +from .features.timer import ASK_TIMER_TIME_PROMPT, get_timer_manager from .features.weather import get_weather_report from .features.music import get_music_controller from .features.cities_game import get_cities_game @@ -256,6 +257,7 @@ def main(): get_recognizer().initialize() # Connect to Deepgram init_tts() # Load the speech-synthesis neural network (Silero) alarm_clock = get_alarm_clock() # Load alarms + stopwatch_manager = get_stopwatch_manager() # Load stopwatches timer_manager = get_timer_manager() # Load timers cities_game = get_cities_game() # "Cities" game print() @@ -270,6 +272,10 @@ def main(): # (True = dialogue mode, listen right away. False = wait for "Alexandr") skip_wakeword = False + # Context of the clarifying question "на какое время поставить ...". + # Can be: "timer", "alarm". + pending_time_target = None + # Variable for tracking the last STT health check last_stt_check = time.time() @@ -314,9 +320,10 @@ def main(): if ding_sound: ding_sound.play() - # Phrase heard! Listen for the user's command (5 seconds of silence max) + # Wake phrase heard: + # wait up to 5 s for speech to start; once it starts, end STT after 3 s of silence.
try: - user_text = listen(timeout_seconds=5.0) + user_text = listen(timeout_seconds=5.0, fast_stop=True) except Exception as e: print(f"Ошибка при прослушивании: {e}") print("Переинициализация STT...") @@ -328,10 +335,12 @@ def main(): continue # Continue the loop else: # Dialogue mode (follow-up): wait for speech to continue without "Alexandr" - print("👂 Слушаю продолжение диалога (3 сек)...") - # Wait 3 s for speech to start. If speech starts, listen for up to 7 s. + print("👂 Слушаю продолжение диалога (5 сек)...") + # Wait 5 s for speech to start. If speech starts, listen for up to 7 s. try: - user_text = listen(timeout_seconds=7.0, detection_timeout=3.0) + user_text = listen( + timeout_seconds=7.0, detection_timeout=5.0, fast_stop=True + ) except Exception as e: print(f"Ошибка при прослушивании: {e}") print("Переинициализация STT...") @@ -350,13 +359,21 @@ def main(): # --- Step 2: Analyze the recognized text --- if not user_text: - # Activation happened, but speech was not recognized - speak("Извините, я вас не расслышал. Попробуйте ещё раз.") - skip_wakeword = False # Return to waiting for the name + # Empty input: return to waiting for the wake word without extra replies.
+ skip_wakeword = False continue # Check for the "Стоп" command if is_stop_command(user_text): + if stopwatch_manager.has_running_stopwatches(): + stopwatch_stop_response = stopwatch_manager.pause_stopwatches() + clean_stopwatch_stop_response = clean_response( + stopwatch_stop_response, language="ru" + ) + speak(clean_stopwatch_stop_response) + last_response = clean_stopwatch_stop_response + skip_wakeword = False + continue print("_" * 50) print("💤 Жду 'Alexandr' для активации...") skip_wakeword = False @@ -387,24 +404,52 @@ def main(): skip_wakeword = True continue + command_text = user_text + command_text_lower = command_text.lower() + if pending_time_target == "timer" and "таймер" not in command_text_lower: + command_text = f"таймер {command_text}" + elif ( + pending_time_target == "alarm" + and "будильник" not in command_text_lower + and "разбуди" not in command_text_lower + ): + command_text = f"будильник {command_text}" + # Check timer commands ("поставь таймер на 6 минут") - timer_response = timer_manager.parse_command(user_text) + stopwatch_response = stopwatch_manager.parse_command(command_text) + if stopwatch_response: + clean_stopwatch_response = clean_response( + stopwatch_response, language="ru" + ) + speak(clean_stopwatch_response) + last_response = clean_stopwatch_response + skip_wakeword = True + continue + + # Check timer commands ("поставь таймер на 6 минут") + timer_response = timer_manager.parse_command(command_text) if timer_response: clean_timer_response = clean_response(timer_response, language="ru") completed = speak( clean_timer_response, check_interrupt=check_wakeword_once ) last_response = clean_timer_response + pending_time_target = ( + "timer" if timer_response == ASK_TIMER_TIME_PROMPT else None + ) skip_wakeword = not completed continue
# Check alarm commands ("поставь будильник на 7") - alarm_response = alarm_clock.parse_command(user_text) + alarm_response = alarm_clock.parse_command(command_text) if alarm_response: clean_alarm_response = clean_response(alarm_response, language="ru") speak(clean_alarm_response) last_response = clean_alarm_response - skip_wakeword = False + pending_time_target = ( + "alarm" if alarm_response == ASK_ALARM_TIME_PROMPT else None + ) + skip_wakeword = alarm_response == ASK_ALARM_TIME_PROMPT continue # Check the volume command ("громкость 5") diff --git a/data/stopwatches.json b/data/stopwatches.json new file mode 100644 index 0000000..f03273b --- /dev/null +++ b/data/stopwatches.json @@ -0,0 +1,9 @@ +[ + { + "id": 1, + "name": "", + "elapsed": 92.426419, + "running": false, + "started_at": null + } +] \ No newline at end of file diff --git a/scripts/qwen-check.sh b/scripts/qwen-check.sh new file mode 100755 index 0000000..7a7c762 --- /dev/null +++ b/scripts/qwen-check.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash +set -euo pipefail + +ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +cd "$ROOT" + +echo "[qwen-check] Python syntax compile check" +python -m compileall app run.py >/dev/null + +echo "[qwen-check] Optional ruff check" +if command -v ruff >/dev/null 2>&1; then + ruff check app run.py +else + echo "[qwen-check] ruff not installed, skipping" +fi + +echo "[qwen-check] Optional pytest" +if command -v pytest >/dev/null 2>&1 && [ -d tests ]; then + pytest -q +else + echo "[qwen-check] tests/ or pytest not found, skipping" +fi + +echo "[qwen-check] Done" diff --git a/scripts/qwen-context.sh b/scripts/qwen-context.sh new file mode 100755 index 0000000..29566a5 --- /dev/null +++ b/scripts/qwen-context.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash +set -euo pipefail + +ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +OUT="$ROOT/.qwen/project-context.txt" + +{ + echo "Project: alexander_smart-speaker" + echo "Generated: $(date -Iseconds)" + echo + echo "Top-level files:" + find "$ROOT" -maxdepth 2 -type f \ + ! -path '*/.git/*' \ + ! -path '*/venv/*' \ + ! -path '*/__pycache__/*' \ + | sed "s|$ROOT/||" | sort + echo + echo "Python modules:" + find "$ROOT/app" -type f -name '*.py' | sed "s|$ROOT/||" | sort +} > "$OUT" + +echo "[qwen-context] Wrote $OUT" diff --git a/ssp.py b/ssp.py new file mode 100644 index 0000000..84f35de --- /dev/null +++ b/ssp.py @@ -0,0 +1,10 @@ +maxi = 0 +for i in range(84052, 84131): + k = 0 + for j in range(1, i + 1): + if i % j == 0: + k += 1 + if maxi < k: + maxi = k + f = i +print(maxi, f)
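Reviewer note: `ssp.py` counts divisors by testing every `j` from 1 to `i`, which is linear in `i` for each candidate. A possible alternative, sketched here under the assumption that only the count and the first number reaching the maximum matter, pairs each divisor `d <= sqrt(n)` with `n // d`. The function names `count_divisors` and `most_divisors` are invented for this sketch and are not part of the commit.

```python
import math
from typing import Tuple


def count_divisors(n: int) -> int:
    """Count divisors of n by pairing each d <= sqrt(n) with n // d."""
    count = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            # A perfect-square root pairs with itself and is counted once.
            count += 1 if d * d == n else 2
    return count


def most_divisors(lo: int, hi: int) -> Tuple[int, int]:
    """Return (max divisor count, first number reaching it) over range(lo, hi)."""
    best_count, best_n = 0, lo
    for n in range(lo, hi):
        k = count_divisors(n)
        if k > best_count:  # strict comparison keeps the first maximum, as in ssp.py
            best_count, best_n = k, n
    return best_count, best_n


# Same range as in ssp.py.
print(*most_divisors(84052, 84131))
```

The strict `>` comparison mirrors the `maxi < k` test in the committed script, so ties resolve to the smallest number in the range in both versions.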