🎤 feat: add custom speech config, browser TTS/STT features, and dynamic speech tab settings #2921

Open · wants to merge 22 commits into main

Conversation

berry-13 (Collaborator) commented May 30, 2024

Summary

This PR introduces several key features and improvements related to the speech functionality in LibreChat:

  • Custom Speech Configuration: Added a custom speech configuration option in librechat.yaml, allowing the ADMIN to set pre-configured "speech tab" settings
  • Browser TTS Language Selection: Enabled language selection for text-to-speech (TTS) directly within the browser (see the sketch after this list)
  • Browser STT Streaming: Implemented streaming for speech-to-text (STT) in the browser
  • Dynamic Speech Tab Settings: The speech tab settings now dynamically appear and disappear based on user settings (e.g., browser/external engine dropdown)
  • Refactoring:
    • Renamed endpointSTT and endpointTTS to engineSTT and engineTTS respectively
    • Moved the speech API to the /api/files/speech subpath
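
A minimal sketch of browser-side TTS language selection using the standard SpeechSynthesis API; this is illustrative only and not the exact hook introduced by the PR:

```typescript
// Speak text with a user-selected language, preferring a matching browser voice.
export function speakInLanguage(text: string, lang = 'en-US') {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;

  // getVoices() may be empty until the 'voiceschanged' event fires in some browsers
  const voice = window.speechSynthesis.getVoices().find((v) => v.lang === lang);
  if (voice) {
    utterance.voice = voice;
  }

  window.speechSynthesis.speak(utterance);
}
```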

Breaking Changes

  • The variables SpeechToText and TextToSpeech in the store have been renamed to speechToText and textToSpeech. If you encounter any issues, please delete LibreChat's cache
  • The TTS/STT configuration in librechat.yaml has been moved under a top-level speech: key (see the example below)
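
A minimal sketch of the new layout, inferred from the error messages later in this thread; the provider fields shown are assumptions and may differ from the final schema:

```yaml
# librechat.yaml — TTS/STT settings now nest under a single `speech:` key
speech:
  speechTab:                 # optional pre-configured "speech tab" defaults set by the admin
    conversationMode: true
  tts:                       # previously a top-level `tts:` key
    openai:
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo']
  stt:                       # previously a top-level `stt:` key
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
```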

Change Type

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Testing

External STT/TTS:

  • ElevenLabs
  • OpenAI

Local STT/TTS:

  • Chrome

Test Configuration:

To reproduce the test process, follow these steps:

  1. Configure the librechat.yaml file with appropriate speech settings
  2. Select a TTS language in the browser and verify the speech output
  3. Test STT streaming functionality in the browser using various input sources
  4. Verify that the speech tab settings appear or disappear based on the selected user settings
  5. Ensure the renamed settings (engineSTT and engineTTS) function correctly
  6. Confirm the speech API is accessible via the new subpath /api/files/speech

Checklist

  • My code adheres to this project's style guidelines.
  • I have performed a self-review of my own code.
  • I have commented in any complex areas of my code.
  • I have made pertinent documentation changes.
  • My changes do not introduce new warnings.
  • I have written tests demonstrating that my changes are effective or that my feature works.
  • Local unit tests pass with my changes.
  • Any changes dependent on mine have been merged and published in downstream modules.
  • A pull request for updating the documentation has been submitted.

…ernal audio endpoints

This commit updates the useTextToSpeech and useSpeechToText hooks in the Input directory to support external audio endpoints. It introduces the useGetExternalTextToSpeech and useGetExternalSpeechToText hooks, which determine whether the audio endpoints should be set to 'browser' or 'external' based on the value of the endpointTTS and endpointSTT Recoil states. The useTextToSpeech and useSpeechToText hooks now use these new hooks to determine whether to use external audio endpoints
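
A minimal sketch of what these selection hooks could look like, assuming Recoil atoms named endpointTTS/endpointSTT as described above (later renamed to engineTTS/engineSTT); the atom definitions here are placeholders, not LibreChat's actual store:

```typescript
import { atom, useRecoilValue } from 'recoil';

// Placeholder atoms; in LibreChat these live in the client store
const endpointTTS = atom<string>({ key: 'endpointTTS', default: 'browser' });
const endpointSTT = atom<string>({ key: 'endpointSTT', default: 'browser' });

// Resolve which TTS engine to use: the browser's SpeechSynthesis or an external provider
export function useGetExternalTextToSpeech(): 'browser' | 'external' {
  const engine = useRecoilValue(endpointTTS);
  return engine === 'external' ? 'external' : 'browser';
}

// Same decision for speech-to-text
export function useGetExternalSpeechToText(): 'browser' | 'external' {
  const engine = useRecoilValue(endpointSTT);
  return engine === 'external' ? 'external' : 'browser';
}
```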
The updateTokenWebsocket function and its import are no longer used in the OpenAIClient module. This commit removes the function and import to clean up the codebase
…chToText hooks

…tests: added AutomaticPlaybackSwitch.spec

This commit renames the AutomaticPlayback component to AutomaticPlaybackSwitch in the Speech directory. The new name better reflects the purpose of the component and aligns with the naming convention used in the codebase.
This commit updates the useSpeechToText hook in the client/src/components/Chat/Input/AudioRecorder.tsx file to include the interimTranscript state. This allows for real-time display of the speech-to-text transcription while the user is still speaking. The interimTranscript is now used to update the text area value during recording.
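
A minimal sketch of streaming interim results with the standard Web Speech API; the hook and state names here are illustrative, not the exact ones in useSpeechToText:

```typescript
import { useEffect, useRef, useState } from 'react';

export function useBrowserSpeechToText(language = 'en-US') {
  const [interimTranscript, setInterimTranscript] = useState('');
  const [finalTranscript, setFinalTranscript] = useState('');
  const recognitionRef = useRef<any>(null);

  useEffect(() => {
    const SpeechRecognition =
      (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
    if (!SpeechRecognition) {
      return; // Browser STT unsupported; an external engine is required
    }

    const recognition = new SpeechRecognition();
    recognition.lang = language;
    recognition.continuous = true;
    recognition.interimResults = true; // stream partial results while the user speaks

    recognition.onresult = (event: any) => {
      let interim = '';
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const { transcript } = event.results[i][0];
        if (event.results[i].isFinal) {
          setFinalTranscript((prev) => prev + transcript);
        } else {
          interim += transcript; // shown in the text area while recording
        }
      }
      setInterimTranscript(interim);
    };

    recognitionRef.current = recognition;
    return () => recognition.stop();
  }, [language]);

  return {
    interimTranscript,
    finalTranscript,
    start: () => recognitionRef.current?.start(),
    stop: () => recognitionRef.current?.stop(),
  };
}
```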
…h configuration

This commit adds a new API endpoint responsible for retrieving the custom speech configuration.
…speech configurations

This commit modifies the useCustomConfigSpeechQuery function in the client/src/data-provider/queries.ts file to return an array of custom speech configurations instead of a single object. This change allows for better handling and manipulation of the data in the application
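
A minimal sketch of a query hook returning an array, assuming @tanstack/react-query and a hypothetical /api/files/speech/config path; the setting shape is an assumption as well:

```typescript
import { useQuery, type UseQueryResult } from '@tanstack/react-query';

// Hypothetical shape of a single pre-configured speech tab setting
type TCustomSpeechSetting = Record<string, string | number | boolean>;

export function useCustomConfigSpeechQuery(): UseQueryResult<TCustomSpeechSetting[]> {
  return useQuery<TCustomSpeechSetting[]>({
    queryKey: ['customConfigSpeech'],
    queryFn: async () => {
      // Hypothetical endpoint under the new /api/files/speech subpath
      const res = await fetch('/api/files/speech/config');
      if (!res.ok) {
        throw new Error('Failed to fetch custom speech configuration');
      }
      // An array of settings rather than a single object, per this commit
      return (await res.json()) as TCustomSpeechSetting[];
    },
  });
}
```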
berry-13 marked this pull request as ready for review on June 1, 2024 at 12:18
danny-avila (Owner)

please fix the broken test

danny-avila (Owner)

slated for v0.7.4

danny-avila (Owner) commented Jun 21, 2024

Hi Berry, there are elevated errors when using this without config settings

2024-06-21 09:31:55 info: Server listening on all interfaces at port 3080. Use http://localhost:3080 to access it
2024-06-21 09:31:59 error: Failed to get speechTab settings: Configuration or speechTab schema is missing
2024-06-21 09:31:59 error: Failed to get voices: Configuration or TTS schema is missing
2024-06-21 09:32:45 error: Failed to get speechTab settings: Configuration or speechTab schema is missing
2024-06-21 09:32:45 error: Failed to get voices: Configuration or TTS schema is missing

I should not be seeing any errors when I don't have speech enabled/configured and I open the Settings

danny-avila (Owner)

Also, when I add the new format for speech settings, I should not get an error if I don't have speechTab settings

2024-06-21 09:36:42 error: Failed to get speechTab settings: Configuration or speechTab schema is missing

danny-avila (Owner)

Lastly, if I'm using the old setup, this error message is confusing and I will think the app has a bug:

2024-06-21 09:40:49 error: Invalid custom config file at /home/danny/LibreChat/librechat.yaml [
  {
    "code": "unrecognized_keys",
    "keys": [
      "tts",
      "stt"
    ],
    "path": [],
    "message": "Unrecognized key(s) in object: 'tts', 'stt'"
  }
]

This is a parsing error for the custom config file; we should still show it, but also add a note, when this error is detected, that the format has changed.
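
A minimal sketch of that detection, assuming the config is validated with Zod (the error above matches Zod's unrecognized_keys issue shape); the function and logging calls here are placeholders, not LibreChat's actual code:

```typescript
import { ZodError } from 'zod';

function logConfigError(configPath: string, error: ZodError) {
  console.error(`Invalid custom config file at ${configPath}`, error.issues);

  // Detect the pre-change layout: top-level `tts` / `stt` keys are now rejected
  const usesOldSpeechKeys = error.issues.some(
    (issue) =>
      issue.code === 'unrecognized_keys' &&
      issue.keys.some((key) => key === 'tts' || key === 'stt'),
  );

  if (usesOldSpeechKeys) {
    console.warn(
      'The speech configuration format has changed: move your `tts` and `stt` sections under a top-level `speech:` key.',
    );
  }
}
```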

Also please accompany this PR with an update to the changelog: https://www.librechat.ai/changelog
