🎤 feat: add custom speech config, browser TTS/STT features, and dynamic speech tab settings #2921

Open · wants to merge 22 commits into main

Conversation

berry-13 (Collaborator) commented May 30, 2024

Summary

This PR introduces several key features and improvements related to the speech functionality in LibreChat:

  • Custom Speech Configuration: Added a custom speech configuration option in librechat.yaml, allowing the ADMIN to set pre-configured "speech tab" settings
  • Browser TTS Language Selection: Enabled language selection for text-to-speech (TTS) directly within the browser (see the sketch after this list)
  • Browser STT Streaming: Implemented streaming for speech-to-text (STT) in the browser
  • Dynamic Speech Tab Settings: The speech tab settings now dynamically appear and disappear based on user settings (e.g., browser/external engine dropdown)
  • Refactoring:
    • Renamed endpointSTT and endpointTTS to engineSTT and engineTTS respectively
    • Moved the speech API to the /api/files/speech subpath
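
A minimal sketch of browser-side TTS language selection using the standard SpeechSynthesis API; this is illustrative only and not the exact hook introduced by the PR:

```typescript
// Speak text with a user-selected language, preferring a matching browser voice.
export function speakInLanguage(text: string, lang = 'en-US') {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;

  // getVoices() may be empty until the 'voiceschanged' event fires in some browsers
  const voice = window.speechSynthesis.getVoices().find((v) => v.lang === lang);
  if (voice) {
    utterance.voice = voice;
  }

  window.speechSynthesis.speak(utterance);
}
```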

Breaking Changes

  • The variables SpeechToText and TextToSpeech in the store have been renamed to speechToText and textToSpeech. If you encounter any issues, please delete LibreChat's cache
  • The TTS/STT configuration in librechat.yaml has been moved under a top-level speech: key (see the example below)
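
A minimal sketch of the new layout, inferred from the error messages later in this thread; the provider fields shown are assumptions and may differ from the final schema:

```yaml
# librechat.yaml — TTS/STT settings now nest under a single `speech:` key
speech:
  speechTab:                 # optional pre-configured "speech tab" defaults set by the admin
    conversationMode: true
  tts:                       # previously a top-level `tts:` key
    openai:
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo']
  stt:                       # previously a top-level `stt:` key
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
```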

Change Type

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Testing

External STT/TTS:

  • ElevenLabs
  • OpenAI

Local STT/TTS:

  • Chrome

Test Configuration:

To reproduce the test process, follow these steps:

  1. Configure the librechat.yaml file with appropriate speech settings
  2. Select a TTS language in the browser and verify the speech output
  3. Test STT streaming functionality in the browser using various input sources
  4. Verify that the speech tab settings appear or disappear based on the selected user settings
  5. Ensure the renamed settings (engineSTT and engineTTS) function correctly
  6. Confirm the speech API is accessible via the new subpath /api/files/speech

Checklist

  • My code adheres to this project's style guidelines.
  • I have performed a self-review of my own code.
  • I have commented in any complex areas of my code.
  • I have made pertinent documentation changes.
  • My changes do not introduce new warnings.
  • I have written tests demonstrating that my changes are effective or that my feature works.
  • Local unit tests pass with my changes.
  • Any changes dependent on mine have been merged and published in downstream modules.
  • A pull request for updating the documentation has been submitted.

…ernal audio endpoints

This commit updates the useTextToSpeech and useSpeechToText hooks in the Input directory to support external audio endpoints. It introduces the useGetExternalTextToSpeech and useGetExternalSpeechToText hooks, which determine whether the audio endpoints should be set to 'browser' or 'external' based on the value of the endpointTTS and endpointSTT Recoil states. The useTextToSpeech and useSpeechToText hooks now use these new hooks to determine whether to use external audio endpoints
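
A minimal sketch of what these selection hooks could look like, assuming Recoil atoms named endpointTTS/endpointSTT as described above (later renamed to engineTTS/engineSTT); the atom definitions here are placeholders, not LibreChat's actual store:

```typescript
import { atom, useRecoilValue } from 'recoil';

// Placeholder atoms; in LibreChat these live in the client store
const endpointTTS = atom<string>({ key: 'endpointTTS', default: 'browser' });
const endpointSTT = atom<string>({ key: 'endpointSTT', default: 'browser' });

// Resolve which TTS engine to use: the browser's SpeechSynthesis or an external provider
export function useGetExternalTextToSpeech(): 'browser' | 'external' {
  const engine = useRecoilValue(endpointTTS);
  return engine === 'external' ? 'external' : 'browser';
}

// Same decision for speech-to-text
export function useGetExternalSpeechToText(): 'browser' | 'external' {
  const engine = useRecoilValue(endpointSTT);
  return engine === 'external' ? 'external' : 'browser';
}
```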
The updateTokenWebsocket function and its import are no longer used in the OpenAIClient module. This commit removes the function and import to clean up the codebase
…chToText hooks

…tests: added AutomaticPlaybackSwitch.spec

This commit renames the AutomaticPlayback component to AutomaticPlaybackSwitch in the Speech directory. The new name better reflects the purpose of the component and aligns with the naming convention used in the codebase.
This commit updates the useSpeechToText hook in the client/src/components/Chat/Input/AudioRecorder.tsx file to include the interimTranscript state. This allows for real-time display of the speech-to-text transcription while the user is still speaking. The interimTranscript is now used to update the text area value during recording.
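
A minimal sketch of streaming interim results with the standard Web Speech API; the hook and state names here are illustrative, not the exact ones in useSpeechToText:

```typescript
import { useEffect, useRef, useState } from 'react';

export function useBrowserSpeechToText(language = 'en-US') {
  const [interimTranscript, setInterimTranscript] = useState('');
  const [finalTranscript, setFinalTranscript] = useState('');
  const recognitionRef = useRef<any>(null);

  useEffect(() => {
    const SpeechRecognition =
      (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
    if (!SpeechRecognition) {
      return; // Browser STT unsupported; an external engine is required
    }

    const recognition = new SpeechRecognition();
    recognition.lang = language;
    recognition.continuous = true;
    recognition.interimResults = true; // stream partial results while the user speaks

    recognition.onresult = (event: any) => {
      let interim = '';
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const { transcript } = event.results[i][0];
        if (event.results[i].isFinal) {
          setFinalTranscript((prev) => prev + transcript);
        } else {
          interim += transcript; // shown in the text area while recording
        }
      }
      setInterimTranscript(interim);
    };

    recognitionRef.current = recognition;
    return () => recognition.stop();
  }, [language]);

  return {
    interimTranscript,
    finalTranscript,
    start: () => recognitionRef.current?.start(),
    stop: () => recognitionRef.current?.stop(),
  };
}
```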
…h configuration

This commit adds a new API endpoint responsible for retrieving the custom speech configuration.
…speech configurations

This commit modifies the useCustomConfigSpeechQuery function in the client/src/data-provider/queries.ts file to return an array of custom speech configurations instead of a single object. This change allows for better handling and manipulation of the data in the application
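
A minimal sketch of a query hook returning an array, assuming @tanstack/react-query and a hypothetical /api/files/speech/config path; the setting shape is an assumption as well:

```typescript
import { useQuery, type UseQueryResult } from '@tanstack/react-query';

// Hypothetical shape of a single pre-configured speech tab setting
type TCustomSpeechSetting = Record<string, string | number | boolean>;

export function useCustomConfigSpeechQuery(): UseQueryResult<TCustomSpeechSetting[]> {
  return useQuery<TCustomSpeechSetting[]>({
    queryKey: ['customConfigSpeech'],
    queryFn: async () => {
      // Hypothetical endpoint under the new /api/files/speech subpath
      const res = await fetch('/api/files/speech/config');
      if (!res.ok) {
        throw new Error('Failed to fetch custom speech configuration');
      }
      // An array of settings rather than a single object, per this commit
      return (await res.json()) as TCustomSpeechSetting[];
    },
  });
}
```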
berry-13 marked this pull request as ready for review on June 1, 2024 at 12:18
danny-avila (Owner)

please fix the broken test

danny-avila (Owner)

slated for v0.7.4

danny-avila (Owner) commented Jun 21, 2024

Hi Berry, there are elevated errors when using this without config settings

2024-06-21 09:31:55 info: Server listening on all interfaces at port 3080. Use http://localhost:3080 to access it
2024-06-21 09:31:59 error: Failed to get speechTab settings: Configuration or speechTab schema is missing
2024-06-21 09:31:59 error: Failed to get voices: Configuration or TTS schema is missing
2024-06-21 09:32:45 error: Failed to get speechTab settings: Configuration or speechTab schema is missing
2024-06-21 09:32:45 error: Failed to get voices: Configuration or TTS schema is missing

I should not be seeing any errors when I don't have speech enabled/configured and I open the Settings

danny-avila (Owner)

Also, when I add the new format for speech settings, I should not get an error if I don't have speechTab settings

2024-06-21 09:36:42 error: Failed to get speechTab settings: Configuration or speechTab schema is missing

danny-avila (Owner)

Lastly, if I'm using the old setup, this error message is confusing and I will think the app has a bug:

2024-06-21 09:40:49 error: Invalid custom config file at /home/danny/LibreChat/librechat.yaml [
  {
    "code": "unrecognized_keys",
    "keys": [
      "tts",
      "stt"
    ],
    "path": [],
    "message": "Unrecognized key(s) in object: 'tts', 'stt'"
  }
]

This is a parsing error for the custom config file; we should still show it, but also add a note, when this error is detected, that the format has changed.
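
A minimal sketch of that detection, assuming the config is validated with Zod (the error above matches Zod's unrecognized_keys issue shape); the function and logging calls here are placeholders, not LibreChat's actual code:

```typescript
import { ZodError } from 'zod';

function logConfigError(configPath: string, error: ZodError) {
  console.error(`Invalid custom config file at ${configPath}`, error.issues);

  // Detect the pre-change layout: top-level `tts` / `stt` keys are now rejected
  const usesOldSpeechKeys = error.issues.some(
    (issue) =>
      issue.code === 'unrecognized_keys' &&
      issue.keys.some((key) => key === 'tts' || key === 'stt'),
  );

  if (usesOldSpeechKeys) {
    console.warn(
      'The speech configuration format has changed: move your `tts` and `stt` sections under a top-level `speech:` key.',
    );
  }
}
```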

Also please accompany this PR with an update to the changelog: https://www.librechat.ai/changelog
