[Question] Example request. LocalAudioTransport + Whisper + llm + tts #197

Closed
gaceladri opened this issue May 31, 2024 · 6 comments

gaceladri commented May 31, 2024

Hi 👋

I am having trouble running a local example that integrates LocalAudioTransport, WhisperSTTService, ElevenLabsTTSService, and OpenAILLMService.

I have one script that runs Whisper locally for transcription and another that uses ElevenLabs and OpenAI for the TTS and LLM services, respectively. However, I am struggling to combine these components into a fully functional local conversation system.

To illustrate, here are the two examples I have working independently:

Example 1: Passing an LLM message to the TTS provider:

import asyncio
import os
import sys

import aiohttp
from loguru import logger
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-3.5-turbo-0125",
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        pipeline = Pipeline([llm, tts, transport.output()])

        task = PipelineTask(pipeline)

        async def say_something():
            await asyncio.sleep(1)
            await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])

        runner = PipelineRunner()

        await asyncio.gather(runner.run(task), say_something())


if __name__ == "__main__":
    asyncio.run(main())

Example 2: Using Whisper locally:

import asyncio
import sys

from loguru import logger
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import Model, WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


class TranscriptionLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")


async def main():
    transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))

    stt = WhisperSTTService()

    tl = TranscriptionLogger()

    pipeline = Pipeline([transport.input(), stt, tl])

    task = PipelineTask(pipeline)

    runner = PipelineRunner()

    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Despite these individual successes, I'm unable to connect the transcriptions to the LLM and hold a continuous conversation. Could you provide or add an example of a fully working local setup that demonstrates how to achieve this?
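
Conceptually, I imagine the combined pipeline looking roughly like the sketch below (not working code, just what I'm trying to build; transport, stt, llm, tts and messages would be set up as in the two scripts above):

from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)

# Aggregators that (as I understand it) turn user transcriptions into LLM
# context and keep the assistant's replies in the conversation history.
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)

pipeline = Pipeline(
    [
        transport.input(),   # microphone audio in
        stt,                 # Whisper -> TranscriptionFrame
        tma_in,              # user transcriptions -> LLM messages
        llm,                 # OpenAI LLM
        tts,                 # ElevenLabs TTS
        transport.output(),  # speaker audio out
        tma_out,             # append assistant responses to the context
    ]
)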

Thank you!

ajram23 commented Jun 1, 2024

I got it running. DM me and I will give you the script.

gaceladri commented Jun 1, 2024

@ajram23 Can you paste it here? 🙏 I didn't know you could DM someone on GitHub!

ajram23 commented Jun 1, 2024

07-interruptible-local.py.txt
Here you go! Enjoy! cc @aconchillo just in case you want to add this to the examples folder.

gaceladri commented Jun 2, 2024

@ajram23 Thank you for your example! It's quite similar to what I have implemented. Were you able to interact with the LLM? In my case, I can see the initial message from the LLM, but I seem to have an issue with the communication between the Whisper service and the LLMUserResponseAggregator.

Here is my current code:

#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from loguru import logger
from pipecat.frames.frames import Frame, LLMMessagesFrame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.vad.vad_analyzer import VADParams

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


class TranscriptionLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, TranscriptionFrame):
            logger.debug(f"Whisper transcription: {frame.text}")


async def main():
    async with aiohttp.ClientSession() as session:
        transport = LocalAudioTransport(
            TransportParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(min_volume=0.6)),
                vad_audio_passthrough=True,
            )
        )

        stt = WhisperSTTService(no_speech_prob=0.6)

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id="2ovNLFOsfyKPEWV5kqQi",
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-3.5-turbo-0125",
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,  # STT
                tma_in,  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                tma_out,  # Assistant spoken responses
            ]
        )

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

        runner = PipelineRunner()

        async def say_something():
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."}
            )
            await task.queue_frames([LLMMessagesFrame(messages)])

        await asyncio.gather(runner.run(task), say_something())


if __name__ == "__main__":
    asyncio.run(main())

Here are my pipeline debug messages:

2024-06-02 09:25:37.876 | DEBUG    | pipecat.services.whisper:_load:67 - Loaded Whisper model
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking PipelineSource#0 -> LocalAudioInputTransport#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking LocalAudioInputTransport#0 -> WhisperSTTService#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking WhisperSTTService#0 -> LLMUserResponseAggregator#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking LLMUserResponseAggregator#0 -> OpenAILLMService#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking OpenAILLMService#0 -> ElevenLabsTTSService#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking ElevenLabsTTSService#0 -> LocalAudioOutputTransport#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking LocalAudioOutputTransport#0 -> LLMAssistantResponseAggregator#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking LLMAssistantResponseAggregator#0 -> PipelineSink#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.processors.frame_processor:link:37 - Linking Source#0 -> Pipeline#0
2024-06-02 09:25:37.939 | DEBUG    | pipecat.pipeline.runner:run:29 - Runner PipelineRunner#0 started running PipelineTask#0
2024-06-02 09:25:37.940 | DEBUG    | pipecat.services.openai:_stream_chat_completions:69 - Generating chat: [{"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "role": "system", "name": "system"}, {"content": "Please introduce yourself to the user.", "role": "system", "name": "system"}]
2024-06-02 09:25:38.694 | DEBUG    | pipecat.services.openai:_stream_chat_completions:96 - OpenAI LLM TTFB: 0.7542369365692139
2024-06-02 09:25:38.719 | DEBUG    | pipecat.services.elevenlabs:run_tts:35 - Transcribing text: [Hello!]
2024-06-02 09:25:39.323 | DEBUG    | pipecat.services.elevenlabs:run_tts:35 - Transcribing text: [I am your helpful Legal Language Model here to assist you during this WebRTC call.]
2024-06-02 09:25:40.177 | DEBUG    | pipecat.services.elevenlabs:run_tts:35 - Transcribing text: [How can I help you today?]
2024-06-02 09:25:42.710 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  Okay. 
2024-06-02 09:25:44.907 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  Okay. 
2024-06-02 09:25:47.174 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  you're 
2024-06-02 09:25:48.758 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  Okay. 
2024-06-02 09:25:51.009 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  I'm 
2024-06-02 09:25:53.277 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  Okay. 
2024-06-02 09:25:54.880 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  I did. 
2024-06-02 09:26:13.526 | DEBUG    | pipecat.services.whisper:run_stt:86 - Whisper transcription:  Hey.

For some reason, the transcriptions from Whisper are not being passed to the LLMUserResponseAggregator. I've added print statements inside the LLMUserResponseAggregator to check the messages, but nothing is logged after the Whisper model transcribes the speech.
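
To narrow down where the frames stop, I'm also thinking of dropping a pass-through logger between stt and tma_in, along these lines (just a sketch, not verified; I'm assuming process_frame should call super().process_frame and push_frame so frames keep flowing downstream):

class FrameTracer(FrameProcessor):
    """Log every frame that passes through this stage and forward it unchanged."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        logger.debug(f"FrameTracer: {frame} ({direction})")
        await self.push_frame(frame, direction)

# and in the pipeline:
# pipeline = Pipeline([transport.input(), stt, FrameTracer(), tma_in, llm, tts, transport.output(), tma_out])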

I'm running this on a Mac M2.

Any insights or suggestions on what might be going wrong would be greatly appreciated!

Thank you for your help!

ajram23 commented Jun 2, 2024

@gaceladri in my case I was able to; not sure what is going on with yours.

gaceladri commented

Ok, thank you for the support and feedback!
