Forum Discussion

Raodmehr
Copper Contributor
Nov 29, 2024

Generating Video with Voiceover from a Video Script

Can anybody provide a step-by-step guide for a beginner to build an Azure-based app like Visla (https://app.visla.us/) that converts a video text script into high-quality videos with an Azure voiceover?

2 Replies

  • Mks_1973
    Iron Contributor

    Creating an application similar to Visla, which converts text scripts into high-quality videos with Azure voiceovers, involves several steps. This guide will walk you through the process using Azure's AI services and Python programming.

    1. Set Up Your Azure Environment
    Create an Azure Account: If you don't have one, sign up at the Azure portal.

    Provision Necessary Services:

    Azure OpenAI Service: provides access to language models for text summarization and DALL·E image generation.
    Azure Cognitive Services - Speech Service: enables text-to-speech conversion.
    Azure Cognitive Services - Language (Text Analytics): used for key-phrase extraction in step 4.
    Please refer to Azure's documentation for detailed steps on creating these resources.
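    If you prefer the command line to the portal, the same resources can be provisioned with the Azure CLI. This is a sketch only: the resource group, resource names, and region below are placeholders of my own choosing, and Azure OpenAI access must be approved for your subscription before the last command succeeds.

    ```shell
    # Requires the Azure CLI ("az") and an authenticated session (az login)
    az group create --name video-gen-rg --location eastus

    # Speech resource for text-to-speech (step 6)
    az cognitiveservices account create \
        --name my-speech-resource --resource-group video-gen-rg \
        --kind SpeechServices --sku S0 --location eastus

    # Language resource for key-phrase extraction (step 4)
    az cognitiveservices account create \
        --name my-language-resource --resource-group video-gen-rg \
        --kind TextAnalytics --sku S0 --location eastus

    # Azure OpenAI resource for summarization and image generation (steps 3 and 5)
    az cognitiveservices account create \
        --name my-openai-resource --resource-group video-gen-rg \
        --kind OpenAI --sku S0 --location eastus
    ```

    Afterwards, `az cognitiveservices account keys list` retrieves the keys you will paste into the code below.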

    2. Prepare Your Development Environment

    Install Python: Ensure Python is installed on your system.

    Set Up a Virtual Environment:
    python -m venv azure_video_env
    source azure_video_env/bin/activate  # On Windows: azure_video_env\Scripts\activate


    Install Required Libraries:
    # Version pins match the API style used below (legacy openai client, moviepy.editor)
    pip install "openai<1.0" azure-cognitiveservices-speech azure-ai-textanalytics "moviepy<2.0" requests


    3. Summarize the Text Script
    Utilize Azure OpenAI to generate a concise summary of your script:

    import openai

    openai.api_type = "azure"
    openai.api_base = "https://<Your_Resource_Name>.openai.azure.com/"
    openai.api_version = "2022-12-01"
    openai.api_key = "<Your_API_Key>"

    def summarize_text(content, num_sentences=5):
        prompt = f'Provide a summary of the text below in {num_sentences} sentences:\n{content}'
        response = openai.Completion.create(
            engine="<Your_Deployment_Name>",  # the deployment name you created in your Azure OpenAI resource
            prompt=prompt,
            temperature=0.3,
            max_tokens=250,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0
        )
        return response.choices[0].text.strip()

    # Example usage
    script = "Your full text script here."
    summary = summarize_text(script)
    print(summary)


    4. Extract Key Phrases
    Use Azure Cognitive Services to identify key phrases:

    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    def extract_key_phrases(text):
        credential = AzureKeyCredential("<Your_Cognitive_Service_Key>")
        endpoint = "https://<Your_Cognitive_Service>.cognitiveservices.azure.com/"
        client = TextAnalyticsClient(endpoint=endpoint, credential=credential)
        response = client.extract_key_phrases(documents=[text])
        result = response[0]
        if result.is_error:
            raise RuntimeError(f"Key phrase extraction failed: {result.error.message}")
        return result.key_phrases

    # Example usage
    key_phrases = extract_key_phrases(summary)
    print(key_phrases)


    5. Generate Images with DALL·E
    Create prompts from key phrases to generate images using Azure's DALL·E API:

    import openai
    import requests

    # Reuses the Azure OpenAI configuration from step 3; image generation
    # requires a resource/region where DALL·E is available.

    def generate_image(prompt, output_path):
        response = openai.Image.create(
            prompt=prompt,
            n=1,
            size="1024x1024"
        )
        image_url = response['data'][0]['url']
        # Download the generated image and save it locally
        image_data = requests.get(image_url, timeout=60).content
        with open(output_path, "wb") as f:
            f.write(image_data)
        return output_path

    # Example usage
    import os
    os.makedirs("images", exist_ok=True)
    for phrase in key_phrases:
        saved_path = generate_image(phrase, f"images/{phrase}.png")
        print(f"Image for '{phrase}' saved at {saved_path}")


    6. Convert Text to Speech
    Generate audio from the summarized text:

    import azure.cognitiveservices.speech as speechsdk

    def text_to_speech(text, output_path):
        speech_config = speechsdk.SpeechConfig(subscription="<Your_Speech_Key>", region="<Your_Speech_Region>")
        audio_config = speechsdk.audio.AudioOutputConfig(filename=output_path)
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        result = synthesizer.speak_text_async(text).get()
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print(f"Audio saved to {output_path}")
        else:
            cancellation = result.cancellation_details
            print(f"Error: {cancellation.reason} - {cancellation.error_details}")

    # Example usage
    import os
    os.makedirs("audio", exist_ok=True)
    text_to_speech(summary, "audio/summary.wav")


    7. Compile the Video
    Combine the generated images and audio into a video:

    from moviepy.editor import ImageClip, AudioFileClip, concatenate_videoclips

    def create_video(image_paths, audio_path, output_path):
        clips = []
        audio = AudioFileClip(audio_path)
        duration_per_image = audio.duration / len(image_paths)
        for image_path in image_paths:
            clip = ImageClip(image_path).set_duration(duration_per_image)
            clips.append(clip)
        video = concatenate_videoclips(clips, method="compose")
        video = video.set_audio(audio)
        video.write_videofile(output_path, fps=24)

    # Example usage
    image_files = [f"images/{phrase}.png" for phrase in key_phrases]
    create_video(image_files, "audio/summary.wav", "final_video.mp4")
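
    The steps above can be wired together into a single pipeline. This is a hypothetical glue sketch (run_pipeline and its parameter names are my own, not part of any Azure SDK): the step functions are passed in as callables, so the flow can be dry-run with stubs before plugging in the real Azure-backed functions from steps 3-7.

    ```python
    def run_pipeline(script, summarize, extract_phrases, make_image,
                     synthesize, compile_video):
        """Chain the steps: script -> summary -> phrases -> images + audio -> video."""
        summary = summarize(script)                       # step 3
        phrases = extract_phrases(summary)                # step 4
        image_paths = [make_image(p, f"images/{p}.png")   # step 5
                       for p in phrases]
        audio_path = "audio/summary.wav"
        synthesize(summary, audio_path)                   # step 6
        compile_video(image_paths, audio_path, "final_video.mp4")  # step 7
        return "final_video.mp4"
    ```

    Injecting the functions keeps each Azure dependency swappable, which also makes the overall flow easy to test without credentials.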


    8. Review and Refine
    Ensure the video and audio are synchronized and meet quality standards.
    Modify image durations, transitions, or re-generate assets as needed.
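
    One concrete refinement is to account for crossfade transitions in the per-image duration math. A minimal sketch (plan_clip_timings is a hypothetical helper, not part of moviepy): with n clips each overlapping the next by c seconds, the total running time is n*d - (n-1)*c, so solving for d keeps the slideshow exactly as long as the narration.

    ```python
    def plan_clip_timings(num_images, audio_duration, crossfade=0.5):
        """Return (start, duration) pairs so crossfaded clips exactly span the audio.

        With n clips of duration d overlapping by c seconds, the total running
        time is n*d - (n-1)*c; solving for d spreads the audio evenly.
        """
        if num_images < 1:
            raise ValueError("need at least one image")
        duration = (audio_duration + (num_images - 1) * crossfade) / num_images
        return [(i * (duration - crossfade), duration) for i in range(num_images)]
    ```

    In moviepy 1.x these timings could drive ImageClip(...).set_start(start).set_duration(duration).crossfadein(crossfade) composited in a CompositeVideoClip, instead of the plain concatenation used in step 7.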
