Streamlining data discovery for AI/ML with OpenMetadata on AKS and Azure NetApp Files
This article contains a step-by-step guide to deploying OpenMetadata on Azure Kubernetes Service (AKS), using Azure NetApp Files for storage. It also covers the deployment and configuration of PostgreSQL and OpenSearch databases to run externally from the Kubernetes cluster, following OpenMetadata best practices, managed by NetApp® Instaclustr®. This comprehensive tutorial aims to assist Microsoft and NetApp customers in overcoming the challenges of identifying and managing their data for AI/ML purposes. By following this guide, users will achieve a fully functional OpenMetadata instance, enabling efficient data discovery, enhanced collaboration, and robust data governance.
AI for Operations

Solutions idea

This solution series shows some examples of how Azure OpenAI and its LLM models can be used on Operations and FinOps issues. With a view to using models linked to the Enterprise Scale Landing Zone, the solutions shown, which are available on a dedicated GitHub, are designed to be deployed within a dedicated subscription, in the examples called 'OpenAI-CoreIntegration'. The examples covered are:

- SQL BPA AI Enhanced
- Azure Update Manager AI Enhanced
- Azure Cost Management AI Enhanced
- Azure AI Anomalies Detection
- Azure OpenAI Smart Doc Creator

Enterprise Scale AI for Operations Landing Zone Design Architecture

SQL BPA AI Enhanced

Architecture

This Logic App is an example of integrating Azure Arc-enabled SQL Server best practices assessment results with OpenAI: it creates an HTML report and a CSV file sent via email, with OpenAI commentary on High and/or Medium severity findings based on current Microsoft documentation.

Dataflow

Initial Trigger
- Type: Recurrence
- Configuration: Frequency: Weekly; Day: Monday; Time: 9:00 AM; Time Zone: W. Europe Standard Time
- Description: The Logic App is triggered weekly to gather data for SQL Best Practice Assessments.

Step 1: Data Query
- Action: Run_query_and_list_results
- Description: Executes a Log Analytics query to retrieve SQL assessment results from monitored resources.
- Output: A dataset containing issues classified by severity (High/Medium).

Step 2: Variable Initialization
- Actions:
  - Initialize_variable_CSV: Initializes an empty array to store CSV results.
  - Open_AI_API_Key: Sets up the API key for the Azure OpenAI service.
  - HelpLinkContent: Prepares a variable to store useful links.
- Description: Configures necessary variables for subsequent steps.

Step 3: Process Results
- Action: For_eachSQLResult
- Description: Processes the query results with the following sub-steps:
  - Condition: Checks if the severity is High or Medium.
  - OpenAI Processing: Sends structured prompts to the GPT-4 model for recommendations on identified issues, then parses the JSON response to extract specific insights.
  - CSV Composition: Creates an array containing detailed results.

Step 4: Report Generation
- Actions:
  - Create_CSV_table: Converts processed data into CSV format.
  - Create_HTML_table: Generates an HTML table from the data.
  - ComposeMailMessage: Prepares an HTML email message containing the results and a link to the report.
- Description: Formats the data for sharing.

Step 5: Saving and Sharing
- Actions:
  - Create_file: Saves the HTML report to OneDrive.
  - Send_an_email_(V2): Sends an email with the reports attached (HTML and CSV).
  - Post_message_in_a_chat_or_channel: Shares the results in a Teams channel.
- Description: Distributes the reports to defined recipients.
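Outside of Logic Apps, the core of Steps 1-3 can be reproduced in a few lines of Python. The sketch below is illustrative only: the workspace ID, the SqlAssessment_CL table and column names, and the deployment name are assumptions, not values taken from the solution.

```python
# Illustrative sketch of Steps 1-3: query assessment results from Log Analytics,
# then ask Azure OpenAI to comment on High/Medium severity findings.
# Assumptions: workspace ID, table/column names, and deployment name are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient
from openai import AzureOpenAI

logs = LogsQueryClient(DefaultAzureCredential())
aoai = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                   api_key="<api-key>", api_version="2024-02-01")

# Hypothetical query; the real table/columns depend on how BPA results are ingested.
query = "SqlAssessment_CL | where Severity_s in ('High', 'Medium') | take 50"
result = logs.query_workspace("<workspace-id>", query, timespan=timedelta(days=7))

for table in result.tables:
    for row in table.rows:
        finding = dict(zip(table.columns, row))
        completion = aoai.chat.completions.create(
            model="gpt-4",  # your deployment name
            messages=[
                {"role": "system", "content": "You are a SQL Server best-practices advisor."},
                {"role": "user", "content": f"Explain this finding and how to remediate it: {finding}"},
            ],
        )
        print(completion.choices[0].message.content)
```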
Components

- Azure OpenAI Service is a platform provided by Microsoft that offers access to powerful language models developed by OpenAI, including GPT-4, GPT-4o, GPT-4o mini, and others. In this scenario the service handles all natural language understanding and generation of communication to customers.
- Azure Logic Apps is a cloud platform where you can create and run automated workflows with little to no code.
- Azure Logic Apps Managed Identities allow authentication to any resource that supports Microsoft Entra authentication, including your own applications.
- SQL Server enabled by Azure Arc extends Azure services to SQL Server instances hosted outside of Azure: in your data center, in edge site locations like retail stores, or any public cloud or hosting provider.
- The SQL Best Practices Assessment feature provides a mechanism to evaluate the configuration of your SQL Server instance.
- Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from your cloud and on-premises environments.
- Azure Kusto Query is a powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical modeling, and more.

Potential use cases

SQL BPA AI Enhanced exploits the capabilities of the SQL Best Practice Assessment service based on Azure Arc-enabled SQL Server. The collected data can be used to generate customised tables. The solution is designed for customers who want to enrich their assessment information with generative artificial intelligence.

Azure Update Manager AI Enhanced

Architecture

This Logic App solution example retrieves data from the Azure Update Manager service and returns an output processed by generative artificial intelligence.

Dataflow

Initial Trigger
- Type: Recurrence Trigger
- Frequency: Monthly; Time Zone: W. Europe Standard Time
- Triggers the Logic App at the beginning of every month.

Step 1: Initialize API Key
- Action: Initialize Variable
- Variable Name: Api-Key

Step 2: Fetch Update Status
- Action: HTTP Request
- URI: https://management.azure.com/providers/Microsoft.ResourceGraph/resources
- Query: Retrieves resources related to patch assessments using patchassessmentresources.

Step 3: Parse Update Status
- Action: Parse JSON
- Content: Response body from the HTTP request.
- Schema: Extracts details such as VM Name, Patch Name, Patch Properties, etc.

Step 4: Process Updates
- For Each: Body('Parse_JSON')?['data']. Iterates through each item in the parsed update data.
- Condition: If Patch Name is not null and contains "KB":
  - Action: Format Item. Parses individual update items for VM Name, Patch Name, and additional properties.
  - Action: Send to Azure OpenAI. Sends structured prompts to the GPT-4 model. Headers: Content-Type: application/json; api-key: @variables('Api-Key'). Body: Prompts Azure OpenAI to generate a report for each virtual machine and patch, formatted in Italian.
  - Action: Parse OpenAI Response. Extracts and formats the response generated by Azure OpenAI.
  - Action: Append to Summary and CSV. Adds the OpenAI-generated response to the Updated Summary array and appends patch details to the CSV array.

Step 5: Finalize Report
- Action: Create Reports (I, II, III). Formats and cleans the Updated Summary variable to remove unwanted characters.
- Action: Compose HTML Email Content. Constructs an HTML email with the report summary generated using OpenAI, a disclaimer about possible formatting anomalies, and the company logo embedded.

Step 6: Generate CSV Table
- Action: Converts the CSV array into CSV format for attachment.

Step 7: Send E-Mail
- Action: Send Email
- Recipient: [email protected]
- Subject: Security Update Assessment
- Body: HTML content with report summary.
- Attachment: Name: SmartUpdate_<timestamp>.csv; Content: CSV table of update details.
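Step 2's Resource Graph call can also be issued from Python. A minimal sketch, assuming the azure-mgmt-resourcegraph package, the default objectArray result format, and a hypothetical KQL projection (the real query shape depends on the properties you need):

```python
# Illustrative sketch of Step 2: query patch assessment data via Azure Resource Graph.
# The KQL projection below is a hypothetical example, not the solution's exact query.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

client = ResourceGraphClient(DefaultAzureCredential())

request = QueryRequest(
    subscriptions=["<subscription-id>"],
    query="""
    patchassessmentresources
    | where type =~ 'microsoft.compute/virtualmachines/patchassessmentresults/softwarepatches'
    | project vmName = tostring(split(id, '/')[8]), patchName = name, properties
    | limit 100
    """,
)
response = client.resources(request)

# response.data is a list of dicts in the default objectArray result format.
for row in response.data:
    # Mirrors the Logic App condition: only patches whose name contains "KB".
    if row.get("patchName") and "KB" in row["patchName"]:
        print(row["vmName"], row["patchName"])
```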
Components

- Azure OpenAI Service is a platform provided by Microsoft that offers access to powerful language models developed by OpenAI, including GPT-4, GPT-4o, GPT-4o mini, and others. In this scenario the service handles all natural language understanding and generation of communication to customers.
- Azure Logic Apps is a cloud platform where you can create and run automated workflows with little to no code.
- Azure Logic Apps Managed Identities allow authentication to any resource that supports Microsoft Entra authentication, including your own applications.
- Azure Update Manager is a unified service to help manage and govern updates for all your machines. You can monitor Windows and Linux update compliance across your machines in Azure and on-premises/on other cloud platforms (connected by Azure Arc) from a single pane of management. You can also use Update Manager to make real-time updates or schedule them within a defined maintenance window.
- Azure Arc Server lets you manage Windows and Linux physical servers and virtual machines hosted outside of Azure, on your corporate network, or with another cloud provider.

Potential use cases

Azure Update Manager AI Enhanced is an example of a solution designed for situations where the IT department needs to automate the reporting of infrastructure update status in a readable format, with the output managed by generative artificial intelligence.

Azure Cost Management AI Enhanced

Architecture

This Logic App solution retrieves consumption data from the Azure environment and generates a general and detailed cost trend report on a scheduled basis.

Dataflow

Initial Trigger
- Type: Manual HTTP Trigger. The Logic App is triggered manually using an HTTP request.

Step 1: Set Current Date and Old Date
- Action: Set Actual Date. The current date is initialized to @utcNow('yyyy-MM-dd'). Example value: 2024-11-22.
- Action: Set Actual Date -30. The old date is set to 30 days before the current date. Example value: 2024-10-23.
- Action: Set old date -30. Sets the variable currentdate to 30 days prior to the old date. Example value: 2024-09-23.
- Action: Set old date -60. Sets the variable olddate to 60 days before the current date. Example value: 2024-08-23.

Step 2: Query Cost Data
- Action: Query last 30 days. Queries Azure Cost Management for the last 30 days. Example data returned:

```json
{
  "properties": {
    "rows": [
      ["Virtual Machines", 5000],
      ["Databases", 7000],
      ["Storage", 3000]
    ]
  }
}
```

- Action: Query -60 -30 days. Queries Azure Cost Management for 30 to 60 days ago. Example data returned:

```json
{
  "properties": {
    "rows": [
      ["Virtual Machines", 4800],
      ["Databases", 6800],
      ["Storage", 3050]
    ]
  }
}
```

Step 3: Download Detailed Reports
- Action: Download_report_actual_month. Generates and retrieves a detailed cost report for the current month.
- Action: Download_report_last_month. Generates and retrieves a detailed cost report for the previous month.

Step 4: Process and Store Reports
- Action: Actual_Month_Report. Parses the JSON from the current month's report and retrieves blob download links for the detailed report.
- Action: Last_Month_Report. Parses the JSON from the last month's report and retrieves blob download links for the detailed report.
- Action: Create_ActualMonthDownload and Create_LastMonthDownload. Initializes variables to store download links.
- Action: Get_Actual_Month_Download_Link and Get_Last_Month_Download_Link. Iterates through blob data and assigns the download link variables.

Step 5: Generate Questions for OpenAI
- Action: Set_Question. Prepares the first question for Azure OpenAI: "Describe the key differences between the previous and current month's costs, and create a bullet-point list detailing these differences in Euros."
- Action: Set_Second_Question. Prepares a second question for Azure OpenAI: "Briefly describe in Italian the major cost differences between the two months, rounding the amounts to Euros."
Step 6: Send Questions to Azure OpenAI
- Action: Pass result to OpenAI. Sends the first question to OpenAI for generating detailed insights.
- Action: Get Description from OpenAI. Sends the second question to OpenAI for a brief summary in Italian.

Step 7: Process OpenAI Responses
- Action: Parse_JSON and Parse_JSON_Second_Question. Parses the JSON responses from OpenAI for both questions and retrieves the content of the generated insights.
- Action: For_each_Description. Iterates through OpenAI's responses and assigns the description to a variable DescriptionOutput.

Step 8: Compose and Send E-Mail
- Action: Compose_Email. Composes an HTML email including key insights from OpenAI and links to download the detailed reports. Example email content:

  Azure automated cost control system:
  - Increase of €200 in Virtual Machines.
  - Reduction of €50 in Storage.
  Download details:
  - Current month: [Download Report]
  - Previous month: [Download Report]

- Action: Send_an_email_(V2). Sends the composed email.

Components

- Azure OpenAI Service is a platform provided by Microsoft that offers access to powerful language models developed by OpenAI, including GPT-4, GPT-4o, GPT-4o mini, and others. In this scenario the service handles all natural language understanding and generation of communication to customers.
- Azure Logic Apps is a cloud platform where you can create and run automated workflows with little to no code.
- Azure Logic Apps Managed Identities allow authentication to any resource that supports Microsoft Entra authentication, including your own applications.

Potential use cases

Azure Cost Management AI Enhanced is an example of a solution designed for those who need to schedule the generation of reports on FinOps topics, with the possibility to customise the output and send the results via e-mail or perform a customised upload.
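The month-over-month comparison that Steps 2 and 5 hand to OpenAI can be reproduced with plain Python. A minimal sketch using the example payloads shown above (the service names and amounts are the sample values, not live data):

```python
# Illustrative sketch: diff two Cost Management query results (Step 2) and build
# the first prompt from Step 5. Values mirror the example payloads above.
current = {"properties": {"rows": [["Virtual Machines", 5000], ["Databases", 7000], ["Storage", 3000]]}}
previous = {"properties": {"rows": [["Virtual Machines", 4800], ["Databases", 6800], ["Storage", 3050]]}}

def to_map(payload):
    """Turn the rows array into {service: cost}."""
    return {service: cost for service, cost in payload["properties"]["rows"]}

cur, prev = to_map(current), to_map(previous)
deltas = {svc: cur.get(svc, 0) - prev.get(svc, 0) for svc in sorted(set(cur) | set(prev))}

lines = [f"- {svc}: {'increase' if d >= 0 else 'reduction'} of €{abs(d)}" for svc, d in deltas.items()]
question = (
    "Describe the key differences between the previous and current month's costs, "
    "and create a bullet-point list detailing these differences in Euros.\n" + "\n".join(lines)
)
print(question)
# - Databases: increase of €200
# - Storage: reduction of €50
# - Virtual Machines: increase of €200
```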
Azure AI Anomalies Detection

Architecture

This Logic App solution leverages Azure Monitor's native machine learning capabilities to retrieve anomalous data within application logs, which are then analysed by OpenAI.

Components

- Azure OpenAI Service is a platform provided by Microsoft that offers access to powerful language models developed by OpenAI, including GPT-4, GPT-4o, GPT-4o mini, and others. In this scenario the service handles all natural language understanding and generation of communication to customers.
- Azure Logic Apps is a cloud platform where you can create and run automated workflows with little to no code.
- Azure Logic Apps Managed Identities allow authentication to any resource that supports Microsoft Entra authentication, including your own applications.
- Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from your cloud and on-premises environments.
- Azure Kusto Query is a powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical modeling, and more.

Potential use cases

Azure AI Anomalies Detection is an example of a solution that exploits the machine learning capabilities of Azure Monitor to diagnose anomalies within application logs, which are then analysed by Azure OpenAI. The solution can be customized based on customer requirements.

Azure OpenAI Smart Doc Creator

Architecture

This Function App solution leverages Azure OpenAI LLM generative AI to create a docx file based on the Azure architectural information of a specific workload (Azure metadata based). The function exploits the 'OpenAI multi-agent' concept.

Dataflow

Step 1: Logging and Configuration Setup
- Initialize Logging: Advanced logging is set up to provide debug-level insights; the format includes timestamps, log levels, and messages.
- Retrieve OpenAI Endpoint: QUESTION_ENDPOINT is retrieved from environment variables, and logging confirms the endpoint retrieval.

Step 2: Authentication
- Managed Identity Authentication: The ManagedIdentityCredential class is used for secure Azure authentication.
- The SubscriptionClient is initialized to access Azure subscriptions.
- Retrieves a token for Azure Cognitive Services (https://cognitiveservices.azure.com/.default).

Step 3: Flattening Dictionaries
- Function: flatten_dict
- Transforms nested dictionaries into a flat structure, handling nested lists and dictionaries recursively.
- Used for preparing metadata for storage in CSV.
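A helper along the lines of Step 3 might look like the following sketch; the separator choice and the list handling are assumptions, and the actual implementation may differ:

```python
# Illustrative sketch of Step 3's flatten_dict: collapse nested dicts/lists into
# a single-level dict whose keys are path-like strings, ready for CSV columns.
def flatten_dict(obj, parent_key="", sep="."):
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten_dict(value, new_key, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten_dict(value, new_key, sep))
    else:
        items[parent_key] = obj  # leaf value
    return items

metadata = {"sku": {"name": "Standard_D2s_v3"}, "zones": ["1", "2"]}
print(flatten_dict(metadata))
# {'sku.name': 'Standard_D2s_v3', 'zones.0': '1', 'zones.1': '2'}
```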
Step 4: Resource Tag Filtering
- Functions:
  - get_resources_by_tag_in_subscription: Filters resources in a subscription based on a tag key and value.
  - get_resource_groups_by_tag_in_subscription: Identifies resource groups with matching tags.
- Purpose: Retrieve Azure resources and resource groups tagged with specific key-value pairs.

Step 5: Resource Metadata Retrieval
- Functions:
  - get_all_resources: Aggregates resources and resource groups across all accessible subscriptions.
  - get_resources_in_resource_group_in_subscription: Retrieves resources from specific resource groups.
  - get_latest_api_version: Determines the most recent API version for a given resource type.
  - get_resource_metadata: Retrieves detailed metadata for individual resources using the latest API version.
- Purpose: Collect comprehensive resource details for further processing.

Step 6: Documentation Generation
- Function: generate_infra_config
- Processes metadata through OpenAI to generate documentation; OpenAI generates detailed and human-readable descriptions for Azure resources.
- Multi-stage review process: an initial draft by OpenAI, then a feedback loop between ArchitecturalReviewer and DocCreator for refinement.
- Final content is saved to architecture.txt.

Step 7: Workload Overview
- Function: generate_workload_overview
- Reads from the generated CSV file to create a summary of the workload and sends the resource list to OpenAI for generating a high-level overview.

Step 8: Conversion to DOCX
- Function: txt_to_docx
- Creates a Word document (Output.docx) with Section 1, "Workload Overview" (generated summary), and Section 2, "Workload Details" (detailed resource metadata), adding structured headings and page breaks.

Step 9: Temporary Files Cleanup
- Function: cleanup_files
- Deletes temporary files (architecture.txt, resources_with_expanded_metadata.csv, Output.docx), ensuring no residual files remain after execution.

Step 10: CSV Metadata Export
- Function: save_resources_with_expanded_metadata_to_csv
- Aggregates and flattens resource metadata and saves the details to resources_with_expanded_metadata.csv, with unique keys derived from all metadata fields.

Step 11: Architectural Review Process
- Functions:
  - ArchitecturalReviewer: Reviews and suggests improvements to documentation.
  - DocCreator: Incorporates reviewer suggestions into the documentation.
- Purpose: Iterative refinement for high-quality documentation.

Step 12: HTTP Trigger Function
- Function: smartdocs
- Accepts HTTP requests with tag_key and tag_value parameters and orchestrates the entire workflow: resource discovery, metadata retrieval, documentation generation, and file cleanup. Responds with success or error messages.

Components

- Azure OpenAI Service is a platform provided by Microsoft that offers access to powerful language models developed by OpenAI, including GPT-4, GPT-4o, GPT-4o mini, and others. In this scenario the service handles all natural language understanding and generation of communication to customers.
- Azure Functions is a serverless solution that allows you to write less code, maintain less infrastructure, and save on costs. Instead of worrying about deploying and maintaining servers, the cloud infrastructure provides all the up-to-date resources needed to keep your applications running.
- Azure Function App Managed Identities allow authentication to any resource that supports Microsoft Entra authentication, including your own applications.
- Azure libraries for Python (SDK) are the open-source Azure libraries for Python, designed to simplify the provisioning, management, and use of Azure resources from Python application code.

Potential use cases

The Azure OpenAI Smart Doc Creator Function App, like all the proposed solutions, can be modified to suit your needs. It can be of practical help when there is a need to obtain all the configurations, in terms of metadata, of the resources and services that make up a workload.
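The Step 11 review loop is essentially two chat agents passing a draft back and forth. A minimal sketch, assuming an Azure OpenAI deployment name, illustrative prompts, and a fixed number of refinement rounds (all assumptions, not the shipped code):

```python
# Illustrative sketch of the Step 11 ArchitecturalReviewer / DocCreator loop.
# Deployment name, prompts, and round count are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                     api_key="<api-key>", api_version="2024-02-01")

def chat(system_prompt, user_content):
    response = client.chat.completions.create(
        model="gpt-4",  # your deployment name
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_content}],
    )
    return response.choices[0].message.content

draft = chat("You are DocCreator. Write architecture documentation.", "<resource metadata here>")
for _ in range(2):  # fixed number of review rounds
    feedback = chat("You are ArchitecturalReviewer. Suggest concrete improvements.", draft)
    draft = chat("You are DocCreator. Revise the document per this feedback.",
                 f"Document:\n{draft}\n\nFeedback:\n{feedback}")
print(draft)
```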
Contributors

Principal authors:
- Tommaso Sacco | Cloud Solutions Architect
- Simone Verza | Cloud Solution Architect

Extended contribution:
- Saverio Lorenzini | Senior Cloud Solution Architect
- Andrea De Gregorio | Technical Specialist
- Gianluca De Rossi | Technical Specialist

Special thanks:
- Carmelo Ferrara | Director CSA
- Marco Crippa | Sr CSA Manager

Advanced RAG Solution Accelerator
Overview

What is RAG and Why Advanced RAG?

Retrieval-Augmented Generation (RAG) is a natural language processing technique that combines the strengths of retrieval-based and generation-based models. It uses search algorithms to retrieve relevant data from external sources such as databases, knowledge bases, document corpora, and web pages. This retrieved data, known as "grounding information," is then input into a large language model (LLM) to generate more accurate, relevant, and up-to-date outputs.

Figure 1: High level Retrieval Augmented Flow

Usage Patterns

Here are some horizontal use cases where customers have used Retrieval-Augmented Generation based systems:

- Conversational Search and Insights: Summarize large volumes of information for easier consumption and communication.
- Content Generation: Tailor interactions with individualized information to produce personalized output and recommendations.
- AI Assistant, Q&A, and Decisioning: Analyze and interpret data to uncover patterns, identify trends, gain valuable insights, and answer questions.

Below are a few examples of vertical use cases where Retrieval-Augmented Generation has been beneficial:

- Public Sector Knowledge Base: A government agency needs a system to provide citizens with information about public services, such as permits, licenses, and local regulations.
- Compliance Document Retrieval: A regulatory body must assist organizations in understanding compliance requirements through a database of guidelines and policies.
- Healthcare Patient Education: A health department aims to provide patients with educational resources about common conditions and treatments.

Challenges with Baseline RAG

- Ability to cover complex data: RAG on plain text content works well. However, when the content becomes more complex, such as financial reports with images, complex tables, and document sections spanning pages, parsing and indexing it is no longer straightforward.
- Context window limitations: As the dataset scales, the performance of RAG systems can degrade, particularly due to the "lost in the middle" phenomenon, making it challenging to retrieve specific information from large datasets.
- Search limitations: Search technology has advanced to support vector-based searches, but searching over vector embeddings alone may not be sufficient for achieving high accuracy.
- Groundedness: When the search context is insufficient, RAG systems can generate incorrect or misleading information that is not grounded in the customer's data. Careful evaluations may be necessary to catch and fix these cases.
- Latency and user experience: Balancing performance and latency is crucial, as high latency can negatively impact the user experience. Optimizing this balance is a key challenge.
- Quality improvement levers: Identifying and effectively utilizing the right levers for quality improvements, such as accuracy, latency, and relevance, can be difficult.

Advanced RAG aims to address the challenges faced with baseline RAG by incorporating advanced techniques for ingestion, formatting, and intent extraction from both structured and unstructured data. It provides an improved baseline architecture for building a more scalable solution that meets the accuracy and performance requirements of the business. By implementing advanced methodologies in data ingestion, search optimization, and evaluation, Advanced RAG enhances the overall effectiveness and reliability of Retrieval-Augmented Generation systems.
This ensures that the business value of RAG systems is maximized, aligning technological capabilities with business needs.

RAG Quality Improvement

Background

Our implementation uses default configurations from document indexing services to ingest financial data, indexed with Azure AI Search. The content was also vectorized in the index. The search index covered a few years of financial reports for the company. Once the RAG solution was implemented, overall accuracy was measured using the GPT similarity metric, which evaluates the similarity between user-provided ground truth answers and the model's predicted answers on a scale of 1 to 5, where 5 means the system produced answers that perfectly matched the ground truth.

Accuracy Improvement Efforts

To improve the accuracy of the Retrieval-Augmented Generation (RAG) system, several strategies were implemented, grouped under three categories: ingestion improvements, search improvements, and improvements in tooling and evaluation.

Ingestion Improvements:

- Improve Parsing: Efforts were made to minimize information loss during ingestion by handling data in images and complex tables. Image descriptions were generated, and various techniques were used to handle complex tables, including converting them into Markdown, HTML, and paragraph formats to ensure accurate parsing of tabular data.
  - Information in images: Figure 2 shows the performance of Microsoft stock compared to the rest of the market (S&P 500 and NASDAQ). Efficient parsing techniques can eliminate the need for additional tables and supporting text content by extracting key insights from images and storing them as text. (Figure 2: Example of information in images)
  - Complex tables: Figure 3 shows an example of a table with financial data represented in a complex structure in the financial report. In this particular example, the table contains multiple sub-columns (years) within a top-level column, along with rows spanning multiple lines. (Figure 3: Example of a complex table in financial reports)
- Optimal Chunk Size: The impact of chunk size on search results was analyzed. Parsed content was split into paragraphs, and a small percentage of these paragraphs were used to generate questions. Custom scripts created a question-to-paragraph mapping dataset. Indexes with varying chunk sizes (e.g., 3K, 8K characters) were created, and search results were evaluated for different values of top_k (a recall sketch follows this list). For example, with a chunk size of 3K characters, the recall was 78.5% for top_k = 7 and 91% for top_k = 25. (Figure 4: Recall on different chunk sizes)

  Recall        8K (chars)   3K (chars)
  top_k = 7     69%          78.5%
  top_k = 25    76%          91%

  Table 1: Search recall at different chunk sizes

  Based on the search recall values, chunks of 3K characters work best for our content, and top_k = 25 retrieves most of the relevant search results.

- Index Enrichment: Additional metadata was added to chunks during ingestion to aid better retrieval. This included metadata in additional fields used during search (headings, section topics, etc.) and other fields used for filtering (report year, etc.).
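The chunk-size experiment boils down to a recall@k computation over the question-to-paragraph mapping. A minimal sketch (the data structures and example values are hypothetical):

```python
# Illustrative recall@k sketch for the chunk-size experiment: for each generated
# question the source paragraph is known; recall@k is the fraction of questions
# whose source chunk appears in the top-k search results.
def recall_at_k(ground_truth, search_results, k):
    """ground_truth: {question: source_chunk_id};
    search_results: {question: ranked list of chunk ids}."""
    hits = sum(
        1 for question, chunk_id in ground_truth.items()
        if chunk_id in search_results.get(question, [])[:k]
    )
    return hits / len(ground_truth)

ground_truth = {"What was FY23 revenue?": "chunk-17", "What is the gross margin?": "chunk-42"}
search_results = {
    "What was FY23 revenue?": ["chunk-03", "chunk-17", "chunk-99"],
    "What is the gross margin?": ["chunk-42", "chunk-11", "chunk-17"],
}
print(recall_at_k(ground_truth, search_results, k=2))  # 1.0: both sources in top 2
```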
Search Improvements:

- Pre-processing of User Input: Techniques such as rephrasing and query expansion were used to enhance the quality of user input.
- Advanced Search Features: Vector, semantic, and hybrid search features were used to increase the number of relevant results.
- Filtering and Reranking: Filters were dynamically extracted from user queries, and search results were reranked to improve relevance.

Example of a rephrased user query: below, a user prompt is rephrased and fanned out into various smaller, more focused search queries, produced by GPT-4o using a custom prompt.

User Query: Can you explain the difference between the gross profit for Microsoft in 2023 and 2024?

Rephraser Output: a set of smaller, focused sub-queries (illustrated in the sketch below).
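A sketch of how such a fan-out might be produced; the system prompt and the example sub-queries are hypothetical, not the team's custom prompt or its actual output:

```python
# Illustrative sketch of query rephrasing + fan-out (hypothetical prompt/output).
from openai import AzureOpenAI

client = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                     api_key="<api-key>", api_version="2024-02-01")

REPHRASER_PROMPT = (
    "Rewrite the user's question as a standalone query using the chat history, "
    "then fan it out into smaller, focused search queries. Return one query per line."
)

response = client.chat.completions.create(
    model="gpt-4o",  # your deployment name
    messages=[
        {"role": "system", "content": REPHRASER_PROMPT},
        {"role": "user", "content": "Can you explain the difference between the "
                                    "gross profit for Microsoft in 2023 and 2024?"},
    ],
)
sub_queries = response.choices[0].message.content.splitlines()
# A plausible fan-out for this prompt:
#   Microsoft gross profit fiscal year 2023
#   Microsoft gross profit fiscal year 2024
#   Microsoft revenue and cost of goods sold 2023 2024
```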
Evaluation and Tooling:

- Standardizing Datasets: As the project progressed, we soon had too many datasets, which resulted in inconsistent ways of measuring the quality of the bot response. To resolve that issue, we standardized our dataset and used AML to store, document, and version it. When any updates were made to the dataset (say, an inconsistency was found in the golden dataset and that user prompt was excluded from accuracy computation), the dataset was updated and a new version created. This way everyone was using a known dataset for evaluations.
- Standardizing Accuracy Calculation: To calculate the accuracy of the bot's answers, a similarity score is used: a rating between 1 and 5 based on how similar the bot's answer is to the golden dataset. Initially, the similarity score metric included in Prompt Flow was used, but we soon realized that the produced scores made it hard to understand why certain answers were scored the way they were. So the team created its own prompt and calibrated it against human evaluations. The tuned prompt was then used in Prompt Flow to run evaluations. Along with scoring the bot's result, the prompt also provides the reason it gave that score, which is useful in analyzing the results. (Figure 5: Custom prompt for scoring bot response)
- Automating Accuracy Calculations: Tools were also developed to automate the generation of predictions and the evaluation of accuracy in a more repeatable and consistent way. More details can be found in the Eval Tool section.
- Analyzing problematic queries: Running evaluations and looking only at the overall average score wasn't enough to analyze the cause of issues. So we took a first pass at categorizing user queries into buckets:
  - Queries that are a direct hit on some content in the report, like revenue for a year
  - Queries where some calculations must be performed, like gearing ratio
  - Queries that compare and contrast across some KPI, like the largest two segments by revenue
  - Open-ended queries where some analysis must be performed, like why and what?

  Later, an LLM was leveraged to auto-categorize ground truth questions as more were added. Once the questions were categorized, evaluations were broken down by these categories to ease analysis and the understanding of problematic queries. (Figure 6: Spread of user prompts by category; Figure 7: Average similarity score by category)

Later, another category was added (difficulty level) with the following values, and the final results were reported across these categories:

- Easy: The search context contains the answer the user is looking for.
- Medium: There is no direct hit, and some calculations are required to get to the final result.
- Hard: The question requires analysis of the retrieved/calculated data, as a financial analyst would do.

Results after the Accuracy Improvement Efforts

After multiple iterations of accuracy improvement efforts and stabilizing the solution, the overall accuracy of the system came to around 4.3, making the overall solution more acceptable to the end user. The solution was also scaled up to cover content across multiple years, with over 15 financial reports and roughly 1,300 pages in total. Another important metric to consider is the pass rate by question type (the percentage of answers scored 4 or 5), which ensures the copilot consistently passes these ground truths. The table below lists the pass rate by difficulty:

Difficulty   Easy   Medium   Hard
Pass rate    95     79       72

Table 2: Accuracy improvement analysis by difficulty

Solution architecture

The RAG solution is designed to handle various tasks using a robust and scalable architecture. The architecture includes the following key aspects:

Security

- User Authentication: The solution uses Microsoft Entra ID for user authentication, ensuring secure access to the system.
- Network Security: All runtime components are locked behind a Virtual Network (VNet) to ensure that traffic does not traverse public networks. This enhances security by isolating the components from external threats.
- Managed Identities: The solution leverages managed identities where possible to simplify the management of secrets and credentials. This reduces the risk of credential exposure and makes it easier to manage access to Azure resources.

Composability

- Modular Design: The solution is broken down into smaller, well-defined core microservices and skills that act as plug-and-play components. This modular design allows you to use existing services or bring in new ones to meet your specific needs.
- Core Microservices: Backend services handle different aspects of the solution, such as session management, data processing, runtime configuration, and orchestration.
- Skills: Specialized services provide specific capabilities, such as cognitive search and image processing. These skills can be easily integrated or replaced as needed.

Iterability

- Configuration Service: The solution includes a configuration service that allows you to create runtime configurations for each microservice. This enables you to make changes, such as updating prompts or search indexes, without redeploying the entire solution.
- Per-User Prompt Configuration: The configuration service can apply different configurations for each user prompt, allowing for rapid experimentation and iteration. This flexibility helps to quickly adapt to changing requirements and improve the overall system.
- Testing and Evaluation: The solution also ships with the ability to run dummy/simulated conversations as nightly runs, end-to-end integration tests on demand, and an evaluation tool to perform end-to-end evaluation of the solution.

Logging and Instrumentation

- Application Insights: The solution integrates with Azure Application Insights in Azure Monitor for logging and instrumentation, making it easy to debug by reviewing logs.
- Traceability: One can easily trace what is happening in the backend using the conversation_id and dialog_id (unique GUIDs generated by the frontend) for each user session and interaction. This helps in identifying and resolving issues quickly.

Figure 8: Solution Architecture

Before exploring the data flow, we begin with the ingestion process, which is crucial for preparing the solution. This involves creating and populating the search index with relevant content (corpus). Detailed instructions on parsing, chunking, and indexing can be found in the Solution capabilities section of the document.

User Query Processing Flow

- User Authentication: Users interact with the bot via a web application and must authenticate using Microsoft Entra ID to ensure secure access.
- User Interaction: Once authenticated, users can submit requests through text or voice. The web app establishes a WebSocket connection with the backend session manager. For voice interactions, Microsoft Speech Services are utilized for live transcription: the web app requests a speech token from the backend, which is then used in the Speech SDK for transcription.
- Token Management: The backend retrieves secrets from Key Vault to generate tokens necessary for front-end operations.
- Transcription and Submission: After transcription, the web app submits the transcribed text to the backend.
- Session Management: The session manager assigns unique connection IDs to each WebSocket connection to identify clients. User prompts are then pushed into a message queue, implemented using Azure Cache for Redis.
- Orchestrator: The orchestrator plays a critical role in managing the flow of information. It reads the user query from the message queue and performs several actions:
  - Plan & Execute: It identifies the required actions based on the user query and context.
  - Permissions: It checks user permissions using Role-Based Access Control (RBAC) or custom permissions on the content. (Note: the current implementation doesn't do this; however, the orchestrator could easily be updated to do so.)
  - Invoke Actions: It triggers the appropriate actions, such as invoking Azure AI Search to retrieve relevant information.
- Azure AI Search: The orchestrator interacts with Azure AI Search to query the unstructured knowledge base. This involves searching through financial reports or other content to find the information the user requested.
- Status & Response: The orchestrator processes the search results and formulates a response. It updates the queue with the status and the final response, which includes any necessary predictions or additional information.
- Session Manager: The response from the orchestrator is sent back to the session manager. This component maintains the session's integrity and ensures that each client receives the correct response, using the unique connection ID to route the response back to the appropriate client.
- Web App: The web app receives the response from the session manager and delivers the bot's response back to the user, completing the interaction cycle. The response can be in text and/or speech format, depending on the user's initial input method.
- Update History: On successful completion of the bot response, the session manager updates the user profile and conversation history in the storage component. This includes details about user intents and entities, ensuring that the system can provide personalized and context-aware responses in future interactions.
- Developer Logs / Instrumentation: Throughout the process, logs and instrumentation data are collected. These logs are essential for monitoring and debugging the system, as well as for enhancing its performance and reliability.
- Evaluations and Quality Enhancements: The collected data, along with golden datasets and manual feedback, is utilized for ongoing evaluations and quality enhancements. Tools like Azure AI Foundry and VS Code, along with the configuration service, are used to test the bots and to develop and evaluate different prompts and models.
- Monitoring and Reporting: The system is continuously monitored using Azure Monitor and other analytics tools. Power BI dashboards provide insights into system performance, user interactions, and other key metrics. This ensures that the solution remains responsive and effective over time.

Solution capabilities

The solution will support the following capabilities:

Document Ingestion Pipeline

Document ingestion in a Retrieval-Augmented Generation (RAG) application is a critical process that ensures efficient and accurate retrieval of information. Currently, the ingestion service supports the following scenarios:

- Large financial documents containing complex tables, graphs, charts, and other figures
- Large retail product catalogs containing images and descriptions

The overall process can be broken down into three primary stages (a chunking sketch follows the list):

- Document Loading: The Document Loader is the first stage in the document ingestion pipeline. Its primary function is to load documents into memory and extract text and metadata. It can be configured to use either the Azure AI Document Intelligence service or LangChain with Azure AI Document Intelligence for text extraction.
- Document Parsing: The Document Parser is the second stage in the document ingestion pipeline. Its role is to process the loaded text and metadata, splitting the document into manageable chunks and cleaning the text for indexing. One can use either fixed-size chunking with overlap or layout-based chunking, where an LLM decides whether certain paragraphs should be kept together. The solution used layout-based chunking, and sections and subsections were extracted and maintained as metadata for the chunked paragraphs.
- Document Indexing: The Document Indexer is the final stage in the document ingestion pipeline. Its purpose is to upload the parsed chunks into a search index, enabling efficient retrieval based on user queries. As part of document parsing, additional metadata (section and subsection names and titles) is passed along with the text to be indexed. The main content and certain metadata fields are also stored as vectors to enable better retrieval.

Figure 9: Indexing by document
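As a point of reference, fixed-size chunking with overlap (the simpler of the two parsing options named above) can be sketched as follows. The 3000-character size echoes the chunk-size experiment earlier; the overlap value and everything else are illustrative assumptions:

```python
# Illustrative fixed-size chunking with overlap (one of the two options named above).
# The 3000-character size echoes the experiment's finding; the overlap is an assumption.
def chunk_text(text, chunk_size=3000, overlap=300):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append({"content": chunk, "offset": start})
        if start + chunk_size >= len(text):
            break  # the final (possibly partial) chunk has been emitted
    return chunks

document = "Revenue increased ... " * 500  # stand-in for parsed report text
for piece in chunk_text(document)[:2]:
    print(piece["offset"], len(piece["content"]))
```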
Search

Once the ingestion pipeline has executed successfully, resulting in a valid, queryable search index, the Search service can be configured and integrated into the end-to-end RAG application. The Search service exposes an API that enables users to query a search index in Azure AI Search. It processes natural language queries, applies requested filters, and invokes search requests against the preconfigured search configuration using the Azure AI Search SDK.

- Search Index Configuration: The search index configuration defines the schema and the type of search to apply, including simple text search, vector search, hybrid search, and hybrid search with additional semantic understanding. This is done as part of index creation and document ingestion.
- User Query: The process starts with a user query, a natural language input from the user.
- Query Embeddings Generation: Using an LLM, the query is vectorized so that hybrid search can be performed on the user query.
- Search Filter Generation: From the user query, filters based on criteria such as equality, range conditions, and substring matches are generated to refine the search results.
- Search Invocation: The search service constructs a query using the embedding and filters, sends it to Azure AI Search via the Azure AI Search SDK, and receives the search results. (A sketch follows the Query Reprocessing section below.)
- Pruning: Pruning refines these results further to ensure relevance, based on additional semantic filtering and ranking.
- Search Results: The final output represents the items from the search index that best match the user's query, after all filters and pruning have been applied.

Query Reprocessing

One of the first steps when we receive a chat message is preprocessing, to ensure better search results that enable our RAG system to answer the question accurately. We perform the following steps as part of the preprocessing:

- Message Rephrasing: When the chatbot receives a new message, we need to rephrase it based on the chat history, as the new message may depend on the previous context. For example, if we ask, "Which team won the Premier League in 2023?" and then ask the follow-up question "What about the following year?", we need to rephrase the follow-up to "Which team won the Premier League in 2024?"
- Fanout: If the query asks about complex data that does not exist in the indexed documents, it can be calculated from simpler data that does exist there. For example, if the indexed documents are financial reports and the query asks about the gross profit margin, searching for gross profit margin may find nothing. But the gross profit margin can be calculated from Revenue and Cost of Goods Sold (COGS), which do exist in the indexed documents. Breaking the original question down into sub-questions for Revenue and COGS helps the model calculate the gross profit margin from those values.

Check out the new service: Rewrite queries with semantic ranker in Azure AI Search (Preview).
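A minimal sketch of the Search Invocation step using the azure-search-documents SDK; the index name, field names, and filter expression are assumptions for illustration:

```python
# Illustrative hybrid search call (vector + keyword + filter) via the
# Azure AI Search SDK. Index/field names and the filter are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

client = SearchClient(endpoint="https://<your-search>.search.windows.net",
                      index_name="financial-reports",
                      credential=AzureKeyCredential("<query-key>"))

query = "Microsoft gross profit fiscal year 2024"
embedding = [0.0] * 1536  # in practice, the query vector from your embedding model

results = client.search(
    search_text=query,                        # keyword portion of hybrid search
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=25,
                                    fields="content_vector")],
    filter="report_year eq 2024",             # filter extracted from the user query
    top=25,                                   # top_k = 25 per the recall experiments
)
for doc in results:
    print(doc["section_title"], doc["@search.score"])
```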
AI Skills

To ensure modularity and ease of maintenance, our solution designates any service capable of providing data as a "skill." This approach allows for seamless plug-and-play integration of components. For instance, Azure AI Search is treated as a skill within our architecture. Should the solution require additional data sources, such as a relational database, these can be encapsulated within an API and similarly integrated as skills. Wrapping content providers as skills serves two primary purposes:

- Enhanced Logging and Debugging: Skills can be configured to incorporate logging and instrumentation fields, ensuring that all generated logs include relevant context. This uniformity greatly facilitates efficient debugging by providing comprehensive log insights.
- Dynamic Configuration: Skills can leverage the configuration service to expose runtime configurations. This flexibility is particularly beneficial during evaluations, allowing for adjustments such as modifying the number of top-k results or switching to a different search index to accommodate improvements in data ingestion.

By adopting this skill-based approach, the architecture remains adaptable and scalable, supporting ongoing enhancements and diverse data integration.

Sharing Intermediate Results

Sharing intermediate results from the RAG process gives the user details about what is happening once a query is sent to the bot. This is especially useful when the query takes a long time to return. It also helps show how the query was broken down into smaller queries, so if something goes wrong (especially for harder queries), the user can rephrase and get a better response. Once the user sends the query to the bot, the orchestrator emits intermediate updates like "Searching for ...", "Retrieved XX results..." before the final answer is delivered.

Figure 10: Messaging Framework

Architecture to support this:

- WebSocket connection (Client <> Session Manager): When the client connects to the session manager, a persistent WebSocket connection is created; all communication between the client and session manager is handled through this connection. This also allows queueing up of multiple messages from the client. The session manager listens to the incoming messages and queues them up in a message queue. Requests are then handled one by one. Meanwhile, intermediate messages and final answers of previously submitted messages are sent asynchronously back to the client.
- Message Queue (Session Manager <> Orchestrator): Once the session manager receives a request, it is enqueued into a task queue. Since there can be multiple orchestrator instances running in the cluster, the task queue ensures only one instance receives a particular request. The orchestrator then begins the RAG process. As the RAG process continues, the orchestrator sends intermediate messages by publishing them to a message queue. All instances of the session manager subscribe to this queue; the instance handling the client relevant to the incoming message forwards it to the client.

Runtime Configuration

The runtime configuration service enhances the architecture's dynamism and flexibility. It enables core services and AI skills to decouple and parameterize various components, such as prompts, search data settings, and operational parameters. These services can easily override default configurations with new versions at runtime, allowing for dynamic behavior adjustments during operation. (A cache-aside sketch follows the list below.)

Figure 11: Runtime Configuration

- Core Services and AI Skills: Define unique identifiers for their individual configurations. At runtime, they check whether the payload contains a configuration override. If yes, they attempt to retrieve it from the cache. If it is not present in cache memory (i.e., a first-time fetch), they read from the configuration service and save it in cache memory for future references.
- Configuration Service: Facilitates create, read, and delete operations for a new configuration. Validates the incoming config against a Pydantic model and generates a unique version for the configuration upon successful save.
- Cosmos DB: Persists the new config version.
- Redis: A high-availability memory store for storing and quickly retrieving configurations for subsequent queries.
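The override lookup described for core services and skills is a classic cache-aside pattern. A minimal sketch, with the configuration-service call stubbed out (the function names, key scheme, and TTL are assumptions):

```python
# Illustrative cache-aside lookup for runtime configuration overrides.
# fetch_from_config_service is a stub; key scheme and TTL are assumptions.
import json
import redis

cache = redis.Redis(host="<redis-host>", port=6380, ssl=True, password="<key>")

def fetch_from_config_service(service_name, version):
    """Stub for the configuration service read (backed by Cosmos DB)."""
    raise NotImplementedError

def get_config(service_name, payload, default_config):
    version = payload.get("config_override")
    if not version:                      # no override: use the default
        return default_config
    key = f"config:{service_name}:{version}"
    cached = cache.get(key)
    if cached:                           # cache hit
        return json.loads(cached)
    config = fetch_from_config_service(service_name, version)  # first-time fetch
    cache.set(key, json.dumps(config), ex=3600)                # keep for later queries
    return config
```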
Evaluation Tool

Improving the accuracy of a RAG-based solution is a continuous process that requires experimenting with different changes, running predictions with those changes (running user queries through the bot), evaluating the bot's results against the ground truth, analyzing the issues, and then repeating these steps. All this required a consistent way of evaluating end-to-end results. Initially the team did the evaluation and scoring of the results manually, but as the search index grew (a few thousand financial reports were ingested) and the golden dataset grew, doing it manually became very time-consuming. So the team developed a custom prompt and used an LLM to do the scoring, calibrating the prompt against the human scores. Once the prompt stabilized, the Evaluation tool was built to do two things (sketched below):

- For each golden question, call the bot endpoint and generate the prediction (bot answer).
- Then take the ground truth and predicted results, run an evaluation of them, and produce metrics.
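A minimal sketch of that two-step loop; the bot endpoint, payload shape, scoring prompt, and deployment name are all assumptions, not the team's actual tooling:

```python
# Illustrative predict-then-evaluate loop. Endpoint URL, payload shape, scoring
# prompt, and deployment name are placeholders.
import requests
from openai import AzureOpenAI

aoai = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                   api_key="<api-key>", api_version="2024-02-01")

SCORING_PROMPT = ("Rate how similar the answer is to the ground truth on a 1-5 "
                  "scale (5 = perfect match). Reply as: <score>|<reason>")

golden_dataset = [{"question": "What was FY23 revenue?", "ground_truth": "..."}]

for item in golden_dataset:
    # Step 1: generate the prediction by calling the bot endpoint.
    prediction = requests.post("https://<bot-endpoint>/chat",
                               json={"query": item["question"]}).json()["answer"]
    # Step 2: score prediction vs. ground truth with the calibrated LLM prompt.
    verdict = aoai.chat.completions.create(
        model="gpt-4",  # your deployment name
        messages=[{"role": "system", "content": SCORING_PROMPT},
                  {"role": "user", "content": f"Ground truth: {item['ground_truth']}\n"
                                              f"Answer: {prediction}"}],
    ).choices[0].message.content
    score, reason = verdict.split("|", 1)
    print(item["question"], score.strip(), reason.strip())
```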
Implementation Guide

Please refer to the GitHub repo.

Additional Resources

- Get started on Azure AI Foundry
- Evaluation of generative AI applications
- Generate adversarial simulations for safety evaluation
- Generate synthetic data and simulate non-adversarial tasks
- AI architecture guidance to build AI workloads on Azure
- Responsible AI Tools and Practices

Need inspirations? Real AI Apps stories by Azure customers to help you get started

In this blog, we present a tapestry of authentic stories from real Azure customers. You will read about how AI-empowered applications are revolutionizing enterprises and the myriad ways organizations choose to modernize their software, craft innovative experiences, and unveil new revenue streams. We hope that these stories inspire you to embark upon your own Azure AI journey. Before we begin, be sure to bookmark the newly unveiled Plan on Microsoft Learn, meticulously designed for developers and technical managers, to enhance your expertise on this subject.

Inspiration 1: Transform customer service

Intelligent apps today can offer a self-service natural language chat interface for customers to resolve service issues faster. They can route and divert calls, allowing agents to focus on the most complex cases. These solutions also enable customer service agents to quickly access contextual summaries of prior interactions, offer real-time recommendations, and generally enhance customer service productivity by automating repetitive tasks, such as logging interaction summaries. Prominent use cases across industries are self-service chatbots, the provision of real-time counsel to agents during customer engagements, the meticulous analysis and coaching of agents following each interaction, and the automation of summarizing customer dialogues.

Below is a sample architecture for airline customer service and support: Azure Database for PostgreSQL provides data storage, and Azure Kubernetes Service hosts the web UI and integrates with the other components. The app uses RAG, with Azure AI Search as the retrieval system and Azure OpenAI Service providing LLM capabilities, allowing customer service agents and customers to ask questions using natural language. (A minimal sketch of this retrieve-then-answer pattern follows below.)

Air India, the nation's flagship carrier, updated its existing virtual assistant's core natural language processing engine to the latest GPT models, using Azure OpenAI Service. The new AI-based virtual assistant handles 97% of queries with full automation and saves millions of dollars on customer support costs.

"We are on this mission of building a world-class airline with an Indian heart. To accomplish that goal, we are becoming an AI-infused company, and our collaboration with Microsoft is making that happen." — Dr. Satya Ramaswamy, Chief Digital and Technology Officer, Air India

In this customer case, the Azure-powered AI platform also supports Air India customers in other innovative ways. Travelers can save time by scanning visas and passports during web check-in, and then scan baggage tags to track their bags throughout their journeys. The platform's voice recognition also enables analysis of live contact center conversations for quality assurance, training, and improvement.
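A compact sketch of the retrieve-then-answer pattern behind this architecture; the index name, field names, and deployment name are illustrative placeholders:

```python
# Illustrative RAG turn: retrieve grounding passages from Azure AI Search, then
# answer with Azure OpenAI. Index, field, and deployment names are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(endpoint="https://<your-search>.search.windows.net",
                      index_name="support-kb",
                      credential=AzureKeyCredential("<query-key>"))
aoai = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                   api_key="<api-key>", api_version="2024-02-01")

question = "Can I change the date on my ticket after check-in?"
passages = [doc["content"] for doc in search.search(search_text=question, top=5)]

answer = aoai.chat.completions.create(
    model="gpt-4",  # your deployment name
    messages=[
        {"role": "system", "content": "Answer using only the provided context. "
                                      "If the context is insufficient, say so."},
        {"role": "user", "content": "Context:\n" + "\n---\n".join(passages) +
                                    f"\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```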
Inspiration #2: Personalize customer experience

Organizations now can use AI models to present personalized content, products, or services to users based on multimodal user inputs from text, images, and speech, grounded in a deep understanding of their customer profiles. Common solutions we have seen include conversational shopping interfaces, image searches for products, product recommenders, and customized content delivery for each customer. In these cases, product discovery is improved through searching data semantically, and as a result, personalized search and discovery improve engagement, customer satisfaction, and retention.

Three areas are critical to consider when implementing such solutions. First, your development team should examine the ability to integrate multiple data types (e.g., user profiles, real-time inventory data, store sales data, and social data). Second, during testing, ensure that pre-trained AI models can handle multimodal inputs and can learn from user data to deliver personalized results. Lastly, your cloud administrator should implement scalability measures to meet variable user demands.

ASOS, a global online fashion retailer, leveraged Azure AI Foundry to revolutionize its customer experience by creating an AI-powered virtual stylist that could engage with customers and help them discover new trends.

"Having a conversational interface option gets us closer to our goals of fully engaging the customer and personalizing their experience by showing them the most relevant products at the most relevant time." — Cliff Cohen, Chief Technology Officer, ASOS

In this customer case, Azure AI Foundry enabled ASOS to rapidly develop and deploy their intelligent app, integrating natural language processing and computer vision capabilities. This solution takes advantage of Azure's ability to support cutting-edge AI applications in the retail sector, driving business growth and customer satisfaction.

Inspiration #3: Accelerate product innovation

Building customer-facing custom copilots has the promise of providing enhanced services to your customers. This is typically achieved by using AI to provide data-driven insights that facilitate personalized or unique customer interactions, and to give customers access to a wider range of information, while improving search queries and making data more accessible. You can check out a sample architecture for building your copilot below.

DocuSign, a leader in e-signature solutions with 1.6 million global customers, pioneered an entirely new category of agreement management designed to streamline workflows and created Docusign Intelligent Agreement Management (IAM). The IAM platform uses a sophisticated multi-database architecture to efficiently manage various aspects of agreement processing and management. At the heart of the IAM platform is Azure AI, which automates manual tasks and processes agreements using machine learning models.

"We needed to transform how businesses worked with a new platform. With Docusign Intelligent Agreement Management, built with Microsoft Azure, we help our customers create, commit to, manage, and act on agreements in real-time." — Kunal Mukerjee, VP, Technology Strategy and Architecture, Docusign

The workflow begins with agreement data stored in an Azure SQL Database; the data is then transferred through an ingestion pipeline to Navigator, an intelligent agreements repository. In addition, the Azure SQL Database Hyperscale service tier serves as the primary transactional engine, providing virtually unlimited storage capacity and the ability to scale compute and storage resources independently.

Inspiration #4: Optimize employee workflows

With AI-powered apps, businesses can organize unstructured data to streamline document management and information, leverage natural language processing to create a conversational search experience for employees, provide more contextual information to increase workplace productivity, and summarize data for further analysis.
Increasingly, we have seen solutions such as employee chatbots for HR, professional services assistants (legal/tax/audit), analytics and reporting agents, contact center agent assistants, and employee self-service and knowledge management (IT) centers. It’s essential to note that adequate prompt engineering training can improve employee queries; your team should examine the capability of integrating the copilot with other internal workloads; and, lastly, your organization should implement continuous innovation and delivery mechanisms to support new internal resources and optimize chatbot dialogs.

Improving the lives of clinicians and patients

Medigold Health, one of the United Kingdom’s leading occupational health service providers, migrated applications to Azure OpenAI Service, with Azure Cosmos DB for logging and Azure SQL Database for data storage, automating clinician processes such as report generation and driving a 58% rise in clinician retention and greater job satisfaction. With Azure App Service, Medigold Health was also able to quickly and efficiently deploy and manage web applications, enhancing the company’s ability to respond to client and clinician needs.

"We knew with Microsoft and moving our AI workloads to Azure, we’d get the expert support, plus scalability, security, performance, and resource optimization we needed.” — Alex Goldsmith, CEO, Medigold Health

Inspiration #5: Prevent fraud and detect anomalies

Increasingly, organizations leverage AI to identify suspicious financial transactions, false account chargebacks, fraudulent insurance claims, digital theft, unauthorized account access or account takeover, network intrusions or malware attacks, and false product or content reviews. If your company can use similar designs, take a glance at a sample architecture for building an interactive fraud analysis app below. Transactional data is stored in Azure Cosmos DB and made available for analytics in real-time (HTAP) using Synapse Link. All the other financial transactions, such as stock trading data, claims, and other documents, are integrated with Microsoft Fabric using Azure Data Factory. This setup allows analysts to see real-time fraud alerts on a custom dashboard. Generative AI here uses RAG, with Azure OpenAI Service providing the LLM and Azure AI Search as the retrieval system.

Fighting financial crimes in the gaming world

Kinectify, an anti-money laundering (AML) risk management technology company, built its scalable, robust, Microsoft Azure-powered AML platform with a seamless combination of Azure Cosmos DB, Azure AI Services, Azure Kubernetes Service, and the broader capabilities of Azure cloud services.

"We needed to choose a platform that provided best-in-class security and compliance due to the sensitive data we require and one that also offered best-in-class services as we didn’t want to be an infrastructure hosting company. We chose Azure because of its scalability, security, and the immense support it offers in terms of infrastructure management.” — Michael Calvin, CTO, Kinectify

With the new solutions in place, Kinectify detects 43% more suspicious activities, achieves 96% faster decisions, and continues to reliably handle a high volume of transactions while identifying patterns, anomalies, and suspicious activity.
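To give a flavor of the anomaly-detection building block in this pattern, here is a deliberately simple sketch that flags transactions far outside an account's normal range. It is a classic z-score check for illustration only; production AML platforms such as Kinectify's rely on far richer models and features.

import statistics

def flag_anomalies(amounts, threshold=2.5):
    # Flag transactions whose amount deviates from the mean by more than
    # `threshold` standard deviations (a simple z-score test).
    mean = statistics.fmean(amounts)
    spread = statistics.pstdev(amounts)
    if spread == 0:
        return []
    return [i for i, amount in enumerate(amounts)
            if abs(amount - mean) / spread > threshold]

# Example: the final transfer is far outside this account's normal pattern.
history = [120.0, 95.0, 130.0, 110.0, 105.0, 125.0, 98.0, 115.0, 102.0, 9800.0]
print(flag_anomalies(history))  # -> [9]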
Inspiration #6: Unlock organizational knowledge

We have seen companies building intelligent apps to surface insights from vast amounts of data and make them accessible through natural language interactions. Teams can analyze conversations for keywords to spot trends and better understand their customers. Common use cases include knowledge extraction and organization, trend and sentiment analysis, content curation and summarization, automated reports, and research generation. Below is a sample architecture for enterprise search and knowledge mining.

H&R Block, the trusted tax preparation company, envisioned using generative AI to create an easy, seamless process that answers filers’ tax questions, maintains safeguards to ensure accuracy, and minimizes the time to file. Valuing Microsoft’s leadership in security and AI and the longstanding collaboration between the two companies, H&R Block selected Azure AI Foundry and Azure OpenAI Service to build a new solution on the H&R Block platform to provide real-time, reliable tax filing assistance. By building an intelligent app that automates the extraction of key data from tax documents, H&R Block reduced the time and manual effort involved in document handling. The AI-driven solution significantly increased accuracy while speeding up the overall tax preparation process.

"We conduct about 25 percent of our annual business in a matter of days.” — Aditya Thadani, Vice President, H&R Block

Through Azure’s intelligent services, H&R Block modernized its operations, improving both productivity and client service while classifying more than 30 million tax documents a year. The solution has allowed the company to handle more clients with greater efficiency, providing a faster, more accurate tax filing experience.

Inspiration #7: Automate document processing

Document intelligence through AI applications helps human counterparts classify, extract, summarize, and gain deeper insights with natural language prompts. When adopting this approach, organizations should also prioritize identifying the tasks to be automated, streamline employee access to historical data, and refine downstream workloads to leverage summarized data. Here is a sample architecture for large document summarization.

Volvo Group, one of the world’s leading manufacturers of trucks, buses, construction equipment, and marine and industrial engines, streamlined invoice and claims processing, saving over 10,000 manual hours with the help of Microsoft Azure AI services and Azure AI Document Intelligence.

"We chose Microsoft Azure AI primarily because of the advanced capabilities offered, especially with AI Document Intelligence.” — Malladi Kumara Datta, RPA Product Owner, Volvo Group

Since launch, the company has saved 10,000 manual hours, roughly 850-plus manual hours per month.

Inspiration #8: Accelerate content delivery

Using generative AI, your new applications can automate the creation of web or mobile content, such as product descriptions for online catalogs or visual campaign assets based on marketing narratives, accelerating time to market. It also helps you enable faster iteration and A/B testing to identify the descriptions that best resonate with customers. This pattern generates text or image content based on conversational user input. It combines the capabilities of image generation and text generation, the generated content may be personalized to the user, and data may be read from a variety of data sources, including a Storage Account, Azure Cosmos DB, Azure Database for PostgreSQL, or Azure SQL.
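As a rough sketch of this text-plus-image pattern, the snippet below drafts a product description and a matching campaign visual from one marketing brief. It assumes an Azure OpenAI resource with a chat deployment and a DALL-E 3 image deployment; the endpoint, key, and deployment names are placeholders.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

brief = "Lightweight waterproof trail jacket, fall campaign, warm earthy colors"

# Text generation: a catalog-ready product description from the brief.
text = client.chat.completions.create(
    model="gpt-4o",  # your chat deployment name
    messages=[{"role": "user",
               "content": "Write a 50-word product description: " + brief}],
)
print(text.choices[0].message.content)

# Image generation: a matching visual asset for the campaign.
image = client.images.generate(model="dall-e-3", prompt=brief, n=1)
print(image.data[0].url)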
JATO Dynamics, a global supplier of automotive business intelligence operating in more than 50 countries, developed Sales Link with Azure OpenAI Service, which now helps dealerships quickly produce tailored content by combining market data and vehicle information, saving customers 32 hours per month.

"Data processed through Azure OpenAI Service remains within Azure. This is critical for maintaining the privacy and security of dealer data and the trust of their customers.” — Derek Varner, Head of Software Engineering, JATO Dynamics

In addition to Azure OpenAI, JATO Dynamics used Azure Cosmos DB to manage data from millions of transactions across 55 car brands. The database service also provides scalability and quick access to vehicle and dealer transaction data, giving Sales Link a reliable foundation.

Closing thoughts

From innovative solutions to heartwarming successes, it’s clear that a community of AI pioneers is transforming business and customer experiences. Let’s continue to push boundaries, embrace creativity, and celebrate every achievement along the way. Here’s to many more stories of success and innovation!

Want to be certified as an Azure AI Engineer? Start preparing with this Microsoft Curated Learning Plan.

Transform Insurance Industry Workflows Using Generative AI Models and Azure Services
This article highlights an innovative automated solution designed to transform the processing of insurance claim forms. Previously, underwriters were limited to handling just two to three claims per day, significantly hampering operational efficiency. With the implementation of this solution, companies have achieved a remarkable 60% increase in daily claim processing capacity. Built on Azure services, this architecture revolutionizes the management of claim forms submitted via email by automating critical tasks such as data extraction, classification, summarization, evaluation, and storage. Leveraging the power of AI and machine learning, this solution ensures faster, more accurate claim evaluations, enabling insurance companies to make informed decisions efficiently. The result is enhanced operational scalability, improved customer satisfaction, and a streamlined claims process.

Scenario

In the insurance industry, claim forms often arrive as email attachments, requiring manual processing to classify, extract, and validate information before it can be stored for analysis and reporting. This solution automates the process by leveraging Azure services to classify, extract, and summarize information from insurance claim forms. Using Responsible AI evaluation, it ensures the performance of Large Language Models (LLMs) meets high standards. The data is then stored for further analysis and visualization in Power BI, where underwriters can access consumable reports.

Architecture Diagram

Components

Azure Logic Apps: Automates workflows and integrates apps, data, and services. Used here to process emails, extract PDF attachments, and initiate workflows with an Outlook connector for attachment, metadata, and email content extraction.
Azure Blob Storage: Stores unstructured data at scale. Used to save insurance claim forms in PDF and metadata/email content in TXT format.
Azure Functions: Serverless compute for event-driven code. Orchestrates workflows across services.
Azure Document Intelligence: AI-powered data extraction from documents. Classifies and extracts structured content from ACORD forms.
Azure OpenAI: Provides advanced language models. Summarizes email content for high-level insights.
LLM Evaluation Module (Azure AI SDK): Enhances Azure OpenAI summaries by evaluating and refining output quality.
Azure AI Foundry: Manages Azure OpenAI deployments and evaluates LLM performance using Responsible AI metrics.
Azure Cosmos DB: Globally distributed NoSQL database. Stores JSON outputs from Azure OpenAI and Document Intelligence.
Microsoft Power BI: Visualizes Cosmos DB data with interactive reports for underwriters.

Workflow Description

The claims-processing workflow leverages a series of Azure services to automate, structure, and analyze data, ensuring a fast, accurate, and scalable claims management system.

1. Email Processing with Azure Logic Apps

The process begins with a pre-designed Azure Logic Apps workflow, which automates the intake of PDF claim forms received as email attachments from policyholders. By using prebuilt Outlook connectors, it extracts key details like sender information, email content, metadata, and attachments, organizing the data for smooth claims processing. This automation reduces manual effort, accelerates claim intake, and minimizes data capture errors.
2. Secure Data Storage in Azure Blob Storage

Once emails are processed, the necessary PDF attachments, email content, and email metadata are stored securely in Azure Blob Storage. This centralized, scalable repository ensures easy access to raw claim data for subsequent processing. Azure Blob’s structured storage supports efficient file retrieval during later stages, while its scalability can handle growing claim volumes, ensuring data integrity and accessibility throughout the entire claims processing lifecycle.

3. Workflow Orchestration with Azure Functions

The entire processing workflow is managed by Azure Functions, which orchestrates serverless tasks such as document classification, data extraction, summarization, and LLM evaluation. This modular architecture enables independent updates and optimizations, ensuring scalability and easier maintenance. Azure Functions streamlines operations, improving the overall efficiency of the claims processing system.

a. Document Classification: The next step uses Azure Document Intelligence to classify documents with a custom pretrained model, identifying insurance claim forms. This step ensures the correct extraction methods are applied, reducing misclassification and errors, and eliminating much of the need for manual review. The ability to customize the model also allows it to adapt to changes in document formats, ensuring accuracy and efficiency in later processes.

b. Content Extraction: Once the insurance form is properly classified, Azure Document Intelligence extracts specific data points from the PDF claim forms, such as claim numbers and policyholder details. The automated extraction process saves time, reduces manual data entry, and improves accuracy, ensuring essential data is available for downstream processing. This capability also helps in organizing the information for efficient claim tracking and report generation.

c. Document Intelligence Output Processing: The results are extracted in JSON format, then parsed and organized for storage in Azure Cosmos DB, ensuring that all relevant data is systematically stored for future use.

d. Summarizing Content with Azure OpenAI: Once data is extracted, Azure OpenAI generates summaries of email content, highlighting key claim submission details. These summaries make it easier for underwriters and decision-makers to quickly understand the essential points without sifting through extensive raw data.

e. Quality Evaluation with the LLM Evaluation SDK: After summarization, the quality of the generated content is evaluated using the LLM Evaluation Module in the Azure AI SDK. This evaluation ensures that the content meets accuracy and relevance standards, maintaining high-quality benchmarks and upholding responsible AI practices. Insights from this evaluation guide the refinement and improvement of the models used in the workflow.

f. LLM Performance Dashboard with Azure AI Foundry: Continuous monitoring of the workflow’s quality metrics is done via the evaluation dashboard in Azure AI Foundry. Key performance indicators like groundedness, fluency, coherence, and relevance are tracked, ensuring high standards are maintained. This regular monitoring helps quickly identify performance issues and informs model optimizations, supporting the efficiency of the claims processing system.

g. Summarization Output Processing: After evaluation, the results from the OpenAI summarization output are parsed and stored in Cosmos DB, ensuring that all relevant data is saved in a structured format for easy access and retrieval.
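As a rough sketch of steps 3b, 3d, and 3g, the snippet below chains field extraction, summarization, and the Cosmos DB hand-off. The model id, deployment name, field names, database, and container are all assumptions for illustration, not the production configuration.

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from azure.cosmos import CosmosClient
from openai import AzureOpenAI

# b) Extract structured fields from the claim form with a custom model.
di = DocumentAnalysisClient("https://<your-doc-intel>.cognitiveservices.azure.com",
                            AzureKeyCredential("<doc-intel-key>"))
poller = di.begin_analyze_document_from_url(
    "claim-form-model",            # assumed custom extraction model id
    "<sas-url-to-claim-form.pdf>",
)
fields = {name: str(field.value)
          for name, field in poller.result().documents[0].fields.items()}

# d) Summarize the accompanying email body with Azure OpenAI.
aoai = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                   api_key="<aoai-key>", api_version="2024-02-01")
email_body = "<email text previously saved to Blob Storage>"
summary = aoai.chat.completions.create(
    model="gpt-4o",  # your chat deployment name
    messages=[{"role": "user",
               "content": "Summarize this claim submission email:\n" + email_body}],
).choices[0].message.content

# g) Persist the combined result for reporting (database and container assumed to exist).
cosmos = CosmosClient("https://<your-cosmos>.documents.azure.com",
                      credential="<cosmos-key>")
container = cosmos.get_database_client("claims").get_container_client("processed")
container.upsert_item({"id": fields.get("ClaimNumber", "unknown"),
                       "fields": fields,
                       "summary": summary})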
4. Storing Data in Azure Cosmos DB

The structured data, including parsed JSON outputs and summaries, is stored in Azure Cosmos DB, a fully managed, globally distributed NoSQL database. This solution ensures processed claim data is easily accessible for further analysis and reporting. Cosmos DB’s scalability can accommodate increasing claim volumes, while its low-latency access makes it ideal for high-demand environments. Its flexible data model also allows seamless integration with other services and applications, improving the overall efficiency of the claims processing system.

5. Data Visualization with Microsoft Power BI

The final step in the workflow involves visualizing the stored data using Microsoft Power BI. This powerful business analytics tool enables underwriters and other stakeholders to create interactive reports and dashboards, providing actionable insights from processed claim data. Power BI’s intuitive interface allows users to explore data in depth, facilitating quick, data-driven decisions. By incorporating Power BI, the insurance company can effectively leverage stored data to drive business outcomes and continuously improve the claims management process.

Related use cases:

Healthcare - Patient Intake and Medical Claims Processing: Automating the extraction and processing of patient intake forms and medical claims for faster reimbursement and improved patient care analysis. See the following article for more information on how to implement a solution like this.
Financial Services - Loan and Mortgage Application Processing: Streamlining loan application reviews by automatically extracting and summarizing financial data for quicker decision-making.
Retail - Supplier Invoice and Purchase Order Processing: Automating invoice and purchase order processing for faster supplier payment approvals and improved financial tracking.
Legal - Contract and Document Review: Automating the classification and extraction of key clauses from legal contracts to enhance compliance and reduce manual review time. See the following article for more information on how to implement a solution like this.
Government - Tax Filing and Documentation Processing: Automating the classification and extraction of tax filing data to ensure compliance and improve audit efficiency.

To find solution ideas and reference architectures for Azure-based solutions curated by Microsoft, go to the Azure Architecture Center and search with keywords like “retail”, “legal”, “healthcare”, etc. You’ll find hundreds of industry-related solutions that can help jumpstart your design process.

Contributors:

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:
Manasa Ramalinga | Principal Cloud Solution Architect – US Customer Success
Oscar Shimabukuro Kiyan | Senior Cloud Solution Architect – US Customer Success

Azure AI Foundry, GitHub Copilot, Fabric and more to Analyze usage stats from Utility Invoices
Overview

With the introduction of Azure AI Foundry, integrating various AI services to streamline the development and deployment of agentic AI workflow solutions, such as multi-modal, multi-model, dynamic, and interactive agents, has become more efficient. The platform offers a range of AI services, including Document Intelligence for extracting data from documents, natural language processing, robust machine learning capabilities, and more. Microsoft Fabric further enhances this ecosystem by providing robust data storage, analytics, and data science tools, enabling seamless data management and analysis. Additionally, Copilot and GitHub Copilot assist developers by offering AI-powered code suggestions and automating repetitive coding tasks, significantly boosting productivity and efficiency.

Objectives

In this use case, we will take a year's worth of monthly electricity bills from the utility's website and analyze them using Azure AI services within Azure AI Foundry. Electricity bills are simply an easy start; we could apply the same approach to almost any other format, say W-2, I-9, 1099, ISO, or EHR documents. By leveraging the Foundry's workflow capabilities, we will streamline the development stages step by step. Initially, we will use Document Intelligence to extract key data such as usage in kilowatt-hours (kWh), billed consumption, and other necessary information from each PDF file. This data will then be stored in Microsoft Fabric, where we will utilize its analytics and data science capabilities to process and analyze the information. We will also add a few processing steps that use Azure Functions, built with the help of GitHub Copilot in VS Code. Finally, we will create a Power BI dashboard in Fabric to visually display the analysis, providing insights into electricity usage trends and billing patterns over the year.

Utility Invoice sample

Building the solution

Depicted in the picture are the key Azure and Copilot services we will use to build the solution.

Set up Azure AI Foundry

Create a new project in Azure AI Foundry. Add Document Intelligence to your project; you can do this directly within the Foundry portal.

Extract documents through Doc Intel

Download the PDF files of the power bills and upload them to Azure Blob Storage. I used Document Intelligence Studio to create a new project and train custom models using the files from Blob Storage.

Data Extraction

Use Azure Document Intelligence to extract the required information from the PDF files. From the resource page of the Doc Intel service in the portal, copy the Endpoint URL and Keys; we will need these to connect the application to the Document Intelligence API. Next, integrate Doc Intel with the project: in the Azure AI Foundry project, add the Document Intelligence resource by providing the Endpoint URL and Keys, and configure the settings as needed to start extracting data from the PDF documents. We can stay within the Azure AI Foundry portal for most of these steps, but for more advanced configurations, we might need to use Document Intelligence Studio.
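To illustrate the extraction call, here is a minimal sketch that sends one monthly bill to the custom model trained in Document Intelligence Studio. The endpoint, key, model id, and field names are placeholders; your trained model's actual field names will differ.

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<your-doc-intel>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<doc-intel-key>"),
)

# Analyze one bill stored in Blob Storage (via SAS URL) with the custom model.
poller = client.begin_analyze_document_from_url(
    "utility-bill-model",            # assumed custom model id
    "<sas-url-to-monthly-bill.pdf>",
)
doc = poller.result().documents[0]

# Pull out the values the dashboard needs, keeping the model's confidence score.
for name in ("BillMonth", "UsageKWh", "BilledAmount"):  # assumed field names
    field = doc.fields.get(name)
    if field is not None:
        print(f"{name}: {field.value} (confidence {field.confidence:.2f})")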
GitHub Copilot in VS Code for Azure Functions

For processing portions of the output from Doc Intel, what better way to create the Azure Function than in VS Code, especially with the help of GitHub Copilot. Let's start by installing the Azure Functions extension in VS Code, then create a new function project. GitHub Copilot can assist in writing the code to process the JSON received. Additionally, we can get Copilot to help generate unit tests to ensure the function works correctly, and we can ask it to explain the code and the tests it generates. Finally, we seamlessly integrate the generated code and unit tests into the Functions app code file, all within VS Code. Notice how we can prompt GitHub Copilot from step 1 of creating the workspace, to inserting the generated code into the Python file for the Azure Function, to testing it, and all the way to deploying the Function.

Store and Analyze information in Fabric

There are many options for storing and analyzing JSON data in Fabric: Lakehouse, Data Warehouse, SQL Database, and Power BI Datamart. As our dataset is small, let's choose either SQL DB or a PBI Datamart. A PBI Datamart is great for smaller datasets and direct integration with PBI for dashboarding, while SQL DB is good for moderate data volumes and supports transactional and analytical workloads. To insert the JSON values derived in the Azure Functions app (whether called from Logic Apps or directly from AI Foundry through API calls) into Fabric, let's explore two approaches: using the REST API, and using Functions with Azure SQL DB.

Using REST API – Fabric provides APIs that we can call directly from our Function to insert records, using an HTTP client in the Function's Python code to send POST requests to the Fabric API endpoints with our JSON data.

Using Functions with Azure SQL DB – we can connect directly from our Function, using the SQL client in the Function to execute SQL INSERT statements that add records to the database (a sketch of this approach appears after the visualization walkthrough below). While we are at it, we could even get GitHub Copilot to write up the unit tests.

Visualization in Fabric Power BI

Let's start with creating visualizations in Fabric using the web version of Power BI for our report, UtilitiesBillAnalysisDashboard. You could use the PBI Desktop version too. Open the PBI service and navigate to the workspace where you want to create your report. Click on "New" and select "Dataset" to add a new data source. Choose "SQL Server" from the list of data sources and enter "UtilityBillsServer" as the server name and "UtilityBillsDB" as the database name to establish the connection. Once connected, navigate to the Navigator pane, where we can select the table "tblElectricity" and the columns; I've shown these in the pictures below. For a clustered column (or bar) chart, let us choose the columns that contain our categorical data (e.g., month, year) and numerical data (e.g., kWh usage, billed amounts). After loading the data into PBI, drag the desired fields into the Values and Axis areas of the clustered column chart visualization, and customize the chart by adjusting the formatting options to enhance readability and insights. We now visualize our data in PBI within Fabric. We may need to apply a custom sort to the Month column. Let's do this in the Data view: select the table and create a new column that converts the month name into its number (for example, MonthNumber = MONTH(DATEVALUE(tblElectricity[Month] & " 1, 2024"))). This creates a custom sort column that we will use as 'Sum of MonthNumber' in ascending order. Plenty of other visualizations are possible as well.
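Sketching the second approach, here is a small HTTP-triggered Azure Function (Python v2 programming model) that writes one month's extracted values into the tblElectricity table; the connection-string app setting and column names are assumptions that would need to match your Fabric SQL database.

import os

import azure.functions as func
import pyodbc

app = func.FunctionApp()

@app.route(route="store-bill", auth_level=func.AuthLevel.FUNCTION)
def store_bill(req: func.HttpRequest) -> func.HttpResponse:
    # Expects JSON such as {"month": "January", "kwh": 742, "billed": 98.40}.
    bill = req.get_json()
    # Connection string kept in the Function App settings (assumed setting name).
    conn = pyodbc.connect(os.environ["FABRIC_SQL_CONNECTION"])
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tblElectricity (BillMonth, UsageKWh, BilledAmount) "
            "VALUES (?, ?, ?)",
            bill["month"], bill["kwh"], bill["billed"],
        )
    return func.HttpResponse("stored", status_code=201)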
Other Possibilities

Agents with Custom Copilot Studio

Next, you could leverage a custom Copilot to provide personalized energy usage recommendations based on historical data. Start by integrating the Copilot with your existing data pipeline in Azure AI Foundry. The Copilot can analyze electricity consumption patterns stored in your Fabric SQL DB and use ML models to identify optimization opportunities. For instance, it could suggest energy-efficient appliances, optimal usage times, or tips to reduce consumption. These recommendations can be visualized in PBI, where users can track progress over time. To implement this, you would need to set up an API endpoint for the Copilot to access the data, train the ML models using Python in VS Code (let GitHub Copilot help you here… you will love it), and deploy the models to Azure using the CLI, PowerShell, Bicep, Terraform, ARM templates, or the Azure portal. Finally, connect the Copilot to PBI to visualize the personalized recommendations.

Additionally, you could explore using Azure AI Agents for automated anomaly detection and alerts. Such an agent could monitor electricity bill data for unusual patterns and send notifications when anomalies are detected. Yet another idea would be to implement predictive maintenance for electrical systems, where an AI agent uses predictive analytics to forecast maintenance needs based on the data collected, helping to reduce downtime and improve system reliability.

Summary

We have built a solution that leverages the seamless integration of pioneering AI technologies with Microsoft's end-to-end platform. Using Azure AI Foundry, we developed a solution that uses Document Intelligence to scan electricity bills, stores the data in Fabric SQL DB, and processes it with Python in Azure Functions in VS Code, assisted by GitHub Copilot. The resulting insights are visualized in Power BI within Fabric. Additionally, we explored potential enhancements using Azure AI Agents and custom Copilots, showcasing the ease of implementation and the transformative possibilities. Finally, speaking of possibilities: with Gen AI, the only limit is our imagination!

Additional resources

Explore Azure AI Foundry
Start using the Azure AI Foundry SDK
Review the Azure AI Foundry documentation and Call Azure Logic Apps as functions using Azure OpenAI Assistants
Take the Azure AI Learn courses
Learn more about Azure AI Services
Document Intelligence: Azure AI Doc Intel
GitHub Copilot examples: What can GitHub Copilot do – Examples
Explore Microsoft Fabric: Microsoft Fabric Documentation
See what you can connect with Azure Logic Apps: Azure Logic Apps Connectors

About the Author

Pradyumna (Prad) Harish is a technology leader in the GSI Partner Organization at Microsoft. He has 26 years of experience in product engineering, partner development, presales, and delivery. He is responsible for revenue growth through Cloud, AI, Cognitive Services, ML, Data & Analytics, Integration, DevOps, Open Source Software, Enterprise Architecture, IoT, and digital strategies for business generation and transformation, achieving revenue targets via extensive experience in managing global functions, global accounts, products, and solution architects across over 26 countries.

Securely Integrating Azure API Management with Azure OpenAI via Application Gateway
Introduction

As organizations increasingly integrate AI into their applications, securing access to Azure OpenAI services becomes a critical priority. By default, Azure OpenAI can be exposed over the public internet, posing potential security risks. To mitigate these risks, enterprises often restrict OpenAI access using Private Endpoints, ensuring that traffic remains within their Azure Virtual Network (VNET) and preventing direct internet exposure.

However, restricting OpenAI to a private endpoint introduces challenges when external applications, such as those hosted in AWS or on-premises environments, need to securely interact with OpenAI APIs. This is where Azure API Management (APIM) plays a crucial role. By deploying APIM within an internal VNET, it acts as a secure proxy between external applications and the OpenAI service, allowing controlled access while keeping OpenAI private.

To further enhance security and accessibility, Azure Application Gateway (App Gateway) can be placed in front of APIM. This setup enables secure, policy-driven access by managing traffic flow, applying Web Application Firewall (WAF) rules, and enforcing SSL termination if needed.

What This Blog Covers

This blog provides a technical deep dive into setting up a fully secure architecture that integrates Azure OpenAI with APIM, Private Endpoints, and Application Gateway. Specifically, we will walk through:

Configuring Azure OpenAI with a Private Endpoint to restrict public access and ensure communication remains within a secure network.
Deploying APIM in an internal VNET, allowing it to securely communicate with OpenAI while being inaccessible from the public internet.
Setting up Application Gateway to expose APIM securely, allowing controlled external access with enhanced security.
Configuring VNETs, subnets, and Network Security Groups (NSGs) to enforce network segmentation, traffic control, and security best practices.

By the end of this guide, you will have a production-ready, enterprise-grade setup that ensures:

End-to-end private connectivity for Azure OpenAI through APIM.
Secure external access via Application Gateway while keeping OpenAI hidden from the internet.
Granular network control using VNETs, subnets, and NSGs.

This architecture provides a scalable and secure solution for enterprises needing to expose OpenAI securely without compromising privacy, performance, or compliance.

Prerequisites

Before diving into the integration of Azure API Management (APIM) with Azure OpenAI in a secure, private setup, ensure you have the following in place:

1. Azure Subscription & Required Permissions

An active Azure subscription with the ability to create resources.
Contributor or Owner access to deploy Virtual Networks (VNETs), subnets, Network Security Groups (NSGs), Private Endpoints, APIM, and Application Gateway.

2. Networking Setup Knowledge

Familiarity with Azure Virtual Network (VNET) concepts, subnets, and NSGs is helpful, as we will be designing a controlled network environment.

3. Required Azure Services

The following services are needed for this integration:

Azure Virtual Network (VNET) – To establish a private, secure network.
Subnets & NSGs – For network segmentation and traffic control.
Azure OpenAI Service – Deployed in a region that supports private endpoints.
Azure API Management (APIM) – Deployed in internal VNET mode to act as a secure API proxy.
Azure Private Endpoint – To restrict Azure OpenAI access to a private network.
Azure Application Gateway – To expose APIM securely with load balancing and optional Web Application Firewall (WAF).

4. Networking and DNS Requirements

Private DNS Zone: Required to resolve private endpoints within the VNET.
Custom DNS Configuration: If using a custom DNS server, ensure proper forwarding rules are in place.
Firewall/NSG Rules: Ensure the necessary inbound and outbound rules allow communication between services.

5. Azure CLI or PowerShell (Optional, but Recommended)

Azure CLI (az commands) or Azure PowerShell for efficient resource deployment.

Once you have these prerequisites in place, we can proceed with designing the secure architecture for integrating Azure OpenAI with APIM using Private Endpoints and Application Gateway.

Architecture Overview

The architecture ensures secure and private connectivity between external users and Azure OpenAI while preventing direct public access to OpenAI’s APIs. It uses Azure API Management (APIM) in an internal VNET, an Azure Private Endpoint for OpenAI, and an Application Gateway for controlled public exposure.

Key Components & Flow

User Requests
External users access the API via a public endpoint exposed by Azure Application Gateway. The request passes through App Gateway before reaching APIM, ensuring security and traffic control.

Azure API Management (APIM) – Internal VNET Mode
APIM is deployed in internal VNET mode, meaning it does not have a public endpoint. APIM serves as a proxy between external applications and Azure OpenAI, ensuring request validation, rate limiting, and security enforcement. The management plane of APIM still requires a public IP for admin operations, but the data plane (API traffic) remains fully private.

Azure Private Endpoint for OpenAI
APIM cannot access Azure OpenAI publicly, since OpenAI is secured with a Private Endpoint. The Private Endpoint allows APIM to securely connect to Azure OpenAI within the same VNET, preventing internet exposure. This ensures that only APIM within the internal network can send requests to OpenAI.

Managed Identity Authentication
APIM uses a Managed Identity to authenticate securely with Azure OpenAI. This eliminates the need for hardcoded API keys and improves security by using Azure Role-Based Access Control (RBAC).

Application Gateway for External Access
Since APIM is not publicly accessible, an Azure Application Gateway (App Gateway) is placed in front of it. App Gateway acts as a reverse proxy that securely exposes APIM to the public while enforcing:
SSL termination for secure HTTPS connections.
Web Application Firewall (WAF) for protection against threats.
Load balancing if multiple APIM instances exist.

Network Segmentation & Security
VNET & Subnets: APIM, the OpenAI Private Endpoint, and App Gateway are deployed in separate subnets within an Azure Virtual Network (VNET).
NSGs (Network Security Groups): Strict inbound and outbound rules ensure that only allowed traffic flows between components.
Private DNS: Required to resolve Private Endpoint addresses inside the VNET.

Security Enhancements

No direct internet access to Azure OpenAI, ensuring full privacy.
Controlled API exposure via App Gateway, securing public requests.
Managed Identity for authentication, eliminating hardcoded credentials.
Private Endpoint enforcement, blocking unwanted access from external sources.

This architecture ensures that Azure OpenAI remains secure, APIM acts as a controlled gateway, and external users can access APIs safely through App Gateway.
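From an external client's perspective, only the App Gateway endpoint is visible. Here is a minimal sketch of such a call, assuming the OpenAI API was imported into APIM under an /openai path; the hostname, path, deployment name, and subscription key are placeholders for your own values.

import requests

APP_GATEWAY = "https://api.contoso.example"  # public hostname on App Gateway
PATH = "/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01"

resp = requests.post(
    APP_GATEWAY + PATH,
    headers={"Ocp-Apim-Subscription-Key": "<your-apim-subscription-key>"},
    json={"messages": [{"role": "user", "content": "Hello from outside Azure"}]},
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])

App Gateway terminates TLS and applies WAF rules, APIM validates the subscription key and applies its policies, and only then does the request travel over the private endpoint to Azure OpenAI.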
Azure CLI Script for VNet, Subnets, and NSG Configuration # Variables RESOURCE_GROUP="apim-openai-rg" LOCATION="eastus" VNET_NAME="apim-vnet" VNET_ADDRESS_PREFIX="10.0.0.0/16" # Subnets APP_GATEWAY_SUBNET="app-gateway-subnet" APP_GATEWAY_SUBNET_PREFIX="10.0.1.0/24" APIM_SUBNET="apim-subnet" APIM_SUBNET_PREFIX="10.0.2.0/24" OPENAI_PE_SUBNET="openai-pe-subnet" OPENAI_PE_SUBNET_PREFIX="10.0.3.0/24" # NSGs APP_GATEWAY_NSG="app-gateway-nsg" APIM_NSG="apim-nsg" OPENAI_PE_NSG="openai-pe-nsg" # Step 1: Create Resource Group az group create --name $RESOURCE_GROUP --location $LOCATION # Step 2: Create Virtual Network az network vnet create \ --resource-group $RESOURCE_GROUP \ --name $VNET_NAME \ --address-prefix $VNET_ADDRESS_PREFIX \ --subnet-name $APP_GATEWAY_SUBNET \ --subnet-prefix $APP_GATEWAY_SUBNET_PREFIX # Step 3: Create Additional Subnets (APIM & OpenAI Private Endpoint) az network vnet subnet create \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --name $APIM_SUBNET \ --address-prefix $APIM_SUBNET_PREFIX az network vnet subnet create \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --name $OPENAI_PE_SUBNET \ --address-prefix $OPENAI_PE_SUBNET_PREFIX # Step 4: Create NSGs az network nsg create --resource-group $RESOURCE_GROUP --name $APP_GATEWAY_NSG az network nsg create --resource-group $RESOURCE_GROUP --name $APIM_NSG az network nsg create --resource-group $RESOURCE_GROUP --name $OPENAI_PE_NSG # Step 5: Add NSG Rules for APIM (Allow 3443 for APIM Internal VNet) az network nsg rule create \ --resource-group $RESOURCE_GROUP \ --nsg-name $APIM_NSG \ --name AllowAPIMInbound3443 \ --priority 120 \ --direction Inbound \ --access Allow \ --protocol Tcp \ --source-address-prefixes ApiManagement \ --destination-address-prefixes VirtualNetwork \ --destination-port-ranges 3443 # Step 6: Associate NSGs with Subnets az network vnet subnet update \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --name $APP_GATEWAY_SUBNET \ --network-security-group $APP_GATEWAY_NSG az network vnet subnet update \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --name $APIM_SUBNET \ --network-security-group $APIM_NSG az network vnet subnet update \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --name $OPENAI_PE_SUBNET \ --network-security-group $OPENAI_PE_NSG # Step 7: Configure Service Endpoints for APIM Subnet az network vnet subnet update \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --name $APIM_SUBNET \ --service-endpoints Microsoft.EventHub Microsoft.KeyVault Microsoft.ServiceBus Microsoft.Sql Microsoft.Storage Microsoft.AzureActiveDirectory Microsoft.CognitiveServices Microsoft.Web Creating an Azure Open AI with private endpoint # Create an Azure OpenAI Resource az cognitiveservices account create \ --name $AOAI_NAME \ --resource-group $RESOURCE_GROUP \ --kind OpenAI \ --sku S0 \ --location $LOCATION \ --yes \ --custom-domain $AOAI_NAME #Create a Private Endpoint az network private-endpoint create \ --name $PRIVATE_ENDPOINT_NAME \ --resource-group $RESOURCE_GROUP \ --vnet-name $VNET_NAME \ --subnet $SUBNET_NAME \ --private-connection-resource-id $(az cognitiveservices account show --name $AOAI_NAME --resource-group $RESOURCE_GROUP --query id -o tsv) \ --group-id account \ --connection-name "${PRIVATE_ENDPOINT_NAME}-connection" # Create a Private DNS Zone az network private-dns zone create \ --resource-group $RESOURCE_GROUP \ --name $PRIVATE_DNS_ZONE_NAME # Link Private DNS Zone to VNet az network private-dns link vnet create \ 
  --resource-group $RESOURCE_GROUP \
  --zone-name $PRIVATE_DNS_ZONE_NAME \
  --name "myDNSLink" \
  --virtual-network $VNET_NAME \
  --registration-enabled false

# Retrieve the Private IP Address from the Private Endpoint
PRIVATE_IP=$(az network private-endpoint show \
  --name $PRIVATE_ENDPOINT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "customDnsConfigs[0].ipAddresses[0]" -o tsv)

# Create a DNS Record for Azure OpenAI
az network private-dns record-set a add-record \
  --resource-group $RESOURCE_GROUP \
  --zone-name $PRIVATE_DNS_ZONE_NAME \
  --record-set-name $AOAI_NAME \
  --ipv4-address $PRIVATE_IP

# Disable Public Network Access
az cognitiveservices account update \
  --name $AOAI_NAME \
  --resource-group $RESOURCE_GROUP \
  --public-network-access Disabled

Provisioning the Azure APIM instance to an internal VNet

Please follow this link to provision: Deploy Azure API Management instance to internal VNet | Microsoft Learn

Create an API for AOAI in APIM

Please follow this link: Import an Azure OpenAI API as REST API - Azure API Management | Microsoft Learn

Configure Azure Application Gateway with Azure APIM

Please follow this link: Use API Management in a virtual network with Azure Application Gateway - Azure API Management | Microsoft Learn

Conclusion

Securing Azure OpenAI with private endpoints, APIM, and Application Gateway ensures a robust, enterprise-grade architecture that balances security, accessibility, and performance. By leveraging private endpoints, Azure OpenAI remains shielded from public exposure, while APIM acts as a controlled gateway for managing external API access. The addition of Application Gateway provides an extra security layer with SSL termination, WAF protection, and traffic management.

With this setup, organizations can:

✔ Ensure end-to-end private connectivity for Azure OpenAI.
✔ Enable secure external access via APIM and Application Gateway.
✔ Enforce strict network segmentation with VNETs, subnets, NSGs, and Private DNS.
✔ Strengthen security with Managed Identity authentication and controlled API exposure.

By following this guide, you now have a scalable, production-ready solution to securely integrate Azure OpenAI with external applications, whether they reside in AWS, on-premises, or other cloud environments. Implement these best practices to maintain compliance, minimize security risks, and enhance the reliability of your AI-powered applications.

Getting started with the NetApp Connector for Microsoft M365 Copilot and Azure NetApp Files
Imagine a world where your on-premises and enterprise cloud files seamlessly integrate with Microsoft Copilot, unleashing AI on your Azure NetApp Files enterprise data and making your workday smoother and more efficient. Welcome to the future with the NetApp Connector for Microsoft Copilot!

Demystifying Azure OpenAI Networking for Secure Chatbot Deployment
Embark on a technical exploration of Azure's networking features for building secure chatbots. In this article, we'll dive deep into the practical aspects of Azure's networking capabilities and their crucial role in ensuring the security of your OpenAI deployments. With real-world use cases and step-by-step instructions, you'll gain practical insights into optimizing Azure and OpenAI for your projects.

Building scalable and persistent AI applications with LangChain, Instaclustr, and Azure NetApp Files
Discover the powerful combination of LangChain and LangGraph for building stateful AI applications and unlock the benefits of using a managed-database service like NetApp® Instaclustr® backed by Azure NetApp Files for seamless data persistence and scalability.