Microsoft
Prompt Flow in AzureML
A holistic platform for the entire AI development lifecycle. I participated in the project from concept to implementation, building it from scratch.
*Prompt Flow has reached the public preview stage, so no confidential material remains in this case study.
The challenge
One Platform for the Entire LLM Development Lifecycle
Customers are looking for an end-to-end solution for the entire LLM lifecycle: data access, prompt engineering, experimentation management, deployment, monitoring, human feedback loops, retraining/re-tuning, and Responsible AI (RAI). Existing tools such as LangChain, dust.tt, and Vellum address only parts of this lifecycle. By offering an integrated, end-to-end experience, AzureML can easily stand out against any individual tool.
What is Prompt Flow?
As a term:
Put simply, a prompt flow is a workflow that chains together multiple large language model (LLM) calls, allowing users to complete more complicated or specialized tasks.
As a product:
Prompt Flow is a prompt engineering tool to significantly boost the productivity of LLM-based application development.
It is an integrated development environment that covers the whole lifecycle of AI skill development and deployment, making testing, debugging, and shipping easier.
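To make the "workflow of chained LLM calls" idea concrete, here is a minimal conceptual sketch in Python. The fake_llm stub and the node function names are hypothetical illustrations, not Prompt Flow's actual API.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a call to a real large language model."""
    return f"<model answer to: {prompt!r}>"

def summarize(document: str) -> str:
    """Node 1: ask an LLM to condense the input document."""
    return fake_llm(f"Summarize the following text:\n{document}")

def classify(summary: str) -> str:
    """Node 2: feed node 1's output into a second LLM call."""
    return fake_llm(f"Classify the sentiment of this summary:\n{summary}")

# The flow wires the output of one node into the input of the next.
print(classify(summarize("Prompt Flow reached public preview today.")))
```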
Target user
App developers and prompt engineers
My role
As a product designer, I had the privilege of participating in the 'Prompt Flow' project from February to August 2023, collaborating closely with other product designers, product managers, and developers. In this role, I took the lead in designing Prompt Flow from its inception, conducting extensive competitor research, crafting low to high-fidelity wireframes, and delivering the final UI design. This comprehensive approach ensured the project's success and its ability to bridge the gap in AI development.
Enable Prompt Flow in both the VS Code extension and AzureML by //Build
Competitive landscape
Understanding Prompt Engineering domain and industry standards
Before embarking on the design process, I conducted a thorough exploration of two prominent categories of prompt engineering tools. This research aimed to deepen my understanding of current product knowledge and inform my design decisions.
Prompt Engineering Playground
The first category, known as "Playgrounds", provides a user-friendly and interactive environment for experimenting with prompt generation. These tools prioritize ease of use and often come with prebuilt models, making them accessible to a broad range of users.
Examples: OpenAI's GPT-3 Playground
Google Vertex AI
Integrated Development Environments (IDEs)
The second category offers a more robust and versatile approach. These tools cater to professionals with coding expertise and provide advanced functionality for fine-tuning and optimizing prompt flows.
Examples: CDEX (internal Microsoft tool)
PROMPTMETHEUS FORGE
With these two categories in mind, we explored 15+ prompt engineering products and summarized insights across features, user flows, and UI patterns.
Key takeaways: What Our Customers Need
- Easy onboarding with a template
- Visual tools with direct manipulation
- Result comparison and bulk data evaluation
- Flexible data input and parameter control
We shared these insights with the PMs to help shape the product at an early stage; the exercise also helped us build domain knowledge in a very short time.
Explore
Main User Workflow
As Prompt Flow serves as an entire platform for the AI development lifecycle, our initial discussions with product managers focused on establishing a high-level overview. This foundational step ensured that our detailed design efforts remained aligned with the project's overarching goals and prevented us from veering off course. Furthermore, these discussions provided valuable insights into how users would interact with the product, deepening my understanding of its logical flow.
1. Initialization: Getting started with ideas --- hypothesis generation
2. Experimentation: Development of prompts & flow
3. Evaluation & Refinement: Evaluate & refine with larger dataset
4. Production: Optimization and deployment of the flow, as well as monitoring
The challenge
Design Based on Users
As a UX designer, my goal was to create a tool that would be accessible and user-friendly for a wide range of users, including those who may not have extensive coding experience. Through user research, we identified a new role in the app development process that we call "prompt engineers", who may be developers or product managers but may not have a strong coding background.
To cater to this diverse user base, we decided to provide both Directed Acyclic Graphs (DAGs) and stacked cells, enabling all user types to construct prompt flows with ease.
DAG
- Can easily see the topology of the flow
- Easy to navigate among steps
- Can easily reference input/output between modules
- Feels familiar to users of the AzureML pipeline
Stacked cell
- Easy to edit prompts/code
- Feels familiar to users of CDEX and the OpenAI Playground
- Able to intuitively view all intermediate results of modules in a prompt flow, along with their inputs/outputs
A challenge we faced was deciding how to support stacked cells and the DAG at the same time; our project manager and developers had different opinions on the best approach. To accelerate decision-making, I visualized four potential directions and listed the pros and cons of each, which helped us make an informed decision that considered all aspects of the problem.
Design
Authoring Page Interaction
We designed Prompt Flow in AzureML with a focus on usability, user-friendliness, and visual interfaces.
By providing:
- A set of visual tools and a node view that let users drag and drop pre-built components and customize their settings
- Natural language testing that allows prompt engineers to test, debug, and refine their prompts
- Clear error messages that give users quick feedback when something goes wrong
- Collaboration and debugging tools that enable prompt engineers to work together and troubleshoot issues effectively
Authoring page: allows users to build, test, and refine prompts
Debugging page in Authoring
Design
Detailed Interaction of Building a Prompt Flow
In this section, I will introduce detailed interactions that can help solve the problems of prompt engineers in authoring prompt flows using Prompt Flow in Azure ML.
1. Create a new flow
Users can create a flow by selecting a pre-built template, helping prompt engineers get started quickly.
2. Visual way to show prompt flow
The DAG view reflects the order of existing nodes in a flattened layout. The toolbar at the top lets users quickly insert tools such as LLM, Prompt, and Python.
3. Add and set a node
After adding an LLM node, users can set parameters and edit the prompt within the selected node.
4. Link nodes
Users link nodes by editing their inputs, which avoids confusion when a node has multiple inputs, as sketched below.
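As an illustration of input-based linking, this sketch models a two-node flow with a plain Python dictionary; the ${...} reference strings and node names are hypothetical stand-ins, not the product's actual flow format.

```python
# Each node declares where its inputs come from, so even a node with
# several inputs is unambiguous about which upstream output feeds it.
flow = {
    "fetch_context": {"inputs": {"query": "${flow.question}"}},
    "answer": {
        "inputs": {
            # Two inputs, each explicitly bound to its source.
            "question": "${flow.question}",
            "context": "${fetch_context.output}",
        }
    },
}

for name, node in flow.items():
    print(name, "reads from", list(node["inputs"].values()))
```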
5. Test prompt in chat box and debugging
In a chat flow, users can test prompts with natural language in chat mode, which helps prompt engineers understand the chatbot's performance and improve the user experience.
Run failed notification
6. Collaborate with team members in one workspace
In a shared workspace, customers can see teammates' prompt flows, duplicate them as references, use them as starting templates, and help with debugging.
Design
Empowering Developers
Code first experience
During interviews with our internal development team, we discovered that some prompt engineers have a solid programming background and are accustomed to using code to create, modify, and enhance prompts. In response, we introduced the "Code-First Experience" within Prompt Flow, empowering developers to seamlessly incorporate their coding skills into the prompt design process.
Users can access individual node editing by clicking on the files located in the top right corner
Navigating Complexity and Precision
Our machine learning platform is complex and sensitive: a single error within an intricate flow can trigger a complete failure, so we wanted to give users greater fault tolerance. To manage this, the platform is continuously updated based on user feedback. These updates cover edge-case scenarios, enabling precise error messages and interactive guidance when a flow runs into trouble, ensuring a smoother user experience.
Each node displays a status indicating whether its run succeeded
The system aids users in error identification through informative error messages
Design
Evaluation
Evaluation is an important step in building an AML prompt flow. Users assess the flow's performance by running it against a larger dataset, evaluate the prompt's effectiveness, and refine it as needed; if the results meet the desired criteria, they proceed to the next stage.
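To sketch what running a flow against a larger dataset might look like, here is a minimal, hypothetical bulk-evaluation loop; run_flow, exact_match, and the sample data are illustrative stand-ins, not the product's SDK.

```python
def run_flow(question: str) -> str:
    """Stand-in for executing the main flow once."""
    return question.upper()  # placeholder "answer"

def exact_match(answer: str, expected: str) -> float:
    """A simple evaluation metric: 1.0 if the answer matches exactly."""
    return 1.0 if answer == expected else 0.0

dataset = [
    {"question": "hello", "expected": "HELLO"},
    {"question": "world", "expected": "WORLD!"},
]

# Score every row, then aggregate into a single metric for the run.
scores = [exact_match(run_flow(r["question"]), r["expected"]) for r in dataset]
print(f"accuracy: {sum(scores) / len(scores):.2f}")  # prints: accuracy: 0.50
```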
Design Goal
Through the design of the evaluation feature, we aim to create a comprehensive evaluation process for users, encompassing selecting variants, choosing evaluation methods, and seamlessly viewing metric results, all within a fluent and user-friendly workflow.
Product Goal
Through these efforts, we aspire to establish Prompt Flow in AML as the industry's preferred choice for prompt evaluation and the foremost prompt engineering platform.
Challenge: Display evaluation step
One challenge involved presenting the evaluation step after the main flow's output. Our solution was a mini-map that divides the DAG view into two distinct sections: the main flow and the evaluation step. Once users have configured all steps, they can initiate the evaluation process by clicking 'Bulk Test'.
We drafted the first version of the design in VS Code to get quick feedback
Regrettably, during our testing with internal developers, we encountered unfavorable feedback.
The primary concern was that users found it hard to grasp that the evaluation step had to be configured before initiating a bulk test. Additionally, the switch function proved overly complex and unwieldy for frequent use.
Design
Evaluation: Workflow Exploration
In response to user feedback, I conducted an analysis of both the user flow and the current workflow within the product. This assessment allowed me to identify UX issues and set out to devise an improved solution for users to seamlessly initiate the evaluation step.
Several key insights guided my exploration of a potential new workflow:
- Recognizing that the evaluation step essentially constitutes another main flow, effectively making it the "evaluation flow"
- Understanding that the outputs generated during evaluation serve as the means to assess the results of the main flow's bulk run
I crafted a new solution to streamline the existing user flow. It simplifies bulk testing and evaluation of the main flow, keeps the process user-friendly, and provides rapid access to the two pivotal feature trigger points.
We conducted testing of the new solution with internal users and received very positive feedback. All five interviewees were able to configure bulk tests and evaluate them more efficiently compared to the previous design.
Design
Other Pages
Design based on Fluent 2 Web UI
Design
AzureML Toolkit Based on Fluent 2 Web UI
I worked with other AzureML Design team members to create the AzureML Toolkit using Fluent 2 Web UI. It helped maintain a consistent design experience across different functions in AzureML.
Positive Feedback: AzureML Prompt Flow's Impact
Reflections on Prompt Flow Project Journey and Gratitude
It’s a great honor to be a member of the Prompt Flow team. In this rapidly advancing field, comprehending the entire prompt flow can be a significant challenge.
- Collaboration with cross-functional teams: Worked closely with project managers and developers to identify workflows, user scenarios, and implementation feasibility.
- Prototyping for clarity: Created prototypes to illustrate how the product addressed user problems, aiding team comprehension.
- Accelerated iteration: Prototypes enabled PMs and designers to iterate more quickly and effectively based on user feedback.
- Enhanced user-centered design: A clear understanding of the product scope allowed designers to refine designs with a user-centric focus.
- Reduced assumptions: Moving away from initial assumptions improved the early stages of product development.
Want to say thank you to all Microsoft AI platform members!