Agent Dev Collection

This is a living repo that gathers the most important evals, monitoring tooling, and test cases for the continuous development and improvement of agents. We'll add monitoring and evaluation tooling and standardized capability test cases (e.g. function calling, agent communication) to a basic agent application.

Knowledge Worker Project

Check out the Weave Workspace here!

Getting Started

  1. Install the dependencies from requirements_verbose.txt into your environment (tested on Apple Silicon Macs)
  2. Set up benchmark.env in ./configs with the required API key (WANDB_API_KEY) and, optionally, HUGGINGFACEHUB_API_TOKEN, OPENAI_API_KEY, and ANTHROPIC_API_KEY (see the example after this list)
  3. Set the variables in general_config.yaml accordingly (see the sketch after this list)
    • Set your W&B entity and project (device supports only CPU for now)
    • Set Setup = True on the first run to extract the data and generate the dataset
    • Choose the chat model, embedding model, judge model, prompts, and params as you like
  4. Run main.py with different configs, or run streamlit run chatbot.py to track interactions with an already deployed model
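
A minimal benchmark.env sketch using only the keys named above; the values are placeholders:

```
# ./configs/benchmark.env
WANDB_API_KEY=your-wandb-key            # required
HUGGINGFACEHUB_API_TOKEN=your-hf-token  # optional
OPENAI_API_KEY=your-openai-key          # optional
ANTHROPIC_API_KEY=your-anthropic-key    # optional
```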
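
And a hedged sketch of the matching general_config.yaml settings; the exact key names below are assumptions, the authoritative ones are in ./configs/general_config.yaml itself:

```yaml
# Illustrative key names; check ./configs/general_config.yaml for the real ones.
entity: my-entity                  # your W&B entity
project: agent-dev-collection      # your W&B project
device: cpu                        # only CPU is supported for now
setup: true                        # true on the first run to extract data and build the dataset
chat_model: gpt-4o-mini            # chat, embedding, and judge models of your choice
embedding_model: text-embedding-3-small
judge_model: gpt-4o
```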

Code Structure

  • main.py - contains the main application flow and serves as an example of bringing everything together
  • setup.py - contains the RAG model RagModel(weave.Model) as well as the data extraction and dataset generation functions (see the sketch after this list)
  • evaluatie.py - contains the weave.flow.scorer.Scorer classes used to evaluate correctness, hallucination, and retrieval performance (see the sketch after this list)
  • ./configs - the configs of the project
    • ./configs/benchmark.env - should contain the env vars for your W&B account and the model providers you want to use (HuggingFace, OpenAI, Anthropic, Mistral, etc.)
    • ./configs/requirements.txt - the dependencies needed to run the RAG pipeline
    • ./configs/sources_urls.csv - a CSV listing all the websites and PDFs the RAG pipeline should ingest
    • ./configs/general_config.yaml - the central config file with models, prompts, and params
  • annotate.py - run with streamlit run annotate.py to annotate existing datasets, or to fetch datasets based on production function calls, annotate them, and save them as a new dataset
  • chatbot.py - run with streamlit run chatbot.py to serve the RagModel from Weave and track the questions asked of it
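
To give a feel for the Weave interfaces referenced above, here is a minimal, hypothetical sketch of a weave.Model subclass in the spirit of the RagModel in setup.py. The field names and the body of predict are assumptions; only the weave.Model subclass plus @weave.op pattern reflects the actual Weave API:

```python
import weave

class RagModel(weave.Model):
    # Illustrative fields; the real RagModel in setup.py is configured
    # from general_config.yaml (chat model, embedding model, prompts, params).
    chat_model: str
    system_prompt: str

    @weave.op()
    def predict(self, question: str) -> dict:
        # Stub body: the real model retrieves context for the question,
        # calls the chat model, and returns the answer with its sources.
        context = ["<retrieved chunk>"]  # placeholder for real retrieval
        answer = f"[{self.chat_model}] answer to: {question}"
        return {"answer": answer, "context": context}
```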
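
Likewise, a hedged sketch of a weave.flow.scorer.Scorer as used in evaluatie.py. The dataset column name (answer) is an assumption that must match your evaluation dataset, and older Weave versions named the model-output parameter model_output rather than output:

```python
import weave
from weave.flow.scorer import Scorer

class CorrectnessScorer(Scorer):
    # Hypothetical scorer: exact-match comparison against the dataset's
    # reference answer. The real scorers also cover hallucination and
    # retrieval performance.
    @weave.op()
    def score(self, answer: str, output: dict) -> dict:
        # `answer` comes from the dataset row, `output` from RagModel.predict.
        return {"correct": output.get("answer", "").strip() == answer.strip()}
```

Scorers like this are then passed to weave.Evaluation(dataset=..., scorers=[...]) together with the generated dataset, presumably the flow that main.py demonstrates end to end.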
