Capitalizing on the surging interest in generative AI and large language models (LLMs), machine learning monitoring startup Arthur has introduced Arthur Bench, an open-source platform designed to help businesses select the LLM best suited to their specific dataset.
Adam Wenchel, Arthur's CEO and co-founder, said that even in the post-ChatGPT landscape, companies still lack structured ways to compare the effectiveness of these tools, a gap that prompted the development of Arthur Bench. “Arthur Bench addresses a recurrent concern we've observed with our clientele: identifying the best-suited model for their unique application,” Wenchel told TechCrunch.
The tool's value proposition is multifaceted. Beyond a comprehensive toolkit for performance assessment, its standout feature lets users evaluate how the prompts they are likely to use in their applications fare across different LLMs. “Users can test numerous prompts, juxtaposing LLMs from Anthropic and OpenAI, to deduce the best fit for their application,” Wenchel highlighted.
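The workflow Wenchel describes, running the same prompts through competing models and scoring the responses, can be sketched in a few lines of Python. Note that this is a hypothetical illustration of the general idea, not Arthur Bench's actual API; the stub "models" and toy scoring function below are stand-ins for real provider calls and a task-appropriate metric.

```python
# Hypothetical sketch of prompt-level model comparison.
# NOT Arthur Bench's real API: model callables and the scorer are stand-ins.
from typing import Callable, Dict, List


def compare_models(
    prompts: List[str],
    models: Dict[str, Callable[[str], str]],
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Run every prompt through every model and average a quality score."""
    results: Dict[str, float] = {}
    for name, generate in models.items():
        scores = [score(p, generate(p)) for p in prompts]
        results[name] = sum(scores) / len(scores)
    return results


# Stand-in "models" (real use would call provider APIs) and a toy scorer
# (real use would apply a metric such as exact match or embedding similarity).
models = {
    "model_a": lambda p: p.upper(),
    "model_b": lambda p: p[::-1],
}
score = lambda prompt, response: float(len(response) == len(prompt))

print(compare_models(["hello", "world"], models, score))
```

Swapping in real API clients for the stub callables and a domain-specific scorer turns this loop into exactly the kind of head-to-head evaluation the tool is built to automate.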
Arthur Bench is released today in its open-source format. A software-as-a-service (SaaS) version will follow, catering to clients who want a managed offering without the intricacies of the open-source setup, or who have extensive testing requirements.
The unveiling is strategically timed, following the launch of Arthur Shield in May: an LLM firewall engineered to detect model hallucinations while guarding against harmful content and potential data breaches.