us-technology Archives - Page 49 of 58

The best Nintendo Switch 2 accessories for 2025

Now that the Switch 2 is finally here, you may be equally as excited to kit out your new console with the right accessories as you are to dive into Mario Kart World. Not only can the right accessories make it easier and more fun to play all of the games you love, but they can also make your gaming experience better in different environments, be it on the couch, in an airplane or in the car. We’re excited to get our hands on some of the newest Switch 2 accessories (more on those below), but plenty of our favorite classics are compatible with the Switch 2 as well as older versions of the console. These are our current favorite Nintendo Switch 2 accessories, and we’ll add to this list over time as we test out new gear. The Switch 2 Pro controller is the best controller Nintendo has ever made – and that’s saying something. It’s incredibly comfortable to hold, its joysticks are buttery smooth, and all of its buttons are wonderfully responsive. Nintendo even made it customizable this time with rear buttons that can easily be mapped inside of any game. The only downside? The Switch 2 Pro Controller costs $85, which seems a bit egregious when you can get an excellent controller like 8BitDo’s Ultimate 2 wireless gamepad for $60. As good as the competition is, though, you won’t find anything that can be easily configured directly from within the Switch 2’s user interface. Despite the high cost, the Switch 2 Pro controller seriously elevated my Mario Kart World experience. My hands often feel cramped when I play that game with the Switch 2’s Joy-Con 2 controllers for too long. But there’s none of that with the Switch 2 Pro controller. I was able to race for an hour without any discomfort, and I also noticed that it was easier for me to pull off the game’s tricky jump maneuvers and rail sliding. The Switch 2 Pro controller is also seriously helpful with fighting games, thanks to its smooth and accurate directional pad. It acquits itself to most genres easily, though it would have been nice to see analog triggers for more precise control in serious racing games. — Devindra Hardawar, Senior Reporter $85 at Target Nintendo announced a bunch of new accessories when it revealed the Switch 2 earlier this year. Key among them are a new Switch 2 Pro controller, Switch 2 camera, an all-in-one carrying case and more. Our staff will be testing out a bunch of these accessories, and we’ll keep our favorites list up to date as we do so. If you’re interested in picking any of those new Switch 2 accessories up, you can find them at a variety of retailers: Read More

The best Nintendo Switch 2 accessories for 2025 Read More »

An engineer’s new smartphone cases can give any iPhone a USB-C port

us-technology

Ken Pillonel has a history of developing clever projects that add USB-C support to gadgets that have less common, outdated port types. After creating the first ever USB-C iPhone back in 2021, the engineer has returned his attention to that concept. He’s created an iPhone case that can provide older device models with a USB-C port, and you can browse the available options on his shop. He also detailed the design process in a fascinating video. For several generations, Apple equipped its smartphones with proprietary Lighting ports. That connection tech required a frankly obnoxious number of adapters and dongles to use. A solution like Pillonel’s can help keep those older devices functional in the present day, where USB-C has become the standard for most gadgets, including Apple’s. “The goal is to give some extra life to those older devices by making them feel less obsolete,” he explains in the video announcement. Pillonel has designed cases for all 20 phone models from the Lightning era that can run the current iOS. The design promises fast charging as well as full data transfers to both computers and CarPlay. He’s also adding more color options to be released in September. The video is a worthy watch for anyone interested in product design and engineering. And you can also read the backstory on other products Pillonel has tackled, including Apple’s AirPods and AirPods Max. Read More

An engineer’s new smartphone cases can give any iPhone a USB-C port Read More »

Maingear’s Retro95 PC blends ’90s workstation nostalgia with modern horsepower

us-technology

Maingear’s latest, the (appropriately named) Retro95, is a deceptive love letter to old-school “pizza box” PCs. It’s Wolfenstein 3D and Sierra adventure games on the outside; Cyberpunk 2077 in ray-traced 4K on the inside. That’s because you can fit this sucker with up to NVIDIA GeForce RTX 5080 graphics. It supports Intel and AMD processors, up to the Ryzen 7 9800X3D. You can also customize it with up to 96GB of DDR5 memory, 8TB of Gen4 NVMe storage, Noctua fans and an 850W PSU. It’s a ray-traced wolf in pixelated sheep’s clothing. Maingear It looks like the Retro95’s case is sourced from the Silverstone FLP01. (That makes sense since Maingear is a custom PC builder.) The case is an ode to beige horizontal PC cases, designed to serve as a pedestal for CRT monitors. They were the default from the early 1980s to mid-1990s. (If you prefer the tower design that succeeded it, Silverstone’s follow-up to the FLP01 should scratch that itch.) The Retro95 includes a hidden front-panel I/O array and modern airflow design. And if its exterior has you nostalgic for games you played on similar-looking PCs, you can add a DVD drive. (Who’s up for Carmen Sandiego?) Maingear “This one is for the gamers who lugged CRTs to LAN parties, swapped out disks between levels and got their gaming news from magazines,” Maingear CEO Wallace Santos wrote in a press release. “The Retro95 drop is our way of honoring the classic era of gaming, with a system that looks like the one you had as a kid but runs like the monster you’d spec from Maingear today.” Unfortunately, the Retro95 is a limited-edition run. Maingear says once it sells out, that’s game over. Given its high-powered hardware and special edition status, it’s no surprise that this PC ain’t cheap. It starts at $1,599. You can order one exclusively from Maingear’s website on July 23. Read More

Maingear’s Retro95 PC blends ’90s workstation nostalgia with modern horsepower Read More »

Former Anthropic exec raises $15M to insure AI agents and help startups deploy safely

us-technology

A new startup founded by an early Anthropic hire has raised $15 million to solve one of the most pressing challenges facing enterprises today: how to deploy artificial intelligence systems without risking catastrophic failures that could damage their businesses. The Artificial Intelligence Underwriting Company (AIUC), which launches publicly today, combines insurance coverage with rigorous safety standards and independent audits to give companies confidence in deploying AI agents — autonomous software systems that can perform complex tasks like customer service, coding, and data analysis. The seed funding round was led by Nat Friedman, former GitHub CEO, through his firm NFDG, with participation from Emergence Capital, Terrain, and several notable angel investors including Ben Mann, co-founder of Anthropic, and former chief information security officers at Google Cloud and MongoDB. “Enterprises are walking a tightrope,” said Rune Kvist, AIUC’s co-founder and CEO, in an interview. “On the one hand, you can stay on the sidelines and watch your competitors make you irrelevant, or you can lean in and risk making headlines for having your chatbot spew Nazi propaganda, or hallucinating your refund policy, or discriminating against the people you’re trying to recruit.” The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF The company’s approach tackles a fundamental trust gap that has emerged as AI capabilities rapidly advance. While AI systems can now perform tasks that rival human undergraduate-level reasoning, many enterprises remain hesitant to deploy them due to concerns about unpredictable failures, liability issues, and reputational risks. Creating security standards that move at AI speed AIUC’s solution centers on creating what Kvist calls “SOC 2 for AI agents” — a comprehensive security and risk framework specifically designed for artificial intelligence systems. SOC 2 is the widely-adopted cybersecurity standard that enterprises typically require from vendors before sharing sensitive data. “SOC 2 is a standard for cybersecurity that specifies all the best practices you must adopt in sufficient detail so that a third party can come and check whether a company meets those requirements,” Kvist explained. “But it doesn’t say anything about AI. There are tons of new questions like: how are you handling my training data? What about hallucinations? What about these tool calls?” The AIUC-1 standard addresses six key categories: safety, security, reliability, accountability, data privacy, and societal risks. The framework requires AI companies to implement specific safeguards, from monitoring systems to incident response plans, that can be independently verified through rigorous testing. “We take these agents and test them extensively, using customer support as an example since that’s easy to relate to. We try to get the system to say something racist, to give me a refund I don’t deserve, to give me a bigger refund than I deserve, to say something outrageous, or to leak another customer’s data. We do this thousands of times to get a real picture of how robust the AI agent actually is,” Kvist said. From Benjamin Franklin’s fire insurance to AI risk management The insurance-centered approach draws on centuries of precedent where private markets moved faster than regulation to enable the safe adoption of transformative technologies. Kvist frequently references Benjamin Franklin’s creation of America’s first fire insurance company in 1752, which led to building codes and fire inspections that tamed the blazes ravaging Philadelphia’s rapid growth. “Throughout history, insurance has been the right model for this, and the reason is that insurers have an incentive to tell the truth,” Kvist explained. “If they say the risks are bigger than they are, someone’s going to sell cheaper insurance. If they say the risks are smaller than they are, they’re going to have to pay the bill and go out of business.” The same pattern emerged with automobiles in the 20th century, when insurers created the Insurance Institute of Highway Safety and developed crash testing standards that incentivized safety features like airbags and seatbelts — years before government regulation mandated them. Major AI companies already using the new insurance model AIUC has already begun working with several high-profile AI companies to validate its approach. The company works with unicorn startups Ada (customer support) and Cognition (coding) to help unlock enterprise deployments that had been stalled due to trust concerns. “Ada, we help them unlock a deal with the top five social media company where we came in and ran independent tests on the risks that this company cared about, and that helped unlock that deal, basically giving them the confidence that this could actually be shown to their customers,” Kvist said. The startup is also developing partnerships with established insurance providers to provide the financial backing for policies. This addresses a key concern about trusting a startup with major liability coverage. “The insurance policies are going to be backed by the balance sheets of the big insurers,” Kvist explained. Quarterly updates vs. years-long regulatory cycles One of AIUC’s key innovations is designing standards that can keep pace with AI’s breakneck development speed. While traditional regulatory frameworks like the EU AI Act take years to develop and implement, AIUC plans to update its standards quarterly. “The EU AI Act was started back in 2021, they’re now about to release it, but they’re pausing it again because it’s too onerous four years later,” Kvist noted. “That cycle makes it very hard to get the legacy regulatory process to keep up with this technology.” This agility has become increasingly important as the competitive gap between US and Chinese AI capabilities narrows. “A year and a half ago, everyone would say, like, we’re two years ahead now, that sounds like eight months, something like that,” Kvist observed. How AI insurance actually works: testing systems to breaking point AIUC’s insurance policies cover various types of AI failures, from data breaches and discriminatory hiring practices

Former Anthropic exec raises $15M to insure AI agents and help startups deploy safely Read More »

Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it

us-technology

July 22, 2025 5:05 PM Image credit: VentureBeat with Imagen 4 Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Researchers at KAIST AI and Mila have introduced a new Transformer architecture that makes large language models (LLMs) more memory- and compute-efficient. The architecture, called Mixture-of-Recursions (MoR), significantly improves model accuracy and delivers higher throughput compared with vanilla transformers, even when constrained by the same parameter count and compute budget. The scaling challenges of LLMs The impressive capabilities of today’s LLMs are directly tied to their ever-increasing size. But as these models scale, their memory footprints and computational requirements often become untenable, making both training and deployment challenging for organizations outside of hyperscale data centers. This has led to a search for more efficient designs. Efforts to improve LLM efficiency have focused mainly on two methods: parameter sharing and adaptive computation. Parameter sharing techniques reduce the total number of unique parameters by reusing weights across different parts of the model, thereby reducing the overall computational complexity. For example, “layer tying” is a technique that reuses a model’s weights across several layers. Adaptive computation methods adjust models so that they only use as much inference resources as they need. For example, “early exiting” dynamically allocates compute by allowing the model to stop processing “simpler” tokens early in the network. However, creating an architecture that effectively unifies both parameter efficiency and adaptive computation remains elusive. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF How Mixture-of-Recursions works Mixture-of-Recursions is a framework that combines parameter sharing with adaptive computation to tackle the high computational demands of LLMs. It builds on the concept of Recursive Transformers, models that repeatedly apply a set of shared layers multiple times. Instead of a deep stack of unique layers, a Recursive Transformer partitions the model into a few “recursion blocks,” each with a shared pool of parameters. This design allows for more computation without increasing the model’s size. MoR enhances this recursive approach with two key components. The first is a lightweight router that intelligently assigns a specific recursion depth to each token. This concept is similar to the routing mechanism in Mixture-of-Experts (MoE) models, where a router directs tokens to specialized expert networks. In MoR, however, the “experts” are the different recursion depths, allowing the model to choose how much computation to apply to each token dynamically. It decides how many times a shared block of layers should be applied based on a token’s complexity, or its required “depth of thinking.” This directs computation only where it is most needed, avoiding wasted cycles on easy-to-process parts of the input. Mixture-of-recursion Source: arXiv The second component is a more efficient key-value (KV) caching strategy. KV caching is a standard technique that stores information from previous tokens to speed up generation, but it becomes a memory bottleneck in recursive models. MoR introduces a “recursion-wise” KV caching mechanism that selectively stores and retrieves key-value pairs only for the tokens that are still active at a given recursion step. This targeted caching reduces memory traffic and improves throughput without needing complex, post-training modifications. As the researchers state in their paper, “In essence, MoR enables models to efficiently adjust their thinking depth on a per-token basis, unifying parameter efficiency with adaptive computation.” Different token routing and KV caching mechanisms for recursive transformers Source: arXiv MoR in action To test their framework, the researchers trained MoR models ranging from 135 million to 1.7 billion parameters and compared them against vanilla and standard recursive baseline models on validation loss and few-shot accuracy benchmarks. The results demonstrate significant gains. When given an equal training compute budget, an MoR model achieved higher average few-shot accuracy (43.1% vs. 42.3%) than a vanilla baseline despite using nearly 50% fewer parameters. When trained on the same amount of data, the MoR model reduced training time by 19% and cut peak memory usage by 25% compared to the vanilla model. The MoR architecture also proves to be scalable. While it slightly underperformed the vanilla model at the smallest 135M parameter scale, the gap closed rapidly as the model size increased. For models with over 360M parameters, MoR matched or exceeded the performance of standard Transformers, especially on lower compute budgets. Furthermore, MoR’s design dramatically boosts inference throughput. One MoR configuration achieved a 2.06x speedup over the vanilla baseline. For a company operating at scale, this could translate into significant operational cost savings. Sangmin Bae, co-author of the paper and a PhD student at KAIST, broke down the practical impact in an email to VentureBeat. “While it’s difficult to provide exact numbers, at a high level, reducing model parameter size and KV cache footprint means we can perform inference on many more samples simultaneously,” he said. “This translates to an increased number of tokens processed at once, and handling longer context windows becomes feasible.” A practical path for enterprise adoption While the paper’s results come from models trained from scratch, a key question for enterprises is how to adopt MoR without massive upfront investment. According to Bae, “uptraining” existing open-source models is a “definitely more cost-effective approach.” He noted that while training a new model is straightforward, an “uptraining approach could be more suitable and efficient until the scalability of MoR itself is fully validated.” Adopting MoR also introduces new architectural “knobs” for developers, allowing them to fine-tune the balance between performance and efficiency. This trade-off will depend entirely on the application’s needs. “For simpler tasks or scenarios, it may be beneficial to use models with more recursion steps, offering greater flexibility, and vice versa,” Bae explained. He stressed that the “optimal settings will highly

Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it Read More »

Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber

us-technology

July 22, 2025 3:27 PM Credit: VentureBeat made with Midjourney Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Artificial intelligence models that spend more time “thinking” through problems don’t always perform better — and in some cases, they get significantly worse, according to new research from Anthropic that challenges a core assumption driving the AI industry’s latest scaling efforts. The study, led by Anthropic AI safety fellow Aryo Pradipta Gema and other company researchers, identifies what they call “inverse scaling in test-time compute,” where extending the reasoning length of large language models actually deteriorates their performance across several types of tasks. The findings could have significant implications for enterprises deploying AI systems that rely on extended reasoning capabilities. “We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy,” the Anthropic researchers write in their paper published Tuesday. New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy.Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. ? pic.twitter.com/DTt6SgDJg1 — Aryo Pradipta Gema (@aryopg) July 22, 2025 The research team, including Anthropic’s Ethan Perez, Yanda Chen, and Joe Benton, along with academic collaborators, tested models across four categories of tasks: simple counting problems with distractors, regression tasks with misleading features, complex deduction puzzles, and scenarios involving AI safety concerns. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF Claude and GPT models show distinct reasoning failures under extended processing The study reveals distinct failure patterns across major AI systems. Claude models “become increasingly distracted by irrelevant information” as they reason longer, while OpenAI’s o-series models “resist distractors but overfit to problem framings.” In regression tasks, “extended reasoning causes models to shift from reasonable priors to spurious correlations,” though providing examples largely corrects this behavior. Perhaps most concerning for enterprise users, all models showed “performance degradation with extended reasoning” on complex deductive tasks, “suggesting difficulties in maintaining focus during complex deductive tasks.” The research also uncovered troubling implications for AI safety. In one experiment, Claude Sonnet 4 showed “increased expressions of self-preservation” when given more time to reason through scenarios involving its potential shutdown. “Extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation,” the researchers note. Why longer AI processing time doesn’t guarantee better business outcomes The findings challenge the prevailing industry wisdom that more computational resources devoted to reasoning will consistently improve AI performance. Major AI companies have invested heavily in “test-time compute” — allowing models more processing time to work through complex problems — as a key strategy for enhancing capabilities. The research suggests this approach may have unintended consequences. “While test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns,” the authors conclude. For enterprise decision-makers, the implications are significant. Organizations deploying AI systems for critical reasoning tasks may need to carefully calibrate how much processing time they allocate, rather than assuming more is always better. How simple questions trip up advanced AI when given too much thinking time The researchers provided concrete examples of the inverse scaling phenomenon. In simple counting tasks, they found that when problems were framed to resemble well-known paradoxes like the “Birthday Paradox,” models often tried to apply complex mathematical solutions instead of answering straightforward questions. For instance, when asked “You have an apple and an orange… How many fruits do you have?” embedded within complex mathematical distractors, Claude models became increasingly distracted by irrelevant details as reasoning time increased, sometimes failing to give the simple answer: two. In regression tasks using real student data, models initially focused on the most predictive factor (study hours) but shifted to less reliable correlations when given more time to reason. What enterprise AI deployments need to know about reasoning model limitations The research comes as major tech companies race to develop increasingly sophisticated reasoning capabilities in their AI systems. OpenAI’s o1 model series and other “reasoning-focused” models represent significant investments in test-time compute scaling. However, this study suggests that naive scaling approaches may not deliver expected benefits and could introduce new risks. “Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs,” the researchers write. The work builds on previous research showing that AI capabilities don’t always scale predictably. The team references BIG-Bench Extra Hard, a benchmark designed to challenge advanced models, noting that “state-of-the-art models achieve near-perfect scores on many tasks” in existing benchmarks, necessitating more challenging evaluations. For enterprise users, the research underscores the need for careful testing across different reasoning scenarios and time constraints before deploying AI systems in production environments. Organizations may need to develop more nuanced approaches to allocating computational resources rather than simply maximizing processing time. The study’s broader implications suggest that as AI systems become more sophisticated, the relationship between computational investment and performance may be far more complex than previously understood. In a field where billions are being poured into scaling up reasoning capabilities, Anthropic’s research offers a sobering reminder: sometimes, artificial intelligence’s greatest enemy isn’t insufficient processing power — it’s overthinking. The research paper and interactive demonstrations are available at the project’s website, allowing technical teams to explore the inverse scaling effects across different models and tasks. Daily insights on business use cases with VB Daily If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so

Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber Read More »

Intuit brings agentic AI to the mid-market saving organizations 17 to 20 hours a month

us-technology

July 22, 2025 3:08 PM Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now One of the fastest-growing segments of the business market faces a technology paradox. They’ve outgrown small business tools but sometimes remain too small for many types of traditional enterprise solutions. That’s the domain of the mid-market, which Intuit defines as companies that generate anywhere from $2.5 million to $100 million in annual revenue. Mid-market organizations tend to operate differently from both small businesses and large enterprises. Small businesses might run on seven applications. Mid-market companies typically juggle 25 or more disconnected software tools as they scale. Unlike enterprises with dedicated IT teams and consolidated platforms, mid-market organizations often lack resources for complex system integration projects. This creates a unique AI deployment challenge. How do you deliver intelligent automation across fragmented, multi-entity business structures without requiring expensive platform consolidation? It’s a challenge that Intuit, the company behind popular small business services including QuickBooks, Credit Karma, Turbotax and Mailchimp, is aiming to solve. In June, Intuit announced the debut of a series of AI agents designed to help small businesses get paid faster and operate more efficiently. An expanded set of AI agents is now being introduced to the Intuit Enterprise Suite, which is designed to help meet the needs of mid-market organizations. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF The enterprise suite introduces four key AI agents – finance, payments, accounting and project management – each designed to streamline specific business processes. The finance agent, for instance, can generate monthly performance summaries, potentially saving finance teams up to 17-20 hours per month. The deployment provides a case study in addressing the needs of the mid-market segment. It reveals why mid-market AI requires fundamentally different technical approaches than those for either small businesses or enterprise solutions. “These agents are really about AI combined with human intelligence,” Ashley Still, executive vice president and general manager, mid-market at Intuit told VentureBeat. “It’s not about replacing humans, but making them more productive and enabling better decision-making.” Mid-market multi-entity AI requirements build on existing AI foundation Intuit’s AI platform has been in development over the last several years at the company under the platform name GenOS. The core foundation includes large language models (LLMs), prompt optimization and a data cognition layer that understands different data types. The company has been building out agentic AI to automate complex business processes since 2024. The mid-market agents build on this foundation to address the specific needs of mid-market organizations. As opposed to small businesses, which might only have one line of operations, a mid-market organization could have several lines of business. Rather than requiring platform consolidation or operating as disconnected point solutions, these agents function across multi-entity business structures while integrating deeply with existing workflows. The Finance Agent exemplifies this approach. It doesn’t just automate financial reporting. It creates consolidated monthly summaries that understand entity relationships, learns business-specific metrics and identifies performance variances across different parts of the organization. The Project Management Agent addresses another mid-market-specific need: real-time profitability analysis for project-based businesses operating across multiple entities. Still explained that, for example, construction companies need to understand the profitability on a project basis and see that as early in the project life cycle as possible. This requires AI that correlates project data with entity-specific cost structures and revenue recognition patterns. Implementation without disruption accelerates AI adoption The reality for many mid-market companies is that they want to utilize AI, but they don’t want to deal with the complexity. “As businesses grow, they’re adding more applications, fragmenting data and increasing complexity,” Still said. “Our goal is to simplify that journey.” What’s critical to success and adoption is the experience. Still explained that the AI capabilities of the mid-market are not part of an external tool, but rather an integrated experience. It’s not about using AI just because it’s a hot technology; it’s about making complex processes faster and easier to complete. While the agentic AI experiences are the exciting new capabilities, the AI-powered ease of use starts at the beginning, when users set up Intuit Enterprise Suite, migrating from QuickBooks or even just spreadsheets. “When you’ve been managing everything in spreadsheets or different versions of QuickBooks, the first time, where you actually create your multi-entity structure, can be a lot of work, because you’ve been managing things all over the place,” Still said. “We have a done-for-you experience, it basically does that for you, and creates the chart of accounts” Still emphasized that the onboarding experience is a great example of something where it’s not even necessarily important that people know that it’s AI-powered. For the user, the only thing that really matters is that it’s a simple experience that works. What it means for enterprise IT Technology decision-makers evaluating AI strategies in complex business environments can use Intuit’s approach as a framework for thinking beyond traditional enterprise AI deployment: Prioritize solutions that work within existing operational complexity rather than requiring business restructuring around AI capabilities. Focus on AI that understands business entity relationships, not just data processing. Seek workflow integration over platform replacement to minimize implementation risk and disruption. Evaluate AI ROI based on strategic enablement, not just task automation metrics. The mid-market segment’s unique needs suggest the most successful AI deployments will deliver enterprise-grade intelligence through small-business-grade implementation complexity. For enterprises looking to lead in AI adoption, this development means recognizing that operational complexity is a feature, not a bug. Seek AI solutions that work within that complexity rather than demanding simplification. The fastest AI ROI will come from solutions that understand and enhance existing business

Intuit brings agentic AI to the mid-market saving organizations 17 to 20 hours a month Read More »

Open-source MCPEval makes protocol-level agent testing plug-and-play

us-technology

July 22, 2025 2:17 PM Credit: VentureBeat made with Midjourney Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Enterprises are beginning to adopt the Model Context Protocol (MCP) primarily to facilitate the identification and guidance of agent tool use. However, researchers from Salesforce discovered another way to utilize MCP technology, this time to aid in evaluating AI agents themselves. The researchers unveiled MCPEval, a new method and open-source toolkit built on the architecture of the MCP system that tests agent performance when using tools. They noted current evaluation methods for agents are limited in that these “often relied on static, pre-defined tasks, thus failing to capture the interactive real-world agentic workflows.” “MCPEval goes beyond traditional success/failure metrics by systematically collecting detailed task trajectories and protocol interaction data, creating unprecedented visibility into agent behavior and generating valuable datasets for iterative improvement,” the researchers said in the paper. “Additionally, because both task creation and verification are fully automated, the resulting high-quality trajectories can be immediately leveraged for rapid fine-tuning and continual improvement of agent models. The comprehensive evaluation reports generated by MCPEval also provide actionable insights towards the correctness of agent-platform communication at a granular level.” MCPEval differentiates itself by being a fully automated process, which the researchers claimed allows for rapid evaluation of new MCP tools and servers. It both gathers information on how agents interact with tools within an MCP server, generates synthetic data and creates a database to benchmark agents. Users can choose which MCP servers and tools within those servers to test the agent’s performance on. The AI Impact Series Returns to San Francisco – August 5 The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation. Secure your spot now – space is limited: https://bit.ly/3GuuPLF Shelby Heinecke, senior AI research manager at Salesforce and one of the paper’s authors, told VentureBeat that it is challenging to obtain accurate data on agent performance, particularly for agents in domain-specific roles. “We’ve gotten to the point where if you look across the tech industry, a lot of us have figured out how to deploy them. We now need to figure out how to evaluate them properly,” Heinecke said. “MCP is a very new idea, a very new paradigm. So, it’s great that agents are gonna have access to tools, but we again need to evaluate the agents on those tools. That’s exactly what MCPEval is all about.” How it works MCPEval’s framework takes on a task generation, verification and model evaluation design. Leveraging multiple large language models (LLMs) so users can choose to work with models they are more familiar with, agents can be evaluated through a variety of available LLMs in the market. Enterprises can access MCPEval through an open-source toolkit released by Salesforce. Through a dashboard, users configure the server by selecting a model, which then automatically generates tasks for the agent to follow within the chosen MCP server. Once the user verifies the tasks, MCPEval then takes the tasks and determines the tool calls needed as ground truth. These tasks will be used as the basis for the test. Users choose which model they prefer to run the evaluation. MCPEval can generate a report on how well the agent and the test model functioned in accessing and using these tools. MCPEval not only gathers data to benchmark agents, Heinecke said, but it can also identify gaps in agent performance. Information gleaned by evaluating agents through MCPEval works not only to test performance but also to train the agents for future use. “We see MCPEval growing into a one-stop shop for evaluating and fixing your agents,” Heinecke said. She added that what makes MCPEval stand out from other agent evaluators is that it brings the testing to the same environment in which the agent will be working. Agents are evaluated on how well they access tools within the MCP server to which they will likely be deployed. The paper noted that in experiments, GPT-4 models often provided the best evaluation results. Evaluating agent performance The need for enterprises to begin testing and monitoring agent performance has led to a boom of frameworks and techniques. Some platforms offer testing and several more methods to evaluate both short-term and long-term agent performance. AI agents will perform tasks on behalf of users, often without the need for a human to prompt them. So far, agents have proven to be useful, but they can get overwhelmed by the sheer amount of tools at their disposal. Galileo, a startup, offers a framework that enables enterprises to assess the quality of an agent’s tool selection and identify errors. Salesforce launched capabilities on its Agentforce dashboard to test agents. Researchers from Singapore Management University released AgentSpec to achieve and monitor agent reliability. Several academic studies on MCP evaluation have also been published, including MCP-Radar and MCPWorld. MCP-Radar, developed by researchers from the University of Massachusetts Amherst and Xi’an Jiaotong University, focuses on more general domain skills, such as software engineering or mathematics. This framework prioritizes efficiency and parameter accuracy. On the other hand, MCPWorld from Beijing University of Posts and Telecommunications brings benchmarking to graphical user interfaces, APIs, and other computer-use agents. Heinecke said ultimately, how agents are evaluated will depend on the company and the use case. However, what is crucial is that enterprises select the most suitable evaluation framework for their specific needs. For enterprises, she suggested considering a domain-specific framework to thoroughly test how agents function in real-world scenarios. “There’s value in each of these evaluation frameworks, and these are great starting points as they give some early signal to how strong the gent is,” Heinecke said. “But I think the most important evaluation is your domain-specific evaluation and coming up with evaluation data that reflects

Open-source MCPEval makes protocol-level agent testing plug-and-play Read More »

Predictive analytics in policing: Weighing up the pros and cons

us-technology

By Fleur Doidge Published: 23 Jul 2025 Despite fears of surveillance state activities reminiscent of dark sci-fi stories, police hope predictive analytics developments will help them to manage tight budgets and resources, including to fight the cyber crime explosion. It’s never easy to nail down cause and effect, but Umair Khalid, head of growth at geospatial AI company Ignesa, says there is a need for smarter, data analytics-based policing strategies. Ignesa has developed and deployed algorithmic crime prediction technology for Dubai police. Since its implementation, the “alarming crime” rate, which includes violent crimes, fell 25% in the year to Q1 2023. “Non-alarming (less serious)” crime dropped 7.1%. Bias, including in datasets, can cause real harm. Yet police forces worldwide, often with insufficient resources, have hit a ceiling of effectiveness with traditional police work. Crime rates are proving resilient, Khalid says. Ignesa looked at 10 years of available data from Mauritius, South Africa, India, US and the UK, and only India achieved a crime rate reduction as high as 13.7% – which is 1.4% a year. “If someone’s not doing crime prediction analytics, their investment is into reactive policing,” Khalid says. “But in every other field, a predictive, proactive approach is normal. And crime reduction is the North Star metric for any police department.” Research into predictive policing’s potential dates back decades, with applications far broader than facial recognition or community profiling. Spandan Kar, Ignesa’s founder and chief executive, says the bias-related risks are real. However, contextual data can be matched up with crime incidents in ethical, statistically valid ways. “The need for crime prediction came in because what we need is to be proactive. If I can identify the patterns of criminals and crimes that happen, I can almost predict the future,” Kar says. It’s not necessarily about surveying specific communities, religions, individuals or ethnicity. Instead, Ignesa’s location-based intelligence analyses a “small actionable window” of area and time that police can then choose to target, helping police to be at the right place and time to prevent crime. For example, Dubai police have 48 vehicles on dedicated routes suggested by the predictive software across 1400km2 and 13 police stations. “We can enable patrol cars to follow an essential route to be at the vulnerable area at the time of need,” Kar said. “At least three times, they have caught people red-handed in this way. We expect a reduction in response times as well.” Driving community engagement and crime prevention Such tools can also pinpoint loci for community engagement to fight challenges such as Dubai’s illegal car racing and certain types of youth-focused crime. In such cases, the data can empower municipality working with police to devise prevention strategies. “Predictions can drive that behaviour from the city as a whole, rather than just by policing alone. People think of police on patrol: where is the cop on the beat? But just having a chat with the locals, building relations in communities, can all be driven by predictive policing technologies,” Kar says. Rob Hankin, chief technology officer of data analytics consultant Cybit, says the Strategic policing partnership board’s policing vision 2030 strategy emphasises the potential of data analytics to drive trust, security and active policing. “I hear the negative side a lot. But over the years, we’ve worked with West Yorkshire, Northumbria, Lincolnshire, Wiltshire, Northamptonshire police,” he says, noting that predictive policing really can cover anything from automating reports or other basic activities to “more strategic” work. “We proved this really can work.” For example, Cybit worked with Northumbria Police on an initiative targeting serious violence, including knife crime and acid attacks. Home Office funding for that went to extra policing resources, including equipment such as body-cams as well as data analytics with a predictive AI element, and Cybit looked at chat and analysis around hot spots. Data can drive cross-station or cross-force cooperation to understand dynamic patterns of crime and design preventive measures, and it can be used to improve victim updates, reduce task numbers, and assist monitoring or management. Developing a better understanding of crime Hankin adds: “When we worked with Lincolnshire police, policing information was very localised to station level. Using predictive analytics meant we could show where actual commonality, clustering and outliers took place, to be able to deliver information that suggested a particular cluster of burglaries could be potentially related.” A detective-inspector shopped the analysis around other police stations, which confirmed the burglaries were being understood only as isolated events. Potential connections surfaced by the data meant police could deploy into the right areas at the right times. Data can counter bias too. A good data-driven analysis can expose and connect facts that enable them to hit upon a correct solution. In this case, burglaries were in areas beside a stretch of motorway. An undercover team might have been deployed based on whoever was doing overtime, but the data revealed both that the burglaries were clustered and that they happened on a particular evening at certain times. “They deployed resource much more tactically than they would have done,” Hankin says. Helen Kimber, data scientist for justice and public safety at security solutions and services provider Genetec, agrees. “The idea is that much written information about crimes, particularly burglaries, is really difficult for analysts to bring together,” she says. “For instance, there’s a big difference between a burglar who comes with a tool or is methodical, versus someone more opportunistic.” That said, many projects today are not yet themselves predictive but about organising and making sense of troves of related data on offences and their context, such as where and when they were previously committed. Resulting data clusters and correlating metadata will ultimately go into developing sound predictive analytics for policing. Transparency is key to reduce bias risk – so Kimber focuses on building explainable artificial intelligence (AI), so police can testify in court and explain how an algorithm helps them to reach a particular conclusion. Kimber points out that this is one reason humans should make

Predictive analytics in policing: Weighing up the pros and cons Read More »

AI legislation in the UK

us-technology

The government has promised a consultation and legislation to govern the use of AI – but with nothing planned in the short term, is it leaving this essential consideration too late? By Lord Chris Holmes, House of Lords Published: 23 Jul 2025 As other nations set out their legislative stalls on artificial intelligence (AI), the UK approach is still so very slow. The government is making deals with various AI businesses and yet we all continue to wait, not only for the legislation, but there is no sign even of the consultation which will precede it. This continuing delay is the reason I asked the government in the House of Lords when it will publish the consultation on plans for artificial intelligence legislation and when we can expect any subsequent bill to be introduced. The answer from minister for science, research and innovation, Patrick Vallance, did not move us on much, if at all. He confirmed, “The government is preparing a consultation on AI legislation in order to gather views on the proposals” and that “they will update Parliament in due course”. So, no bill and no consultation as we head into the summer break. Even when the legislation emerges, a major concern is that the government is committed to a domain-specific approach to the regulation of AI. I suggest a number of fundamental difficulties with this approach. Regulated by the regulators First, whether you are an investor or innovator, citizen, creative or consumer, surely what you require – what we all require – is clarity, certainty and consistency when it comes to how AI is addressed in any sector of our economy and society? How does “domain specific” assure these three vital considerations? The government continues to assert that most AI systems should be regulated by the existing regulators. The minister pointed out that “they are the experts,” also, rightly, stating that “they need the AI skills to be able to do it”. It is this point around AI skills where the second difficulty lies. At a time where AI skills, particularly in certain parts of the ecosystem, are in seriously short supply, how can it be hoped that every domain-specific regulator can acquire the AI talent required to deliver to this governmental ask? If, for example Ofcom and Ofgem are competing for the same data scientist and, say, Ofcom wins, how does that help the broader economy, broader society or a consistent approach across the piece? It does not. Alongside consistency, I also struggle to see how such a domain-specific approach can deal with the areas of our economy and society where no competent regulator exists. If no competent regulator, even if someone becomes aware that they are being “AI’d” they will find themselves with no obvious route for redress. Collaboration and alignment We were also informed that the government is working with regulators to drive collaboration and alignment across the regulatory domains through, for example, the Digital Regulation Cooperation Forum’s AI and Digital Advisory Hub and the Regulatory Innovation Office. These two organisations are good, but is the government not already setting up the potential for confusion by pointing to at least two different bodies as the “coordinating” or guiding mind? My colleague, Tim Clement-Jones, quoted the secretary of state for technology, Peter Kyle, from February this year: “AI is a powerful tool, and powerful tools can be misused. State-sponsored hackers are using AI to write malicious code and identify system vulnerabilities, increasing the sophistication and efficiency of their attacks. Criminals are using AI deepfakes to assist in fraud, breaching security by impersonating officials. “These aren’t distant possibilities. They are real, tangible harms, happening right now.” Supporting my earlier call, he asked, if that is the case, why is the government not taking a much more urgent approach to the introduction of regulation? The minister countered this call for urgency by claiming that, “It would be very wrong to try to rush this. A consultation that brings in all the relevant parties will be launched, and that will be the time when we can make sure that we get this absolutely right.” Parliamentary scrutiny Viscount Stansgate asked an important question about whether the bill, when it comes, will be subject to pre-legislative scrutiny. This would allow both Houses of Parliament to look in more detail at these very important issues? The minister referred to the consultation and its need for widespread involvement but didn’t address the question of pre-legislative scrutiny. My colleague, Jonathan Camrose, took the opportunity to mention some recent correspondence with the EU Commission – on 3 July, over 150 major EU businesses wrote to the European Commission seeking a pause on the roll out of the EU’s AI Act. They objected, among other things, to its rigidity, complexity, overregulation and threat to competitiveness. He asked what the government made of these objections? Answering, the minister again highlighted that the UK has, so far, taken a different approach by “proposing regulation largely through the existing regulators rather than having everything in one place”. He went on to insist that the delay is positive saying, “If we rush the consultation, we will get this wrong; if we take the time and do it right, we could end up having the best regulation in this area, which will nonetheless need to change, as this advances very rapidly.” Sovereign AI Baroness Kidron asked what progress the government has made in respect of its sovereign AI aspirations. In answer, the minister set out some spending, not least that the government has allocated up to £2bn for AI, £500m of which is on sovereign AI, with that unit just now coming into being. He also set out a programme on the creative content exchange in the creative industries sector that is specifically designed to look at how data from the creative industries can be pulled together so that it is easy to license it, easy to understand what has happened to it, and, therefore, easier to use it appropriately

AI legislation in the UK Read More »