ContentSproute


Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it

July 22, 2025 5:05 PM

Researchers at KAIST AI and Mila have introduced a new Transformer architecture that makes large language models (LLMs) more memory- and compute-efficient. The architecture, called Mixture-of-Recursions (MoR), significantly improves model accuracy and delivers higher throughput compared with vanilla transformers, even when constrained by the same parameter count and compute budget.

The scaling challenges of LLMs

The impressive capabilities of today’s LLMs are directly tied to their ever-increasing size. But as these models scale, their memory footprints and computational requirements often become untenable, making both training and deployment challenging for organizations outside of hyperscale data centers. This has led to a search for more efficient designs.

Efforts to improve LLM efficiency have focused mainly on two methods: parameter sharing and adaptive computation. Parameter sharing techniques reduce the total number of unique parameters by reusing weights across different parts of the model, thereby reducing the overall computational complexity. For example, “layer tying” is a technique that reuses a model’s weights across several layers. Adaptive computation methods adjust models so that they only use as many inference resources as they need. For example, “early exiting” dynamically allocates compute by allowing the model to stop processing “simpler” tokens early in the network. However, creating an architecture that effectively unifies both parameter efficiency and adaptive computation remains elusive.

How Mixture-of-Recursions works

Mixture-of-Recursions is a framework that combines parameter sharing with adaptive computation to tackle the high computational demands of LLMs. It builds on the concept of Recursive Transformers, models that repeatedly apply a set of shared layers multiple times. Instead of a deep stack of unique layers, a Recursive Transformer partitions the model into a few “recursion blocks,” each with a shared pool of parameters. This design allows for more computation without increasing the model’s size.

MoR enhances this recursive approach with two key components. The first is a lightweight router that intelligently assigns a specific recursion depth to each token. This concept is similar to the routing mechanism in Mixture-of-Experts (MoE) models, where a router directs tokens to specialized expert networks. In MoR, however, the “experts” are the different recursion depths, allowing the model to choose how much computation to apply to each token dynamically. It decides how many times a shared block of layers should be applied based on a token’s complexity, or its required “depth of thinking.” This directs computation only where it is most needed, avoiding wasted cycles on easy-to-process parts of the input.

[Figure: Mixture-of-Recursions. Source: arXiv]
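The core idea can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch snippet, not the authors’ implementation: the layer sizes, the argmax routing and the fixed maximum depth are assumptions for clarity, and it applies the shared layer to every token and merely masks the result, whereas a real MoR implementation would skip computation (and KV-cache writes) for the inactive tokens.

```python
import torch
import torch.nn as nn

class MoRBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, max_recursions=4):
        super().__init__()
        # One shared block of layers, applied repeatedly (parameter sharing).
        self.shared_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Lightweight router: scores how many recursion steps each token should get.
        self.router = nn.Linear(d_model, max_recursions)
        self.max_recursions = max_recursions

    def forward(self, x):                                   # x: (batch, seq, d_model)
        # Assign each token its own recursion depth (adaptive computation).
        # A real implementation would use a differentiable routing scheme.
        depths = self.router(x).argmax(dim=-1) + 1          # (batch, seq), values 1..max
        for step in range(1, self.max_recursions + 1):
            active = (depths >= step).unsqueeze(-1)         # tokens still "thinking"
            # Finished tokens pass through unchanged; a real implementation would
            # also keep KV-cache entries only for the tokens active at this step.
            x = torch.where(active, self.shared_layer(x), x)
        return x

tokens = torch.randn(2, 16, 512)
print(MoRBlock()(tokens).shape)                             # torch.Size([2, 16, 512])
```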
The second component is a more efficient key-value (KV) caching strategy. KV caching is a standard technique that stores information from previous tokens to speed up generation, but it becomes a memory bottleneck in recursive models. MoR introduces a “recursion-wise” KV caching mechanism that selectively stores and retrieves key-value pairs only for the tokens that are still active at a given recursion step. This targeted caching reduces memory traffic and improves throughput without needing complex, post-training modifications. As the researchers state in their paper, “In essence, MoR enables models to efficiently adjust their thinking depth on a per-token basis, unifying parameter efficiency with adaptive computation.”

[Figure: Different token routing and KV caching mechanisms for recursive transformers. Source: arXiv]

MoR in action

To test their framework, the researchers trained MoR models ranging from 135 million to 1.7 billion parameters and compared them against vanilla and standard recursive baseline models on validation loss and few-shot accuracy benchmarks.

The results demonstrate significant gains. When given an equal training compute budget, an MoR model achieved higher average few-shot accuracy (43.1% vs. 42.3%) than a vanilla baseline despite using nearly 50% fewer parameters. When trained on the same amount of data, the MoR model reduced training time by 19% and cut peak memory usage by 25% compared to the vanilla model.

The MoR architecture also proves to be scalable. While it slightly underperformed the vanilla model at the smallest 135M parameter scale, the gap closed rapidly as the model size increased. For models with over 360M parameters, MoR matched or exceeded the performance of standard Transformers, especially on lower compute budgets. Furthermore, MoR’s design dramatically boosts inference throughput. One MoR configuration achieved a 2.06x speedup over the vanilla baseline. For a company operating at scale, this could translate into significant operational cost savings.

Sangmin Bae, co-author of the paper and a PhD student at KAIST, broke down the practical impact in an email to VentureBeat. “While it’s difficult to provide exact numbers, at a high level, reducing model parameter size and KV cache footprint means we can perform inference on many more samples simultaneously,” he said. “This translates to an increased number of tokens processed at once, and handling longer context windows becomes feasible.”

A practical path for enterprise adoption

While the paper’s results come from models trained from scratch, a key question for enterprises is how to adopt MoR without massive upfront investment. According to Bae, “uptraining” existing open-source models is a “definitely more cost-effective approach.” He noted that while training a new model is straightforward, an “uptraining approach could be more suitable and efficient until the scalability of MoR itself is fully validated.”

Adopting MoR also introduces new architectural “knobs” for developers, allowing them to fine-tune the balance between performance and efficiency. This trade-off will depend entirely on the application’s needs. “For simpler tasks or scenarios, it may be beneficial to use models with more recursion steps, offering greater flexibility, and vice versa,” Bae explained. He stressed that the “optimal settings will highly …”


Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber

July 22, 2025 3:27 PM

Artificial intelligence models that spend more time “thinking” through problems don’t always perform better — and in some cases, they get significantly worse, according to new research from Anthropic that challenges a core assumption driving the AI industry’s latest scaling efforts.

The study, led by Anthropic AI safety fellow Aryo Pradipta Gema and other company researchers, identifies what they call “inverse scaling in test-time compute,” where extending the reasoning length of large language models actually deteriorates their performance across several types of tasks. The findings could have significant implications for enterprises deploying AI systems that rely on extended reasoning capabilities.

“We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy,” the Anthropic researchers write in their paper published Tuesday.

“New Anthropic Research: ‘Inverse Scaling in Test-Time Compute.’ We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns.” pic.twitter.com/DTt6SgDJg1 — Aryo Pradipta Gema (@aryopg), July 22, 2025

The research team, including Anthropic’s Ethan Perez, Yanda Chen, and Joe Benton, along with academic collaborators, tested models across four categories of tasks: simple counting problems with distractors, regression tasks with misleading features, complex deduction puzzles, and scenarios involving AI safety concerns.

Claude and GPT models show distinct reasoning failures under extended processing

The study reveals distinct failure patterns across major AI systems. Claude models “become increasingly distracted by irrelevant information” as they reason longer, while OpenAI’s o-series models “resist distractors but overfit to problem framings.” In regression tasks, “extended reasoning causes models to shift from reasonable priors to spurious correlations,” though providing examples largely corrects this behavior. Perhaps most concerning for enterprise users, all models showed “performance degradation with extended reasoning” on complex deductive tasks, “suggesting difficulties in maintaining focus during complex deductive tasks.”

The research also uncovered troubling implications for AI safety. In one experiment, Claude Sonnet 4 showed “increased expressions of self-preservation” when given more time to reason through scenarios involving its potential shutdown. “Extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation,” the researchers note.

Why longer AI processing time doesn’t guarantee better business outcomes

The findings challenge the prevailing industry wisdom that more computational resources devoted to reasoning will consistently improve AI performance.
Major AI companies have invested heavily in “test-time compute” — allowing models more processing time to work through complex problems — as a key strategy for enhancing capabilities. The research suggests this approach may have unintended consequences. “While test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns,” the authors conclude.

For enterprise decision-makers, the implications are significant. Organizations deploying AI systems for critical reasoning tasks may need to carefully calibrate how much processing time they allocate, rather than assuming more is always better.

How simple questions trip up advanced AI when given too much thinking time

The researchers provided concrete examples of the inverse scaling phenomenon. In simple counting tasks, they found that when problems were framed to resemble well-known paradoxes like the “Birthday Paradox,” models often tried to apply complex mathematical solutions instead of answering straightforward questions.

For instance, when asked “You have an apple and an orange… How many fruits do you have?” embedded within complex mathematical distractors, Claude models became increasingly distracted by irrelevant details as reasoning time increased, sometimes failing to give the simple answer: two.

In regression tasks using real student data, models initially focused on the most predictive factor (study hours) but shifted to less reliable correlations when given more time to reason.

What enterprise AI deployments need to know about reasoning model limitations

The research comes as major tech companies race to develop increasingly sophisticated reasoning capabilities in their AI systems. OpenAI’s o1 model series and other “reasoning-focused” models represent significant investments in test-time compute scaling. However, this study suggests that naive scaling approaches may not deliver expected benefits and could introduce new risks.

“Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs,” the researchers write.

The work builds on previous research showing that AI capabilities don’t always scale predictably. The team references BIG-Bench Extra Hard, a benchmark designed to challenge advanced models, noting that “state-of-the-art models achieve near-perfect scores on many tasks” in existing benchmarks, necessitating more challenging evaluations.

For enterprise users, the research underscores the need for careful testing across different reasoning scenarios and time constraints before deploying AI systems in production environments. Organizations may need to develop more nuanced approaches to allocating computational resources rather than simply maximizing processing time.

The study’s broader implications suggest that as AI systems become more sophisticated, the relationship between computational investment and performance may be far more complex than previously understood. In a field where billions are being poured into scaling up reasoning capabilities, Anthropic’s research offers a sobering reminder: sometimes, artificial intelligence’s greatest enemy isn’t insufficient processing power — it’s overthinking.

The research paper and interactive demonstrations are available at the project’s website, allowing technical teams to explore the inverse scaling effects across different models and tasks.
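One practical takeaway is to benchmark the same task at several reasoning budgets rather than assuming the largest budget wins. The sketch below does this with the Anthropic Python SDK’s extended-thinking option; the model ID, budget values and response handling are assumptions to check against the current API documentation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
QUESTION = "You have an apple and an orange. How many fruits do you have?"

for budget in (1024, 4096, 16000):                   # increasing test-time compute
    response = client.messages.create(
        model="claude-sonnet-4-20250514",            # assumed model id
        max_tokens=budget + 512,                     # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": QUESTION}],
    )
    # Keep only the visible answer blocks; thinking blocks are skipped.
    answer = "".join(b.text for b in response.content if b.type == "text")
    print(f"budget={budget:>6}: {answer.strip()!r}")
```

Comparing the answers (and error rates on a larger task set) across budgets is the kind of sweep the researchers argue should precede any decision to maximize reasoning time.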


Intuit brings agentic AI to the mid-market, saving organizations 17 to 20 hours a month

July 22, 2025 3:08 PM

One of the fastest-growing segments of the business market faces a technology paradox: its companies have outgrown small business tools but sometimes remain too small for many types of traditional enterprise solutions. That’s the domain of the mid-market, which Intuit defines as companies that generate anywhere from $2.5 million to $100 million in annual revenue.

Mid-market organizations tend to operate differently from both small businesses and large enterprises. Small businesses might run on seven applications. Mid-market companies typically juggle 25 or more disconnected software tools as they scale. Unlike enterprises with dedicated IT teams and consolidated platforms, mid-market organizations often lack resources for complex system integration projects.

This creates a unique AI deployment challenge. How do you deliver intelligent automation across fragmented, multi-entity business structures without requiring expensive platform consolidation? It’s a challenge that Intuit, the company behind popular small business services including QuickBooks, Credit Karma, TurboTax and Mailchimp, is aiming to solve.

In June, Intuit announced the debut of a series of AI agents designed to help small businesses get paid faster and operate more efficiently. An expanded set of AI agents is now being introduced to the Intuit Enterprise Suite, which is designed to help meet the needs of mid-market organizations.

The enterprise suite introduces four key AI agents – finance, payments, accounting and project management – each designed to streamline specific business processes. The finance agent, for instance, can generate monthly performance summaries, potentially saving finance teams up to 17-20 hours per month.

The deployment provides a case study in addressing the needs of the mid-market segment. It reveals why mid-market AI requires fundamentally different technical approaches than those for either small businesses or enterprise solutions.

“These agents are really about AI combined with human intelligence,” Ashley Still, executive vice president and general manager, mid-market at Intuit, told VentureBeat. “It’s not about replacing humans, but making them more productive and enabling better decision-making.”

Mid-market multi-entity AI requirements build on existing AI foundation

Intuit’s AI platform has been in development over the last several years at the company under the platform name GenOS. The core foundation includes large language models (LLMs), prompt optimization and a data cognition layer that understands different data types. The company has been building out agentic AI to automate complex business processes since 2024.

The mid-market agents build on this foundation to address the specific needs of mid-market organizations. As opposed to small businesses, which might only have one line of operations, a mid-market organization could have several lines of business.
Rather than requiring platform consolidation or operating as disconnected point solutions, these agents function across multi-entity business structures while integrating deeply with existing workflows.

The Finance Agent exemplifies this approach. It doesn’t just automate financial reporting. It creates consolidated monthly summaries that understand entity relationships, learns business-specific metrics and identifies performance variances across different parts of the organization.

The Project Management Agent addresses another mid-market-specific need: real-time profitability analysis for project-based businesses operating across multiple entities. Still explained that, for example, construction companies need to understand the profitability on a project basis and see that as early in the project life cycle as possible. This requires AI that correlates project data with entity-specific cost structures and revenue recognition patterns.

Implementation without disruption accelerates AI adoption

The reality for many mid-market companies is that they want to utilize AI, but they don’t want to deal with the complexity. “As businesses grow, they’re adding more applications, fragmenting data and increasing complexity,” Still said. “Our goal is to simplify that journey.”

What’s critical to success and adoption is the experience. Still explained that the AI capabilities for the mid-market are not part of an external tool, but rather an integrated experience. It’s not about using AI just because it’s a hot technology; it’s about making complex processes faster and easier to complete.

While the agentic AI experiences are the exciting new capabilities, the AI-powered ease of use starts at the beginning, when users set up Intuit Enterprise Suite, migrating from QuickBooks or even just spreadsheets.

“When you’ve been managing everything in spreadsheets or different versions of QuickBooks, the first time, where you actually create your multi-entity structure, can be a lot of work, because you’ve been managing things all over the place,” Still said. “We have a done-for-you experience, it basically does that for you, and creates the chart of accounts.”

Still emphasized that the onboarding experience is a great example of something where it’s not even necessarily important that people know that it’s AI-powered. For the user, the only thing that really matters is that it’s a simple experience that works.

What it means for enterprise IT

Technology decision-makers evaluating AI strategies in complex business environments can use Intuit’s approach as a framework for thinking beyond traditional enterprise AI deployment: prioritize solutions that work within existing operational complexity rather than requiring business restructuring around AI capabilities. Focus on AI that understands business entity relationships, not just data processing. Seek workflow integration over platform replacement to minimize implementation risk and disruption. Evaluate AI ROI based on strategic enablement, not just task automation metrics.

The mid-market segment’s unique needs suggest the most successful AI deployments will deliver enterprise-grade intelligence through small-business-grade implementation complexity. For enterprises looking to lead in AI adoption, this development means recognizing that operational complexity is a feature, not a bug. Seek AI solutions that work within that complexity rather than demanding simplification. The fastest AI ROI will come from solutions that understand and enhance existing business …


Open-source MCPEval makes protocol-level agent testing plug-and-play

July 22, 2025 2:17 PM

Enterprises are beginning to adopt the Model Context Protocol (MCP) primarily to facilitate the identification and guidance of agent tool use. However, researchers from Salesforce discovered another way to utilize MCP technology, this time to aid in evaluating AI agents themselves.

The researchers unveiled MCPEval, a new method and open-source toolkit built on the architecture of the MCP system that tests agent performance when using tools. They noted current evaluation methods for agents are limited in that these “often relied on static, pre-defined tasks, thus failing to capture the interactive real-world agentic workflows.”

“MCPEval goes beyond traditional success/failure metrics by systematically collecting detailed task trajectories and protocol interaction data, creating unprecedented visibility into agent behavior and generating valuable datasets for iterative improvement,” the researchers said in the paper. “Additionally, because both task creation and verification are fully automated, the resulting high-quality trajectories can be immediately leveraged for rapid fine-tuning and continual improvement of agent models. The comprehensive evaluation reports generated by MCPEval also provide actionable insights towards the correctness of agent-platform communication at a granular level.”

MCPEval differentiates itself by being a fully automated process, which the researchers claimed allows for rapid evaluation of new MCP tools and servers. It gathers information on how agents interact with tools within an MCP server, generates synthetic data and creates a database to benchmark agents. Users can choose which MCP servers and tools within those servers to test the agent’s performance on.

Shelby Heinecke, senior AI research manager at Salesforce and one of the paper’s authors, told VentureBeat that it is challenging to obtain accurate data on agent performance, particularly for agents in domain-specific roles.

“We’ve gotten to the point where if you look across the tech industry, a lot of us have figured out how to deploy them. We now need to figure out how to evaluate them properly,” Heinecke said. “MCP is a very new idea, a very new paradigm. So, it’s great that agents are gonna have access to tools, but we again need to evaluate the agents on those tools. That’s exactly what MCPEval is all about.”

How it works

MCPEval’s framework takes on a task generation, verification and model evaluation design. Leveraging multiple large language models (LLMs) so users can choose to work with models they are more familiar with, agents can be evaluated through a variety of available LLMs in the market.

Enterprises can access MCPEval through an open-source toolkit released by Salesforce. Through a dashboard, users configure the server by selecting a model, which then automatically generates tasks for the agent to follow within the chosen MCP server.
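The generate, verify and evaluate loop described above can be pictured with a short schematic. The sketch below is not MCPEval’s actual API (every name in it is hypothetical); it only illustrates how automatically generated tasks, ground-truth tool-call trajectories and an agent’s observed behaviour against the same MCP server fit together.

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    ground_truth_calls: list[str]   # tool names the task should trigger, in order

def generate_tasks(mcp_tools: list[str], n: int = 3) -> list[Task]:
    # A task-generation model would normally produce these; hard-coded here.
    return [Task(f"Use {t} to answer a user question.", [t]) for t in mcp_tools[:n]]

def run_agent(task: Task) -> list[str]:
    # Placeholder for the agent under test interacting with the MCP server.
    return task.ground_truth_calls[:1]

def evaluate(tasks: list[Task]) -> float:
    # Score the agent on exact trajectory matches against the ground truth.
    exact = sum(run_agent(t) == t.ground_truth_calls for t in tasks)
    return exact / len(tasks)

tools = ["search_flights", "book_hotel", "get_weather"]
print(f"trajectory match rate: {evaluate(generate_tasks(tools)):.2f}")
```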
Once the user verifies the tasks, MCPEval then takes the tasks and determines the tool calls needed as ground truth. These tasks will be used as the basis for the test. Users choose which model they prefer to run the evaluation. MCPEval can generate a report on how well the agent and the test model functioned in accessing and using these tools.

MCPEval not only gathers data to benchmark agents, Heinecke said, but it can also identify gaps in agent performance. Information gleaned by evaluating agents through MCPEval works not only to test performance but also to train the agents for future use.

“We see MCPEval growing into a one-stop shop for evaluating and fixing your agents,” Heinecke said.

She added that what makes MCPEval stand out from other agent evaluators is that it brings the testing to the same environment in which the agent will be working. Agents are evaluated on how well they access tools within the MCP server to which they will likely be deployed.

The paper noted that in experiments, GPT-4 models often provided the best evaluation results.

Evaluating agent performance

The need for enterprises to begin testing and monitoring agent performance has led to a boom of frameworks and techniques. Some platforms offer testing and several more methods to evaluate both short-term and long-term agent performance.

AI agents will perform tasks on behalf of users, often without the need for a human to prompt them. So far, agents have proven to be useful, but they can get overwhelmed by the sheer number of tools at their disposal.

Galileo, a startup, offers a framework that enables enterprises to assess the quality of an agent’s tool selection and identify errors. Salesforce launched capabilities on its Agentforce dashboard to test agents. Researchers from Singapore Management University released AgentSpec to achieve and monitor agent reliability. Several academic studies on MCP evaluation have also been published, including MCP-Radar and MCPWorld.

MCP-Radar, developed by researchers from the University of Massachusetts Amherst and Xi’an Jiaotong University, focuses on more general domain skills, such as software engineering or mathematics. This framework prioritizes efficiency and parameter accuracy. On the other hand, MCPWorld from Beijing University of Posts and Telecommunications brings benchmarking to graphical user interfaces, APIs, and other computer-use agents.

Heinecke said ultimately, how agents are evaluated will depend on the company and the use case. However, what is crucial is that enterprises select the most suitable evaluation framework for their specific needs. For enterprises, she suggested considering a domain-specific framework to thoroughly test how agents function in real-world scenarios.

“There’s value in each of these evaluation frameworks, and these are great starting points as they give some early signal to how strong the agent is,” Heinecke said. “But I think the most important evaluation is your domain-specific evaluation and coming up with evaluation data that reflects …”


Predictive analytics in policing: Weighing up the pros and cons

By Fleur Doidge Published: 23 Jul 2025

Despite fears of surveillance state activities reminiscent of dark sci-fi stories, police hope predictive analytics developments will help them to manage tight budgets and resources, including to fight the cyber crime explosion.

It’s never easy to nail down cause and effect, but Umair Khalid, head of growth at geospatial AI company Ignesa, says there is a need for smarter, data analytics-based policing strategies. Ignesa has developed and deployed algorithmic crime prediction technology for Dubai police. Since its implementation, the “alarming crime” rate, which includes violent crimes, fell 25% in the year to Q1 2023. “Non-alarming” (less serious) crime dropped 7.1%.

Bias, including in datasets, can cause real harm. Yet police forces worldwide, often with insufficient resources, have hit a ceiling of effectiveness with traditional police work. Crime rates are proving resilient, Khalid says. Ignesa looked at 10 years of available data from Mauritius, South Africa, India, the US and the UK, and only India achieved a crime rate reduction as high as 13.7% – which is 1.4% a year.

“If someone’s not doing crime prediction analytics, their investment is into reactive policing,” Khalid says. “But in every other field, a predictive, proactive approach is normal. And crime reduction is the North Star metric for any police department.”

Research into predictive policing’s potential dates back decades, with applications far broader than facial recognition or community profiling. Spandan Kar, Ignesa’s founder and chief executive, says the bias-related risks are real. However, contextual data can be matched up with crime incidents in ethical, statistically valid ways.

“The need for crime prediction came in because what we need is to be proactive. If I can identify the patterns of criminals and crimes that happen, I can almost predict the future,” Kar says.

It’s not necessarily about surveying specific communities, religions, individuals or ethnicities. Instead, Ignesa’s location-based intelligence analyses a “small actionable window” of area and time that police can then choose to target, helping police to be at the right place and time to prevent crime. For example, Dubai police have 48 vehicles on dedicated routes suggested by the predictive software across 1,400km² and 13 police stations.

“We can enable patrol cars to follow an essential route to be at the vulnerable area at the time of need,” Kar said. “At least three times, they have caught people red-handed in this way. We expect a reduction in response times as well.”

Driving community engagement and crime prevention

Such tools can also pinpoint loci for community engagement to fight challenges such as Dubai’s illegal car racing and certain types of youth-focused crime. In such cases, the data can empower the municipality working with police to devise prevention strategies.

“Predictions can drive that behaviour from the city as a whole, rather than just by policing alone. People think of police on patrol: where is the cop on the beat? But just having a chat with the locals, building relations in communities, can all be driven by predictive policing technologies,” Kar says.

Rob Hankin, chief technology officer of data analytics consultant Cybit, says the Strategic Policing Partnership Board’s Policing Vision 2030 strategy emphasises the potential of data analytics to drive trust, security and active policing. “I hear the negative side a lot.
But over the years, we’ve worked with West Yorkshire, Northumbria, Lincolnshire, Wiltshire, Northamptonshire police,” he says, noting that predictive policing really can cover anything from automating reports or other basic activities to “more strategic” work. “We proved this really can work.” For example, Cybit worked with Northumbria Police on an initiative targeting serious violence, including knife crime and acid attacks. Home Office funding for that went to extra policing resources, including equipment such as body-cams as well as data analytics with a predictive AI element, and Cybit looked at chat and analysis around hot spots. Data can drive cross-station or cross-force cooperation to understand dynamic patterns of crime and design preventive measures, and it can be used to improve victim updates, reduce task numbers, and assist monitoring or management. Developing a better understanding of crime Hankin adds: “When we worked with Lincolnshire police, policing information was very localised to station level. Using predictive analytics meant we could show where actual commonality, clustering and outliers took place, to be able to deliver information that suggested a particular cluster of burglaries could be potentially related.” A detective-inspector shopped the analysis around other police stations, which confirmed the burglaries were being understood only as isolated events. Potential connections surfaced by the data meant police could deploy into the right areas at the right times. Data can counter bias too. A good data-driven analysis can expose and connect facts that enable them to hit upon a correct solution. In this case, burglaries were in areas beside a stretch of motorway. An undercover team might have been deployed based on whoever was doing overtime, but the data revealed both that the burglaries were clustered and that they happened on a particular evening at certain times. “They deployed resource much more tactically than they would have done,” Hankin says. Helen Kimber, data scientist for justice and public safety at security solutions and services provider Genetec, agrees. “The idea is that much written information about crimes, particularly burglaries, is really difficult for analysts to bring together,” she says. “For instance, there’s a big difference between a burglar who comes with a tool or is methodical, versus someone more opportunistic.” That said, many projects today are not yet themselves predictive but about organising and making sense of troves of related data on offences and their context, such as where and when they were previously committed. Resulting data clusters and correlating metadata will ultimately go into developing sound predictive analytics for policing. Transparency is key to reduce bias risk – so Kimber focuses on building explainable artificial intelligence (AI), so police can testify in court and explain how an algorithm helps them to reach a particular conclusion. Kimber points out that this is one reason humans should make


AI legislation in the UK

The government has promised a consultation and legislation to govern the use of AI – but with nothing planned in the short term, is it leaving this essential consideration too late?

By Lord Chris Holmes, House of Lords Published: 23 Jul 2025

As other nations set out their legislative stalls on artificial intelligence (AI), the UK approach is still so very slow. The government is making deals with various AI businesses, and yet we all continue to wait – not only for the legislation, but for any sign of the consultation which will precede it.

This continuing delay is the reason I asked the government in the House of Lords when it will publish the consultation on plans for artificial intelligence legislation and when we can expect any subsequent bill to be introduced.

The answer from the minister for science, research and innovation, Patrick Vallance, did not move us on much, if at all. He confirmed, “The government is preparing a consultation on AI legislation in order to gather views on the proposals” and that “they will update Parliament in due course”. So, no bill and no consultation as we head into the summer break.

Even when the legislation emerges, a major concern is that the government is committed to a domain-specific approach to the regulation of AI. I suggest a number of fundamental difficulties with this approach.

Regulated by the regulators

First, whether you are an investor or innovator, citizen, creative or consumer, surely what you require – what we all require – is clarity, certainty and consistency when it comes to how AI is addressed in any sector of our economy and society? How does “domain specific” assure these three vital considerations?

The government continues to assert that most AI systems should be regulated by the existing regulators. The minister pointed out that “they are the experts,” also, rightly, stating that “they need the AI skills to be able to do it”. It is this point around AI skills where the second difficulty lies. At a time when AI skills, particularly in certain parts of the ecosystem, are in seriously short supply, how can it be hoped that every domain-specific regulator can acquire the AI talent required to deliver to this governmental ask?

If, for example, Ofcom and Ofgem are competing for the same data scientist and, say, Ofcom wins, how does that help the broader economy, broader society or a consistent approach across the piece? It does not.

Alongside consistency, I also struggle to see how such a domain-specific approach can deal with the areas of our economy and society where no competent regulator exists. If there is no competent regulator, even someone who becomes aware that they are being “AI’d” will find themselves with no obvious route for redress.

Collaboration and alignment

We were also informed that the government is working with regulators to drive collaboration and alignment across the regulatory domains through, for example, the Digital Regulation Cooperation Forum’s AI and Digital Advisory Hub and the Regulatory Innovation Office.

These two organisations are good, but is the government not already setting up the potential for confusion by pointing to at least two different bodies as the “coordinating” or guiding mind? My colleague, Tim Clement-Jones, quoted the secretary of state for technology, Peter Kyle, from February this year: “AI is a powerful tool, and powerful tools can be misused.
“State-sponsored hackers are using AI to write malicious code and identify system vulnerabilities, increasing the sophistication and efficiency of their attacks. Criminals are using AI deepfakes to assist in fraud, breaching security by impersonating officials. These aren’t distant possibilities. They are real, tangible harms, happening right now.”

Supporting my earlier call, he asked: if that is the case, why is the government not taking a much more urgent approach to the introduction of regulation? The minister countered this call for urgency by claiming that, “It would be very wrong to try to rush this. A consultation that brings in all the relevant parties will be launched, and that will be the time when we can make sure that we get this absolutely right.”

Parliamentary scrutiny

Viscount Stansgate asked an important question about whether the bill, when it comes, will be subject to pre-legislative scrutiny, which would allow both Houses of Parliament to look in more detail at these very important issues. The minister referred to the consultation and its need for widespread involvement but didn’t address the question of pre-legislative scrutiny.

My colleague, Jonathan Camrose, took the opportunity to mention some recent correspondence with the EU Commission – on 3 July, over 150 major EU businesses wrote to the European Commission seeking a pause on the roll-out of the EU’s AI Act. They objected, among other things, to its rigidity, complexity, overregulation and threat to competitiveness. He asked what the government made of these objections.

Answering, the minister again highlighted that the UK has, so far, taken a different approach by “proposing regulation largely through the existing regulators rather than having everything in one place”. He went on to insist that the delay is positive, saying: “If we rush the consultation, we will get this wrong; if we take the time and do it right, we could end up having the best regulation in this area, which will nonetheless need to change, as this advances very rapidly.”

Sovereign AI

Baroness Kidron asked what progress the government has made in respect of its sovereign AI aspirations. In answer, the minister set out some spending, not least that the government has allocated up to £2bn for AI, £500m of which is on sovereign AI, with that unit just now coming into being.

He also set out a programme on the creative content exchange in the creative industries sector that is specifically designed to look at how data from the creative industries can be pulled together so that it is easy to license it, easy to understand what has happened to it, and, therefore, easier to use it appropriately.


Judge questions HP’s ‘exaggerated’ Autonomy loss claim

Did HP pay 10% more than it needed to for Mike Lynch’s company?

By Cliff Saran, Managing Editor Published: 23 Jul 2025 14:43

HP’s claim for its losses on Autonomy, the company it acquired in 2011 for $11bn and took a $5bn financial hit on in 2012, was “substantially exaggerated”, a High Court judge has ruled. In 2022, HP successfully argued that Autonomy’s senior management team had inflated the value of the company, which meant HP paid over the odds. However, the latest twist in this ongoing legal saga suggests that the $5bn financial hit HP took may have been its attempt to counter a fall in market capitalisation by devaluing some HP assets.

The 22 July court ruling stated that HP is owed almost £700m due to the difference between the acquisition price of Autonomy and the price based on the company’s true financial position. Reuters reported that HP is also entitled to another $47.5m for losses suffered by Autonomy group companies in relation to hardware sales and other transactions.

However, Justice Robert Hildyard said: “I consider that HP’s claim was always substantially exaggerated: and I have concluded that there is more than a grain of truth in Dr Lynch’s submission … that when … HP announced that it was writing down the value of Autonomy by $8.8bn and attributed some $5bn to alleged fraud, the figure was not based on detailed analysis. Rather, it was predominantly calibrated by reference to the perceived need to reduce the carrying value of some of HP’s assets to take account of the diminution of HP’s market capitalisation following a fall in HP’s share price.”

The judge found the valuation of Autonomy to be £23.00 per share, compared with the acquisition price HP paid of £25.50 per share, representing a 9.8% reduction.

On 19 August 2024, Autonomy’s CEO and co-founder Mike Lynch died when his luxury yacht sank in a storm off the coast of Sicily. In a posthumous statement, Lynch wrote: “Today’s High Court ruling reflects that HP’s original $5bn damages claim was not just a wild overstatement – misleading shareholders – but it was off the mark by 80%. HP acquired Autonomy for $11.6bn and today’s judgment is a view that Autonomy’s actual value was not even 10% below the price HP paid. This result exposes HP’s failure and makes clear that the immense damage to Autonomy was down to HP’s own errors and actions.”

In June 2024, a few months before his death, Lynch was cleared of fraud in the US. Following a 12-week trial in San Francisco, the jury cleared Lynch of 15 counts of fraud and conspiracy that had been brought against him by HP, which alleged that he had inflated the value of Autonomy.

In his written statement for the UK case, Lynch discussed the difference between the UK and US courts: “An appeal process will be considered later this year. The English civil case included hearsay evidence from the US, and we were never able to question or cross-examine those witnesses. This is in direct contrast to the rights of defendants in the US legal system. When in the US criminal trial we were able to cross-examine the relevant witnesses, a very different story emerged.
Why is the English legal system so trusting?”

Regarding the $5bn write-down HP took after acquiring Autonomy, and HP’s attempt to boost its market capitalisation, Lynch wrote: “Autonomy was lined up to take a disproportionate hit.”

Judge questions HP’s ‘exaggerated’ Autonomy loss claim Read More »

Subpostmasters shoulder costs of Fujitsu’s Post Office IT outage

Fujitsu datacentre outage hit subpostmaster sales for two hours, leaving subpostmasters to seek compensation

By Karl Flinders, Chief reporter and senior editor EMEA Published: 23 Jul 2025 11:56

Subpostmasters lost hundreds of thousands of pounds in business through lost sales and costs when Fujitsu’s datacentre outage cut them off from the software that runs their businesses. The collapse of the Horizon system on 17 July will also cause an inevitable increase in lost transactions and create accounting shortfalls – something subpostmasters had to cover, or face potential prosecution over, in the past.

While Post Office branches are small businesses, collectively they are a huge organisation relying on the same IT system, called Horizon, which is at the centre of the Post Office scandal. As Computer Weekly revealed last week, the Fujitsu outage meant Horizon was not available for hours, meaning the roughly 11,500 branches in the Post Office network were unable to run their businesses.

During the downtime, customers walked out without making or completing purchases, potentially going elsewhere, while partner firms such as Amazon, DPD and Evri may have looked at other options to leave deliveries. Subpostmasters also had to pay staff who were unable to work. If, for example, during the two-hour outage every branch lost £200 in costs and lost business, that is £2.3m in total. The sizes of branches differ greatly through the network and some larger, busier branches would have lost more significant sums.

Subpostmasters are calling for compensation and answers from the Post Office. Subpostmasters have also raised concerns of further potential problems, claiming that Fujitsu – which is on its way out of the Post Office contract after a quarter of a century – might not be fully committed.

Who covers losses?

Richard Trinder, subpostmaster of three branches in Yorkshire and Derbyshire, and member of the Voice of the Subpostmasters campaign group, said that he lost hundreds of pounds in wages during the downtime. He asked: “Will the Post Office and Fujitsu compensate us for this?” He also said business is lost when partners and customers go elsewhere. “For example, if you are offering Amazon delivery collection and you are down, they will go elsewhere and might never come back.”

Mark Baker, a former subpostmaster and current CWU postmaster representative, added: “Will customers who are cut off when in the Post Office ever come back? They will probably use a different branch, which means the individual subpostmaster has lost business.”

Regarding customers leaving a branch when an outage hits, Calum Greenhow, CEO at the National Federation of Subpostmasters (NFSP), said: “The Post Office always says, ‘These customers will come back’, but this is not the case because we no longer have a monopoly on many of the products.

“We know there is a service-level agreement where the Post Office pays Fujitsu for extra work when required, but we would like to know if there is one where Fujitsu has to pay for Horizon outages,” he added.

Greenhow said the NFSP has asked the Post Office whether compensation will be given to subpostmasters for loss of earnings and was told the Post Office would investigate it. There is no service-level agreement between the Post Office and subpostmasters in relation to Horizon availability.
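For context, the back-of-the-envelope figure quoted above checks out, taking the article’s own assumptions of roughly 11,500 branches and £200 lost per branch:

```python
# Sanity check of the illustrative estimate in the article (not an official figure).
branches = 11_500
loss_per_branch_gbp = 200
print(f"estimated network-wide loss ≈ £{branches * loss_per_branch_gbp / 1e6:.1f}m")  # ≈ £2.3m
```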
Lost in transaction

Baker at the CWU raised fears over an expected increase in lost transactions and unexplained losses caused during the outage: “The big problem is if the system is cut off while a transaction is in transit, or in the subpostmaster’s stack, it might not be recovered.” He added that a transaction could also be lost because it’s in a queue at the datacentre.

The Post Office scandal, widely recognised as one of the biggest miscarriages of justice in UK history, was triggered by subpostmasters being blamed for unexplained accounting shortfalls.

“We are going through a very dangerous period until a new system and support is brought in,” warned Baker. He added that the Post Office needs to look at the entire architecture when replacing Horizon. “It needs to look at the front end, back end and all the bits in between. They also need to look at the robustness of the support mechanisms when there is an outage.”

Computer Weekly asked the Post Office whether it would compensate subpostmasters for losses incurred during the latest outage but had not received a response by the time this article was published. Computer Weekly also asked whether the Post Office would take any additional measures during the next accounting period to ensure that unexplained losses caused by transaction failures during the outage are identified. It has not yet responded.

The Post Office did not confirm whether Fujitsu will face any financial penalties because of the outage, and Fujitsu has not confirmed whether it has identified the cause of the outage.

Specialist investigation firm Kroll is currently reviewing the integrity of current Horizon system data and the processes used to identify discrepancies. The investigation followed a report by the Post Office scandal public inquiry, published in September 2024, which raised concerns about the current version of the controversial system. Computer Weekly asked the Post Office whether Kroll will include the latest incident as part of its review of the Horizon system, but it did not answer.

The Post Office scandal was first exposed by Computer Weekly in 2009, revealing the stories of seven subpostmasters and the problems they suffered due to Horizon accounting software, which led to the most widespread miscarriage of justice in British history.


Interview: Is there an easier way to refactor applications?

We speak to the inventor of OpenRewrite about how enterprise IT can manage code across thousands of source code repos

By Cliff Saran, Managing Editor Published: 23 Jul 2025 11:45

Looking at a typical Java migration, Jonathan Schneider, CEO and co-founder of Moderne, believes the approach organisations tend to take is unsustainable. Recalling a conversation with a major bank that needed to migrate to at least Java 17 to fix a particular vulnerability, he says: “The bank was pinned to Java 8 because it was using WebSphere.” Unless the bank moved applications from the WebSphere Java application server to the Tomcat alternative and upgraded to Java 17, it would not be able to resolve this particular Java vulnerability, adds Schneider. The challenge, he says, “is how to refactor 3,000 applications onto a more modern Java environment in a way that avoids breaking them”.

Application modernisation is a major headache for IT departments, leading to a drain on resources and a greater cyber security risk, due to older, unpatched code containing known vulnerabilities. A recent report from analyst Forrester highlights the risk organisations face as they battle to maintain legacy application code while attempting to respond to market volatility. Forrester says technical debt both increases IT costs and risks while slowing down the delivery of new capabilities. It urges IT leaders to outsource support for technical debt to a provider, which then enables the IT team to drive forward modern IT architecture and delivery practices.

“Outsourcing the legacy tech stack to proven outsource providers will ensure operational reliability at a negotiated cost, and free up funds and teams to build a modern, adaptive and AI [artificial intelligence]-powered ecosystem that drives innovation and positions you for future growth,” analysts Sharyn Leaver, Eric Brown, Riley McDonnell and Rachel Birrell state in Forrester’s Budget planning: Prepare for even more volatility report.

Application modernisation approaches are not scalable

But whether it is the responsibility of an in-house team or an outsourcer, according to Schneider, the traditional way to manage technical debt is not working. Historically, he points out, code was left with product engineers to continue to revise the application going forward and keep it up to date. Sometimes, he says, an IT consulting firm would be brought in to establish a software factory, providing application maintenance, working on one application at a time. According to Schneider, this approach has not worked.

The approach Moderne takes is to consider tasks that can be solved horizontally, across the whole business. Schneider used to work at Netflix and is the inventor of OpenRewrite, an open-source software auto-refactoring tool, and has built a business around the complexity of keeping code current. Every piece of code created basically ends up as technical debt as soon as it is deployed into production. “I could make all the perfect decisions around an application’s architecture and pick all the best libraries today, then, two months from now, for one reason or another, it’s no longer optimal,” he says.

Moderne effectively scans enterprise source code and produces a lossless semantic tree of the code, stored in a database. This can then be queried to understand the impact of code changes. It can also be used with recipes that enable software developers to replace software libraries in an automated fashion.
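OpenRewrite recipes operate on Java source via that lossless semantic tree. Purely as an analogy for the “parse once, transform deterministically, apply everywhere” idea (this is not OpenRewrite’s API), here is a toy sketch that uses Python’s standard ast module to strip a file-based logging configuration, similar in spirit to the logging migration described in the next section; the code being rewritten is invented for the example.

```python
import ast

SOURCE = '''
import logging
logging.basicConfig(filename="/var/log/app.log", level=logging.INFO)
'''

class DropLogFile(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)
        # Match logging.basicConfig(...) and drop the filename= keyword so logs
        # go to the default stream instead of the local disk.
        if isinstance(node.func, ast.Attribute) and node.func.attr == "basicConfig":
            node.keywords = [k for k in node.keywords if k.arg != "filename"]
        return node

tree = DropLogFile().visit(ast.parse(SOURCE))
print(ast.unparse(tree))   # logging.basicConfig(level=logging.INFO)
```

The same transformation, once reviewed, can be replayed over every repository, which is the property that makes recipe-style refactoring scale.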
Software developers can see if the recipe produces the desired results from a coding standpoint; they can tweak it if necessary, before running it to make the required change across the entire code base.

Using AI with coding recipes

These recipes can be created using a large language model (LLM) like Claude Code. “A couple of weeks ago, a banking executive said he was trying to move applications from on-prem to containerised,” says Schneider. “But the key problem was that the applications were writing log files to disk.” This, he says, blocked the migration. “We needed to alter the logging configuration and change the code itself so that it does not write to disk,” adds Schneider.

He believes writing custom recipes to do these kinds of transformations involves learning the programming framework and becoming an expert at recipe development. However, by using Claude Code, Schneider says it took just 20 minutes to create a brand new custom recipe. “Claude Code wrote the first 10 or so patterns to modify different kinds of logging configuration and how to route this stuff out,” he says. “We could then take that recipe, use it across the first 9,000 source code repositories and see the kinds of changes that were being made.”

The developer can assess the patterns produced by the recipe to check if they work and then feed them back into Claude AI iteratively to produce similar patterns or improve a pattern the developer considered unsuitable.

For Schneider, the recipe, rather like a cooking recipe, is a set of instructions that can be followed step by step, to deploy a code change. The recipe can also be tweaked and improved. “Once you are comfortable with the changes, you then have a deterministic machine to stamp it out everywhere,” he says. “We get a kind of quick iteration feedback,” adds Schneider. “At the end of the day, what you don’t have is a probabilistic system, like an LLM, making all the code edits. Rather, the probabilistic system writes a recipe that becomes a deterministic machine to make the change across the whole code base.”

He says that given the volume of code in production, IT departments need an approach that scales. “It’s hard to imagine just how much code is out there,” says Schneider. At one of Moderne’s larger customers, he says, almost five billion lines of source code is being managed.

For Schneider, AI-based refactoring where the source code is loaded into an LLM does not stack up. The cost alone can amount to millions of dollars, which makes the approach he and Moderne take in using Claude AI just to create recipes a potential big cost-saver. Moderne is on …


Reverse engineering GitHub Actions cache to make it fast

TL;DR: We reverse engineered GitHub Actions cache internals to transparently route cache requests through our faster, colocated cache. This delivered up to 10x faster cache performance for some of our customers, with no code changes required and no need to maintain forks of upstream actions.

Before this work began, we already had a faster alternative to GitHub Actions cache. Our approach was different: we forked each of the popular first-party actions that depended on Actions cache to point to our faster, colocated cache. But my coworkers weren’t satisfied with that solution, since it required users to change a single line of code. Apart from the user experience, maintaining these forks steadily turned into a nightmare for us. We kept at it for a while, but eventually reached an inflection point, and the operational cost became too high.

So, I set out to reverse engineer GitHub Actions cache itself, with one goal: make it fast. Really fast. And this time, without having to maintain forks or requiring the user to change a single line of code. Not one.

Sniffing out GitHub cache requests

The first step was fully understanding the inner workings of the GitHub Actions cache. Our prior experience forking the existing cache actions proved helpful, but earlier this year, GitHub threw us a curveball by deprecating its legacy cache actions in favor of a new Twirp-based service using the Azure Blob Storage SDK. Although a complete redesign, it was a win in our eyes — we love Protobufs. They’re easy to reason about, and once we could reverse engineer the interface, we could spin up a fully compatible, blazing-fast alternative.

Enter our new friend: Claude. (It’s 2025, after all.) After a few iterations of creative prompt engineering, we sniffed out the requests GitHub made to its control plane and came up with a proto definition of the actions service. If you’re hacking on similar black boxes, I highly recommend trusting an LLM with this.

But what about the Azure Blob Storage? The GitHub system switched to Azure, but our cache backend runs atop a self-hosted MinIO cluster, which is an S3-compatible blob storage. In an ideal world, all blob stores would be interchangeable, but we do not live in an ideal world (at least, not yet). We had to figure out the shape of those requests. It took a little more effort to figure it out, but in the end, all roads led to network proxies.

Proxy here, proxy there, proxy everywhere

Achieving a truly seamless experience with zero required code changes requires some magic: every VM request still appears to go to the original destination (i.e. GitHub’s control plane and Azure Blob Storage), but under the hood, we sneakily redirect them within the network stack.

Now, a little color on the context: Blacksmith is a high-performance, multi-tenant CI cloud for GitHub Actions. Our fleet runs on bare-metal servers equipped with high single-core performance gaming CPUs and NVMe drives. At the time of writing this, we manage 500+ hosts across several data centers globally, spinning up ephemeral Firecracker VMs for each customer’s CI job. Every job runs Ubuntu with GitHub-provided root filesystems. With the context set, let’s talk implementation.

VM proxy

Inside each VM, we configured a lightweight NGINX server to proxy requests back to our host-level proxy. Why this extra layer? It’s simple: we need to maintain state for every upload and download, and access control is non-negotiable.
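The routing decision the in-VM proxy makes can be sketched in a few lines. This is an illustration only: the real setup uses NGINX configuration, and the hostnames, address and Twirp path prefix below are assumptions rather than Blacksmith’s actual values.

```python
HOST_PROXY = "http://169.254.0.1:8080"   # assumed host-level cache proxy address
CACHE_PATH_PREFIXES = (
    "/twirp/github.actions.results.api.v1.CacheService/",   # assumed Twirp route
)

def pick_upstream(original_url: str, path: str) -> str:
    """Return the upstream a given request should be forwarded to."""
    if path.startswith(CACHE_PATH_PREFIXES):
        return HOST_PROXY + path          # intercept: serve from the fast, colocated cache
    return original_url + path            # pass through: artifacts, other control-plane calls

# Cache RPCs are intercepted, everything else goes to its usual destination.
print(pick_upstream("https://results-receiver.actions.githubusercontent.com",
                    "/twirp/github.actions.results.api.v1.CacheService/GetCacheEntryDownloadURL"))
print(pick_upstream("https://api.github.com", "/repos/acme/app/actions/artifacts"))
```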
By handling proxying inside the VM, we pick up a nice bonus: jobs running inside Docker containers can have their egress traffic cleanly intercepted and routed through our NGINX proxy. No special hacks required.

These proxy servers are smart about what they forward. Cache-related requests are all redirected to our host proxy, while other GitHub control plane requests — such as those we don’t handle, like the GitHub artifact store — go straight to their usual destinations.

The choice of NGINX came down to practicality. All our root file systems ship with NGINX preinstalled, and the proxying we do here is dead simple. Sometimes the best tool is the one that’s already in the box, and in this case, there was no need to look any further.

Fighting the Azure SDK

While NGINX takes care of request routing for the GitHub Actions control plane, getting things to play nicely with the Azure SDK called for some serious kernel-level network gymnastics.

We were several cycles deep into our implementation when a surprising reality emerged: our new caching service was lagging behind our legacy version, particularly when it came to downloads. Curious, we dove back into the source code of the GitHub toolkit. What we found was telling: if the hostname isn’t recognized as an Azure Blob Storage domain (e.g., blob.core.windows.net), the toolkit quietly skips many of its concurrency optimizations. Suddenly, the bottleneck made sense.

To address this, we performed some careful surgery. We built our own Azure-like URLs, then a decoder and translator in our host proxy to convert them into S3-compatible endpoints. Only then did the pieces fall into place, and performance once again became a nonissue.

We started with VM-level DNS remapping to map the Azure-like URL to our VM agent host. But redirecting just these specific requests to our host-level proxy required an additional step to get there. Our initial implementation at this proxying layer leaned on iptables rules to steer the right traffic toward our host proxy. It worked, at least until it didn’t. Through testing, we quickly hit the limits: iptables was already doing heavy lifting for other subsystems inside our environment, and with each VM adding or removing its own set of rules, things got messy fast, and extremely flaky.

That led us to nftables, the new standard for packet filtering on Linux, and a perfect fit for our use case:

Custom rule tables: Namespacing rules per VM became simple, making it straightforward to add or remove these rules.

Atomic configuration changes: Unlike iptables, nftables allows us to atomically swap out entire config blocks. This avoids conflicts …

