ContentSproute


CrowdStrike's massive cyber outage one year later: lessons enterprises can learn to improve security

As we wrote in our initial analysis of the CrowdStrike incident, the July 19, 2024, outage served as a stark reminder of the importance of cyber resilience. Now, one year later, both CrowdStrike and the industry have undergone significant transformation, a shift catalyzed by 78 minutes that changed everything.

"The first anniversary of July 19 marks a moment that deeply impacted our customers and partners and became one of the most defining chapters in CrowdStrike's history," CrowdStrike's President Mike Sentonas wrote in a blog detailing the company's year-long journey toward enhanced resilience.

The numbers remain sobering: A faulty Channel File 291 update, deployed at 04:09 UTC and reverted just 78 minutes later, crashed 8.5 million Windows systems worldwide. Insurance estimates put losses at $5.4 billion for the top 500 U.S. companies alone, and aviation was hit particularly hard, with 5,078 flights canceled globally.

Steffen Schreier, senior vice president of product and portfolio at Telesign, a Proximus Global company, captures why this incident resonates a year later: "One year later, the CrowdStrike incident isn't just remembered, it's impossible to forget. A routine software update, deployed with no malicious intent and rolled back in just 78 minutes, still managed to take down critical infrastructure worldwide. No breach. No attack. Just one internal failure with global consequences."

His technical analysis reveals uncomfortable truths about modern infrastructure: "That's the real wake-up call: even companies with strong practices, a staged rollout, fast rollback, can't outpace the risks introduced by the very infrastructure that enables rapid, cloud-native delivery. The same velocity that empowers us to ship faster also accelerates the blast radius when something goes wrong."

Understanding what went wrong

CrowdStrike's root cause analysis revealed a cascade of technical failures: a mismatch between input fields in their IPC Template Type, missing runtime array bounds checks and a logic error in their Content Validator. These weren't edge cases but fundamental quality control gaps.

Merritt Baer, incoming Chief Security Officer at Enkrypt AI and advisor to companies including Andesite, provides crucial context: "CrowdStrike's outage was humbling; it reminded us that even really big, mature shops get processes wrong sometimes. This particular outcome was a coincidence on some level, but it should have never been possible. It demonstrated that they failed to instate some basic CI/CD protocols."

Her assessment is direct but fair: "Had CrowdStrike rolled out the update in sandboxes and only sent it in production in increments as is best practice, it would have been less catastrophic, if at all."

Yet Baer also recognizes CrowdStrike's response: "CrowdStrike's comms strategy demonstrated good executive ownership. Execs should always take ownership—it's not the intern's fault. If your junior operator can get it wrong, it's my fault. It's our fault as a company."

Leadership's accountability

George Kurtz, CrowdStrike's founder and CEO, exemplified this ownership principle.
In a LinkedIn post reflecting on the anniversary, Kurtz wrote: "One year ago, we faced a moment that tested everything: our technology, our operations, and the trust others placed in us. As founder and CEO, I took that responsibility personally. I always have and always will."

His perspective reveals how the company channeled crisis into transformation: "What defined us wasn't that moment; it was everything that came next. From the start, our focus was clear: build an even stronger CrowdStrike, grounded in resilience, transparency, and relentless execution. Our North Star has always been our customers."

CrowdStrike goes all-in on a new Resilient by Design framework

CrowdStrike's response centered on its Resilient by Design framework, which Sentonas describes as going beyond "quick fixes or surface-level improvements." The framework's three pillars (Foundational, Adaptive and Continuous) represent a comprehensive rethinking of how security platforms should operate.

Key implementations include:

- Sensor Self-Recovery: Automatically detects crash loops and transitions to safe mode
- New Content Distribution System: Ring-based deployment with automated safeguards
- Enhanced Customer Control: Granular update management and content pinning capabilities
- Digital Operations Center: Purpose-built facility for global infrastructure monitoring
- Falcon Super Lab: Testing thousands of OS, kernel and hardware combinations

"We didn't just add a few content configuration options," Sentonas emphasized in his blog. "We fundamentally rethought how customers could interact with and control enterprise security platforms."

Industry-wide supply chain awakening

The incident forced a broader reckoning about vendor dependencies. Baer frames the lesson starkly: "One huge practical lesson was just that your vendors are part of your supply chain. So, as a CISO, you should test the risk to be aware of it, but simply speaking, this issue fell on the provider side of the shared responsibility model. A customer wouldn't have controlled it."

CrowdStrike's outage has permanently altered vendor evaluation: "I see effective CISOs and CSOs taking lessons from this, around the companies they want to work with and the security they receive as a product of doing business together. I will only ever work with companies that I respect from a security posture lens. They don't need to be perfect, but I want to know that they are doing the right processes, over time."

Sam Curry, CISO at Zscaler, added: "What happened to CrowdStrike was unfortunate, but it could have happened to many, so perhaps we don't put the blame on them with the benefit of hindsight. What I will say is that the world has used this to refocus and has placed more attention to resilience as a result, and that's a win for everyone, as our collective goal is to make the internet safer and more secure for all."

Underscores the need for a new security paradigm

Schreier's analysis extends beyond CrowdStrike to fundamental security architecture: "Speed at scale comes at a cost. Every routine update now carries the weight of potential systemic failure.
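The ring-based content distribution described above is a pattern any enterprise can borrow for its own update pipelines, not just a CrowdStrike feature. The sketch below is a generic, hypothetical illustration of the idea (the ring names, health probe and thresholds are invented for the example and are not CrowdStrike's implementation): ship a change to a small canary ring first, check health telemetry, and halt automatically before the blast radius grows.

```python
# Hypothetical illustration of a ring-based rollout with an automated safeguard.
# Ring sizes, the health probe and thresholds are made up for the example.
import time
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Ring:
    name: str
    hosts: List[str]

def healthy(hosts: List[str]) -> float:
    """Placeholder health probe: fraction of hosts reporting OK.
    A real pipeline would query crash/boot-loop telemetry here."""
    return 1.0

def deploy(update_id: str, rings: List[Ring],
           health_probe: Callable[[List[str]], float],
           min_health: float = 0.99, soak_seconds: int = 600) -> bool:
    """Roll out ring by ring; stop (and trigger rollback) if health drops."""
    for ring in rings:
        print(f"Deploying {update_id} to ring '{ring.name}' ({len(ring.hosts)} hosts)")
        # ...push content to ring.hosts here...
        time.sleep(0)  # stand-in for a real soak period of soak_seconds
        score = health_probe(ring.hosts)
        if score < min_health:
            print(f"Halting rollout: health {score:.2%} below {min_health:.2%}")
            return False  # automated safeguard: later rings never receive the update
    return True

if __name__ == "__main__":
    rings = [Ring("canary", ["host-1"]),
             Ring("early", [f"h{i}" for i in range(50)]),
             Ring("broad", [f"h{i}" for i in range(50, 5000)])]
    deploy("channel-file-update", rings, healthy)
```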


Google DeepMind makes AI history with gold medal win at world’s toughest math competition

Google DeepMind announced Monday that an advanced version of its Gemini artificial intelligence model has officially achieved gold medal-level performance at the International Mathematical Olympiad, solving five of six exceptionally difficult problems and earning recognition as the first AI system to receive official gold-level grading from competition organizers.

The victory advances the field of AI reasoning and puts Google ahead in the intensifying battle between tech giants building next-generation artificial intelligence. More importantly, it demonstrates that AI can now tackle complex mathematical problems using natural language understanding rather than requiring specialized programming languages.

"Official results are in — Gemini achieved gold-medal level in the International Mathematical Olympiad!" Demis Hassabis, CEO of Google DeepMind, wrote on social media platform X Monday morning. "An advanced version was able to solve 5 out of 6 problems. Incredible progress."

The International Mathematical Olympiad, held annually since 1959, is widely considered the world's most prestigious mathematics competition for pre-university students. Each participating country sends six elite young mathematicians to compete in solving six exceptionally challenging problems spanning algebra, combinatorics, geometry, and number theory. Only about 8% of human participants typically earn gold medals.

How Google DeepMind's Gemini Deep Think cracked math's toughest problems

Google's latest success far exceeds its 2024 performance, when the company's combined AlphaProof and AlphaGeometry systems earned silver medal status by solving four of six problems. That earlier system required human experts to first translate natural language problems into domain-specific programming languages and then interpret the AI's mathematical output.

This year's breakthrough came through Gemini Deep Think, an enhanced reasoning system that employs what researchers call "parallel thinking." Unlike traditional AI models that follow a single chain of reasoning, Deep Think simultaneously explores multiple possible solutions before arriving at a final answer.

"Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions," Hassabis explained in a follow-up post on X, emphasizing that the system completed its work within the competition's standard 4.5-hour time limit.

The model achieved 35 out of a possible 42 points, comfortably exceeding the gold medal threshold. According to IMO President Prof. Dr. Gregor Dolinar, the solutions were "astonishing in many respects" and found to be "clear, precise and most of them easy to follow" by competition graders.
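DeepMind has not published Deep Think's internals, so the snippet below is only a toy illustration of the "parallel thinking" idea described above, in the spirit of self-consistency sampling: generate several independent solution attempts and keep the answer most of them agree on. The generate_attempt stub and the voting rule are assumptions for illustration, not DeepMind's method.

```python
# Toy sketch of "parallel thinking" as best-of-n sampling with a consensus vote.
# generate_attempt() is a stand-in for a reasoning-model call; the voting rule
# is an assumption for illustration, not DeepMind's actual mechanism.
import random
from collections import Counter

def generate_attempt(problem: str, seed: int) -> str:
    """Stand-in for one independent reasoning pass over the problem."""
    random.seed(seed)
    return random.choice(["answer_A", "answer_A", "answer_B"])  # noisy sampler

def parallel_think(problem: str, n_paths: int = 8) -> str:
    """Explore several solution paths in parallel and return the consensus answer."""
    candidates = [generate_attempt(problem, seed) for seed in range(n_paths)]
    winner, votes = Counter(candidates).most_common(1)[0]
    print(f"{votes}/{n_paths} paths agreed on {winner}")
    return winner

if __name__ == "__main__":
    parallel_think("IMO-style problem statement goes here")
```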
OpenAI faces backlash for bypassing official competition rules

The announcement comes amid growing tension in the AI industry over competitive practices and transparency. Google DeepMind's measured approach to releasing its results has drawn praise from the AI community, particularly in contrast to rival OpenAI's handling of similar achievements.

"We didn't announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved," Hassabis wrote, appearing to reference OpenAI's earlier announcement of its own olympiad performance.

Social media users were quick to note the distinction. "You see? OpenAI ignored the IMO request. Shame. No class. Straight up disrespect," wrote one user. "Google DeepMind acted with integrity, aligned with humanity."

The criticism stems from OpenAI's decision to announce its own mathematical olympiad results without participating in the official IMO evaluation process. Instead, OpenAI had a panel of former IMO participants grade its AI's performance, an approach that some in the community view as lacking credibility.

"OpenAI is quite possibly the worst company on the planet right now," wrote one critic, while others suggested the company needs to "take things seriously" and "be more credible."

Inside the training methods that powered Gemini's mathematical mastery

Google DeepMind's success appears to stem from novel training techniques that go beyond traditional approaches. The team used advanced reinforcement learning methods designed to leverage multi-step reasoning, problem-solving, and theorem-proving data. The model was also provided access to a curated collection of high-quality mathematical solutions and received specific guidance on approaching IMO-style problems.

The technical achievement impressed AI researchers who noted its broader implications. "Not just solving math… but understanding language-described problems and applying abstract logic to novel cases," wrote AI observer Elyss Wren. "This isn't rote memory — this is emergent cognition in motion."

Ethan Mollick, a professor at the Wharton School who studies AI, emphasized the significance of using a general-purpose model rather than specialized tools. "Increasing evidence of the ability of LLMs to generalize to novel problem solving," he wrote, highlighting how this differs from previous approaches that required specialized mathematical software.
The model demonstrated particularly impressive reasoning in one problem where many human competitors applied graduate-level mathematical concepts. According


Chinese startup Manus challenges ChatGPT in data visualization: which should enterprises use?

The promise sounds almost too good to be true: drop a messy comma-separated values (CSV) file into an AI agent, wait two minutes, and get back a polished, interactive chart ready for your next board presentation. But that's exactly what Chinese startup Manus.im is delivering with its latest data visualization feature, launched this month.

Unfortunately, my initial hands-on testing with corrupted datasets reveals a fundamental enterprise problem: impressive capabilities paired with insufficient transparency about data transformations. While Manus handles messy data better than ChatGPT, neither tool is yet ready for boardroom-ready slides.

Rossum's survey of 470 finance leaders found 58% still rely primarily on Excel for monthly KPIs, despite owning BI licenses. Another TechRadar study estimates that overall spreadsheet dependence affects roughly 90% of organizations — creating a "last-mile data problem" between governed warehouses and hasty CSV exports that land in analysts' inboxes hours before critical meetings.

Manus targets this exact gap. Upload your CSV, describe what you want in natural language, and the agent automatically cleans the data, selects the appropriate Vega-Lite grammar and returns a PNG chart ready for export—no pivot tables required.

Where Manus beats ChatGPT: 4x slower but more accurate with messy data

I tested both Manus and ChatGPT's Advanced Data Analysis using three datasets (113k-row e-commerce orders, 200k-row marketing funnel, 10k-row SaaS MRR), first clean, then corrupted with 5% error injection including nulls, mixed-format dates and duplicates.

For example, testing the same prompt — "Show me a month-by-month revenue trend for the past year and highlight any unusual spikes or dips" — across clean and corrupted 113k-row e-commerce data revealed some stark differences.

Tool    | Data Quality | Time | Cleans Nulls | Parses Dates | Handles Duplicates | Comments
Manus   | Clean        | 1:46 | N/A          | ✓            | N/A                | Correct trend, standard presentation, but incorrect numbers
Manus   | Messy        | 3:53 | ✓            | ✓            | ✗                  | Correct trend despite inaccurate data
ChatGPT | Clean        | 0:57 | N/A          | ✓            | N/A                | Fast, but incorrect visualization
ChatGPT | Messy        | 0:59 | ✗            | ✗            | ✗                  | Incorrect trend from unclean data

For context: DeepSeek could only handle 1% of the file size, while Claude and Grok took over 5 minutes each but produced interactive charts without PNG export options.

Figure 1-2: Chart outputs from the same revenue trend prompt on messy e-commerce data. Manus (bottom) produces a coherent trend despite data corruption, while ChatGPT (top) shows distorted patterns from unclean date formatting.

Manus behaves like a cautious junior analyst — automatically tidying data before charting, successfully parsing date inconsistencies and handling nulls without explicit instructions. When I requested the same revenue trend analysis on corrupted data, Manus took nearly 4 minutes but produced a coherent visualization despite the data quality issues.

ChatGPT operates like a speed coder — prioritizing fast output over data hygiene.
The same request took just 59 seconds but produced misleading visualizations because it didn't automatically clean formatting inconsistencies.

However, both tools failed in terms of "executive readiness." Neither produced board-ready axis scaling or readable labels without follow-up prompts. Data labels were frequently overlapping or too small, bar charts lacked proper gridlines and number formatting was inconsistent.

The transparency crisis enterprises can't ignore

Here's where Manus becomes problematic for enterprise adoption: the agent never surfaces the cleaning steps it applies. An auditor reviewing the final chart has no way to confirm whether outliers were dropped, imputed or transformed. When a CFO presents quarterly results based on a Manus-generated chart, what happens when someone asks, "How did you handle the duplicate transactions from the Q2 system integration?" The answer is silence.

ChatGPT, Claude and Grok all show their Python code, though transparency through code review isn't scalable for business users lacking programming experience. What enterprises need is a simpler audit trail, which builds trust.

Warehouse-native AI is racing ahead

While Manus focuses on CSV uploads, major platforms are building chart generation directly into enterprise data infrastructure:

- Google's Gemini in BigQuery became generally available in August 2024, enabling the generation of SQL queries and inline visualizations on live tables while respecting row-level security.
- Microsoft's Copilot in Fabric reached GA in the Power BI experience in May 2024, creating visuals inside Fabric notebooks while working directly with Lakehouse datasets.
- GoodData's AI Assistant, launched in June 2025, operates within customer environments and respects existing semantic models, allowing users to ask questions in plain language while receiving answers that align with predefined metrics and business terms.

These warehouse-native solutions eliminate CSV exports entirely, preserve complete data lineage and leverage existing security models — advantages file-upload tools like Manus struggle to match.

Critical gaps for enterprise adoption

My testing revealed several blockers:

- Live data connectivity remains absent — Manus supports file uploads only, with no Snowflake, BigQuery or S3 connectors. Manus.im says connectors are "on the roadmap" but offers no timeline.
- Audit trail transparency is completely missing. Enterprise data teams need transformation logs showing exactly how the AI cleaned their data and whether its interpretation of the fields is correct.
- Export flexibility is limited to PNG outputs. While adequate for quick slide decks, enterprises need customizable, interactive export options.

The verdict: impressive tech, premature for enterprise use cases

For SMB executives drowning in ad-hoc CSV analysis, Manus's drag-and-drop visualization seems to be doing the job. The autonomous data cleaning handles real-world messiness that would otherwise require manual preprocessing, cutting turnaround from hours to minutes when you have reasonably complete data. Additionally, it offers a significant runtime advantage over Excel or Google Sheets, which require manual pivots and incur substantial load times due to local compute power limitations.

But regulated enterprises with governed data lakes should wait
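To make the audit-trail complaint concrete, here is a minimal, hypothetical sketch of the kind of cleaning pass these agents appear to perform on a messy CSV (mixed-format dates, nulls, duplicates), written so that every transformation is recorded in a log an auditor could review. The column names and rules are assumptions for illustration; neither Manus nor ChatGPT exposes its actual pipeline.

```python
# Minimal sketch: clean a messy CSV while recording an auditable transformation log.
# Column names ("order_date", "revenue") and the specific rules are assumptions for
# illustration; this is not Manus's or ChatGPT's actual pipeline. Requires pandas >= 2.0.
import pandas as pd

def clean_with_audit(path: str) -> tuple[pd.DataFrame, list[str]]:
    log: list[str] = []
    df = pd.read_csv(path)
    log.append(f"loaded {len(df)} rows from {path}")

    # Parse mixed-format dates; unparseable values become NaT and are dropped.
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce", format="mixed")
    bad_dates = int(df["order_date"].isna().sum())
    df = df.dropna(subset=["order_date"])
    log.append(f"dropped {bad_dates} rows with unparseable dates")

    # Remove exact duplicate rows.
    before = len(df)
    df = df.drop_duplicates()
    log.append(f"removed {before - len(df)} duplicate rows")

    # Impute missing revenue with 0 rather than dropping (a judgment call worth logging).
    null_rev = int(df["revenue"].isna().sum())
    df["revenue"] = df["revenue"].fillna(0)
    log.append(f"imputed {null_rev} null revenue values with 0")
    return df, log

if __name__ == "__main__":
    cleaned, audit_log = clean_with_audit("orders.csv")
    monthly = cleaned.set_index("order_date")["revenue"].resample("MS").sum()
    print("\n".join(audit_log))
    print(monthly.head())
```

Whether the agent drops, imputes or transforms is less important than the fact that the choice is written down; a log like this is the "simpler audit trail" the article argues enterprises need.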


A ChatGPT ‘router’ that automatically selects the right OpenAI model for your job appears imminent

In the 2.5 years since OpenAI debuted ChatGPT, the number of large language models (LLMs) that the company has made available as options to power its hit chatbot has steadily grown. In fact, there are now a total of 7 (!!!) different AI models that paying ChatGPT subscribers (of the $20 Plus tier and more expensive tiers) can choose between when interacting with the trusty chatbot — each with its own strengths and weaknesses.

But how should a user decide which one to use for their particular prompt, question, or task? After all, you can only pick one at a time.

Is help on the way?

Help appears to be on the way imminently from OpenAI, as reports emerged over the last few days on X from AI influencers, including OpenAI's own researcher "Roon" (@tszzl on X, speculated to be technical team member Tarun Gogineni), of a new "router" function that will automatically select the best OpenAI model to respond to the user's input on the fly, depending on the specific input's content.

As Roon posted on the social network X yesterday, July 20, 2025, in a since-deleted response to influencer Lisan al Gaib's statement that they "don't want a model router I want to be able to select the models I use": "You'll still be able to select. This is a product to make sure that doctors aren't stuck on 4o-mini."

Similarly, Yuchen Jin, co-founder and CTO of AI inference cloud provider Hyperbolic Labs, wrote in an X post on July 19: "Heard GPT-5 is imminent, from a little bird. It's not one model, but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models. That's why Sam said they'd 'fix model naming': prompts will just auto-route to the right model. GPT-6 is in training. I just hope they're not delaying it for more safety tests. 🙂"

While a presumably far more advanced GPT-5 model would (and will) be huge news if and when released, the router may make life much easier and more intelligent for the average ChatGPT subscriber. It would also follow on the heels of third-party products such as the web-based Token Monster chatbot, which automatically selects and combines responses from multiple third-party LLMs to respond to user queries.

Asked about the router idea and comments from "Roon," an OpenAI spokesperson declined to provide a response or further information at this time.

Solving the overabundance of choice problem

To be clear, every time OpenAI has released a new LLM to the public, it has diligently shared in either a blog post or release notes or both what it thinks that particular model is good for and designed to help with. For example, OpenAI's "o" series reasoning models — o3, o4-mini, o4-mini-high — have performed better on math, science, and coding benchmarks, while non-reasoning models like the new GPT-4.5 and 4.1 seem to do better at creative writing and communications tasks.

Dedicated AI influencers and power users may understand very well what all these different models are good and not so good at.
But regular users who don't follow the industry as closely, nor have the time and finances available to test them all out on the same input prompts and compare the outputs, will understandably struggle to make sense of the bewildering array of options. That could mean they're missing out on smarter, more intelligent, or more capable responses from ChatGPT for their task at hand. And in the case of fields like medicine, as Roon alluded to, the difference could be one of life or death.

It's also interesting to speculate on how an automatic LLM router might change public perceptions toward, and adoption of, AI more broadly. ChatGPT already counted 500 million active users as of March. If more of these people were automatically guided toward more intelligent and capable LLMs to handle their AI queries, the impact of AI on their workloads and on the entire global economy would likely be felt far more acutely, creating a positive "snowball" effect.

That is, as more people saw more gains from ChatGPT automatically choosing the right AI model for their queries, and as more enterprises reaped greater efficiency from this process, more individuals and organizations would likely be convinced of the utility of AI and be more willing to pay for it, and as they did so, even more AI-powered workflows would spread through the world.

But right now, this is all presumably being held back by the fact that the ChatGPT model picker requires the user to A. know they even have a choice of models and B. have some level of informed awareness of what these models are good for. It's all still a manually driven process. Like going to the supermarket in your town and staring at aisles of cereal and different sauces, the average ChatGPT user is currently faced with an overabundance of choice. Hopefully any hypothetical OpenAI router seamlessly helps direct them to the right model for their needs, when they need it — like a trusty shopkeeper showing up to free you from your product paralysis.
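OpenAI hasn't said how such a router would work, but the general pattern is straightforward to sketch: classify the incoming prompt, then dispatch it to whichever model a set of heuristics (or a small classifier model) deems the best fit. The routing rules and model choices below are illustrative assumptions only, not OpenAI's design.

```python
# Hypothetical sketch of a model router: classify the prompt, then dispatch to a model.
# The keyword heuristics and the model assignments are illustrative assumptions.
import re

ROUTES = {
    "reasoning": "o3",          # math, code, multi-step analysis
    "creative": "gpt-4.5",      # writing and communications tasks
    "default": "gpt-4.1-mini",  # quick, low-cost answers
}

def route(prompt: str) -> str:
    """Pick a model name for the prompt using crude keyword heuristics."""
    p = prompt.lower()
    if re.search(r"\b(prove|derive|debug|algorithm|diagnos\w*|calculate)\b", p):
        return ROUTES["reasoning"]
    if re.search(r"\b(poem|story|slogan|email|rewrite)\b", p):
        return ROUTES["creative"]
    return ROUTES["default"]

if __name__ == "__main__":
    for prompt in ["Derive the closed form of this series",
                   "Write a friendly email to my landlord",
                   "What's the capital of Peru?"]:
        print(f"{prompt!r} -> {route(prompt)}")
```

In practice a production router would more likely use a lightweight classifier with confidence thresholds rather than keyword rules, falling back to the cheapest model when it is unsure.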


How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference

If you're building an AI application, you probably started with OpenAI's convenient APIs. However, as your application scales, you'll need more control over costs, models, and infrastructure. Cerebrium is a serverless AI infrastructure platform that lets you run open-source models on dedicated hardware with predictable, time-based pricing instead of token-based billing.

This guide will show you how to build a complete chat application with OpenAI, migrate it to Cerebrium by changing just two lines of code, and add performance and cost tracking to compare the two approaches to AI inference using real data. When you're done, you'll have a working chat application that demonstrates the practical differences between token-based and compute-based pricing models, and the insights you need to choose the right approach for your use case.

Prerequisites

To follow along with this guide, you'll need Python 3.10 or higher installed on your system. You'll also need the following (all free):

- OpenAI API key.
- Cerebrium account (includes free tier access to test GPU instances up to A10 level).
- Hugging Face token (free account required).
- Llama 3.1 model access on Hugging Face. Visit meta-llama/Meta-Llama-3.1-8B-Instruct and click "Request access" to get approval from Meta (typically takes a few minutes to a few hours).

Familiarity with Python and API calls is helpful, but we'll walk through each step in detail.

Creating an OpenAI Chatbot

We'll build a complete chat application that works with OpenAI as our foundation and enhance it throughout the tutorial without ever needing to modify the core chat logic.

Create a new directory for the project and set up the basic structure:

```
mkdir openai-cerebrium-migration
cd openai-cerebrium-migration
```

Install the dependencies:

```
pip install openai==1.55.0 python-dotenv==1.0.0 art==6.1 colorama==0.4.6
```

Create a .env file to store API credentials:

```
OPENAI_API_KEY=your_openai_api_key_here
CEREBRIUM_API_KEY=your_cerebrium_api_key_here
CEREBRIUM_ENDPOINT_URL=your_cerebrium_endpoint_url_here
```

Replace your_openai_api_key_here with your actual OpenAI API key.

Now we'll build the chat.py file step by step. Start by creating the file and adding the imports:

```python
import os
import time
from dotenv import load_dotenv
from openai import OpenAI
from art import text2art
from colorama import init, Fore, Style
```

These imports handle environment variables, OpenAI client creation, ASCII art generation, and colored terminal output.

Add the initialization below the imports:

```python
load_dotenv()
init(autoreset=True)
```

Add this display_intro function:

```python
def display_intro(use_cerebrium, endpoint_name):
    print("\n")
    if use_cerebrium:
        ascii_art = text2art("Cerebrium", font="tarty1")
        print(f"{Fore.MAGENTA}{ascii_art}{Style.RESET_ALL}")
    else:
        ascii_art = text2art("OpenAI", font="tarty1")
        print(f"{Fore.WHITE}{ascii_art}{Style.RESET_ALL}")
    print(f"Connected to: {Fore.CYAN}{endpoint_name}{Style.RESET_ALL}")
    print("\nType 'quit' or 'exit' to end the chat\n")
```

This function provides visual feedback when we switch between endpoints.
Add the main function that handles the chat logic:

```python
def main():
    # OpenAI endpoint
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    model = "gpt-4o-mini"
    endpoint_name = "OpenAI (GPT-4o-mini)"
    use_cerebrium = False

    display_intro(use_cerebrium, endpoint_name)
    conversation = []

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("Goodbye!")
            break
        if not user_input:
            continue
        conversation.append({"role": "user", "content": user_input})
```

This function sets up the endpoint configuration and handles the basic chat loop.

Add the response handling logic inside the main function's while loop:

```python
        try:
            print("Bot: ", end="", flush=True)
            chat_completion = client.chat.completions.create(
                messages=conversation,
                model=model,
                stream=True,
                stream_options={"include_usage": True},
                temperature=0.7
            )
            bot_response = ""
            for chunk in chat_completion:
                # Guard against the final usage-only chunk, which has no choices
                # when stream_options={"include_usage": True} is set.
                if chunk.choices and chunk.choices[0].delta.content:
                    content = chunk.choices[0].delta.content
                    print(content, end="", flush=True)
                    bot_response += content
            print()
            conversation.append({"role": "assistant", "content": bot_response})
        except Exception as e:
            print(f"❌ Error: {e}")
            conversation.pop()
```

Finally, add the script execution guard at the end of the file:

```python
if __name__ == "__main__":
    main()
```

Test the chatbot by running:

```
python chat.py
```

You'll see the OpenAI ASCII art, and you can start chatting with GPT-4o mini. Ask a question to verify that the app works correctly. Responses will stream in real-time.

Deploying a Cerebrium Endpoint With vLLM and Llama 3.1

Now we'll create a Cerebrium endpoint that serves the same OpenAI-compatible interface using vLLM and an open-source model. When we're done, we'll be able to switch to a self-hosted open-source model endpoint by changing just two lines of code.

Configuring Hugging Face Access for Llama 3.1

First, make sure you have access to the Llama 3.1 model on Hugging Face. If you haven't already requested access, visit meta-llama/Meta-Llama-3.1-8B-Instruct and click "Request access".

Next, create a Hugging Face token by going to Hugging Face settings, clicking "New token", and selecting "Read" permissions.

Add your Hugging Face token to your Cerebrium project secrets. Go to your Cerebrium dashboard, select your project, and add HF_AUTH_TOKEN with your Hugging Face token as the value.

Setting Up a Cerebrium Account and API Access

Create a free Cerebrium account and navigate to your dashboard. In the "API Keys" section, copy your session token and save it for later – you'll need it to authenticate with the deployed endpoint.

Add the session token to the .env file as the CEREBRIUM_API_KEY variable:

```
OPENAI_API_KEY=your_openai_api_key_here
CEREBRIUM_API_KEY=your_cerebrium_api_key_here
CEREBRIUM_ENDPOINT_URL=your_cerebrium_endpoint_url_here
```

Building the OpenAI-Compatible vLLM Endpoint

Start by installing the Cerebrium CLI and creating a new project:

```
pip install cerebrium
cerebrium login
cerebrium init openai-compatible-endpoint
cd openai-compatible-endpoint
```

We'll build the main.py file step by step to understand each component. Start with the imports and authentication:

```python
from vllm import SamplingParams, AsyncLLMEngine
from vllm.engine.arg_utils import AsyncEngineArgs
from pydantic import BaseModel
from typing import Any, List, Optional, Union, Dict
import time
import json
import os

from huggingface_hub import login

login(token=os.environ.get("HF_AUTH_TOKEN"))
```

These imports provide the vLLM async engine for model inference, Pydantic models for data validation, and Hugging Face authentication for model access.
Add the vLLM engine configuration:

```python
engine_args = AsyncEngineArgs(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    gpu_memory_utilization=0.9,  # Set GPU memory utilization
    max_model_len=8192           # Set max model length
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```

This configuration uses 90% of available GPU memory and sets an 8K-token context window, optimizing for throughput while maintaining reasonable memory usage.

Now add the Pydantic models that define the OpenAI-compatible response format:

```python
class Message(BaseModel):
    role: str
    content: str

class ChoiceDelta(BaseModel):
    content: Optional[str] = None
    function_call: Optional[Any] = None
    refusal: Optional[Any] = None
    role: Optional[str] = None
    tool_calls: Optional[Any] = None

class Choice(BaseModel):
    delta: ChoiceDelta
    finish_reason: Optional[str] = None
    index: int
    logprobs: Optional[Any] = None

class Usage(BaseModel):
    completion_tokens: int = 0
    prompt_tokens: int = 0
    total_tokens: int = 0

class ChatCompletionResponse(BaseModel):
    id: str
    object: str
    created: int
    model: str
    choices: List[Choice]
    service_tier: Optional[str] = "default"
    system_fingerprint: Optional[str] = "fp_cerebrium_vllm"
    usage: Optional[Usage] = None
```

These models ensure
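The article is truncated above, but since the deployed endpoint mimics the OpenAI Chat Completions API, the promised switch in chat.py is presumably along the lines of the sketch below: point the OpenAI client at the Cerebrium endpoint URL and swap the model name (strictly speaking, the display variables change too). The exact endpoint path is an assumption; check the URL shown for your own Cerebrium deployment.

```python
# Hedged sketch of the switch inside main() in chat.py (the endpoint path is an
# assumption; use the URL of your own Cerebrium deployment). Only the client and
# model lines change; the chat loop and streaming logic stay exactly as written above.
client = OpenAI(
    api_key=os.getenv("CEREBRIUM_API_KEY"),
    base_url=os.getenv("CEREBRIUM_ENDPOINT_URL"),  # the deployment's OpenAI-compatible base URL
)
model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
endpoint_name = "Cerebrium (Llama 3.1 8B via vLLM)"
use_cerebrium = True
```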


Kapa.ai (YC S23) is hiring software engineers (EU remote)

Create enterprise-grade AI assistants from your content

Software Engineer (Full-stack)
$100K – $150K / 0.10% – 0.30%
Location: Remote (GB; EG; RU; UA; TR; FR; IT; ES; PL; RO; KZ; NL; BE; SE; CZ; GR; PT; HU; AT; CH; BG; DK; FI; NO; SK; LT; EE; DE)
Visa: US citizenship/visa not required

About the role

As a software engineer you will work across the stack on the Kapa systems that answer thousands of developer questions per day. Check out Docker's documentation for a live example of what kapa is.

In this role, you will:

- Work directly with the founding team and our research engineers.
- Scale the infrastructure that powers the Kapa RAG engine (Python).
- Experiment with new features in the Kapa analytics platform (React + Python).
- Work on the client integrations which are used to deploy Kapa for our customers (React + Python).
- Give Kapa access to new kinds of data (Python).
- Maintain our React SDK.

You may be a good fit if you have:

- A degree in computer science, machine learning, mathematics, statistics or a related field.
- 3+ years of software engineering experience working on complex systems in both backend and frontend.
- An affinity for machine learning, deep learning (including LLMs) and natural language processing.
- The ability to work effectively in a fast-paced environment where things are sometimes loosely defined.

* This is neither an exhaustive nor necessary set of attributes. Even if none of these apply to you, but you believe you will contribute to kapa.ai, please reach out.

About kapa.ai

kapa.ai makes it easy for technical companies to build AI support and onboarding bots for their users. Teams at 150+ leading startups and enterprises, including OpenAI, Mixpanel, Mapbox, Docker, Next.js and Prisma, use kapa to level up their developer experience and reduce support. We enable companies to use their existing technical knowledge sources, including docs, tutorials, chat logs, and GitHub issues, to generate AI bots that answer developer questions automatically. More than 750k developers have access to kapa.ai via website widgets, Slack/Discord bots, API integrations, or via Zendesk.

We've been fortunate to be funded by some of the greatest AI investors in Silicon Valley: Initialized Capital (Garry Tan, Alexis Ohanian), Y Combinator, Amjad Masad and Michele Catasta (Replit), and Douwe Kiela (RAG paper author and founder of Contextual AI), plus other folks including angels at OpenAI.

Founded: 2023
Batch: S23
Team Size: 14
Status: Active


Complete silence is always hallucinated as “ترجمة نانسي قنقر” in Arabic

Comment: VAD, probably. I've only tried the turbo one, but what I can say is that v3 is different from the earlier models. It looks like it doesn't have the audio descriptions to fall back on and produces hallucinations instead. The earlier models will also produce some miscellaneous crap when they encounter silence (they do this regardless of language), but there are more options for how to deal with that. For example, these things can be effective for the small model (but not for v3):

- the suppress_tokens trick
- setting the initial prompt to something like "."
- adjusting logprob_threshold to -0.4 (works for this empty audio, probably not good for general use)

Comment: Is there any good Arabic model you guys found which is better than large-v3? @misutoneko @puthre

Reply: Voxtral was released a few days ago and looks promising.

Comment: I found a similar thing happens in German, where it says "Untertitelung des ZDF für funk, 2017" (roughly, "ZDF subtitling for funk, 2017"). For both German and Arabic I found that this pretty much only happens at the very end of videos / when there is sustained silence.

Comment: Essentially this seems to be an artifact of the fact that Whisper was trained on (amongst other things) YouTube audio plus available subtitles. Often subtitlers add their copyright notice onto the end of the subtitles, and the ends of videos are often credits with music, applause, or silence. Thus Whisper learned that silence == "copyright notice". See some research for the Norwegian example here: https://medium.com/@lehandreassen/who-is-nicolai-winther-985409568201

Comment: In English there is always applause.

Comment: This also happens when you don't speak into the voice mode; the transcript usually results in the same Arabic phrase.

Comment: I've also seen this happen a lot in English with Skyeye. It also happens a lot with hallucinations saying stuff like "This is the end of the video, remember to like and subscribe."

Comment: I have built https://arabicworksheet.com for Arabic learning, from absolute beginners to professional speakers. It creates dynamic exercises and worksheets based on your level and topics. Behind the scenes I have used Gemini 2.5 Pro and GPT-4o for the overall agentic workflows.

Reply: Ok? This doesn't have anything to do with the topic of this discussion.

Comment: In German it's "Vielen Dank" (Thank you very much).
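For anyone who wants to try the mitigations from the first comment, here is a minimal sketch using the open-source openai-whisper package. The specific values (the "." initial prompt and the -0.4 logprob threshold) are the commenter's workarounds rather than universal defaults, the suppress_tokens trick is omitted because the thread doesn't spell it out, and trimming silence with a VAD beforehand is a separate, commonly recommended step.

```python
# Minimal sketch of the anti-hallucination knobs discussed in the thread,
# using the open-source openai-whisper package (pip install openai-whisper).
# The specific values come from the commenter's workarounds and may not suit all audio.
import whisper

model = whisper.load_model("small")  # the tricks reportedly help small/medium, not large-v3

result = model.transcribe(
    "audio.wav",
    initial_prompt=".",                # discourage the "subtitle credit" continuation
    logprob_threshold=-0.4,            # reject low-confidence segments sooner (library default is -1.0)
    no_speech_threshold=0.6,           # library default; raise it to drop near-silent segments more aggressively
    condition_on_previous_text=False,  # stop one hallucinated segment from seeding the next
)
print(result["text"])
```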


We have made the decision to not continue paying for BBB accreditation

July 16, 2025

We have made the conscious choice not to continue paying for accreditation from the Better Business Bureau (BBB). We realize that this may raise questions among our customers, and we want to explain why we made this decision.

For years, people have been told to look for BBB-accredited businesses, and that accreditation somehow reflects whether a business is on the up and up. What most don't realize is that businesses PAY to be accredited with the BBB. You do not EARN an accreditation: you buy it.

A few months ago, an extremely negative complaint and review suddenly appeared under our name registered with the BBB. It was from a person who was upset that a Sting concert was cancelled due to fire. Their complaint was with a music company that happened to have "Cherry Tree" in its name, but our business was tagged and made to look like it had poor business practices. We contacted the BBB many times to ask them to remove this complaint, which was obviously NOT about CherryTree Computers, from our business page. No one at the BBB was willing or able to assist with our request... because they really don't have the control or ability to do anything in the event of incorrect information.

This led us to wonder: what exactly DOES the BBB do? Why would we continue to pay for accreditation if it only means we get to have the BBB logo on our website, but they don't actually have the ability to prove or disprove how reputable a company is when it comes to business practices? We expressed to the BBB multiple times that if the situation wasn't rectified, we would stop paying for accreditation and let our customers know why.

After a lot of waiting and no action at all from the BBB, we officially ended our relationship and will no longer pay for BBB accreditation. We hope our services and happy customers reflect what type of business we are... and that we don't need any special logo or stickers to prove it.


AMD’s upcoming RDNA 5 flagship could target RTX 5080-level performance with better RT

Rumor mill: This year's Radeon 9000 series graphics cards delivered impressive performance gains from AMD in the mid-range and mainstream market segments. However, the company chose not to compete in the very high-end categories this generation. Although Team Red is unlikely to challenge Nvidia's flagship products in the near future, a new GPU expected to launch next year may outperform the RTX 5080.

AMD is expected to introduce a new enthusiast-class graphics card in the second half of 2026. Based on the company's upcoming UDNA architecture, also known as RDNA 5, its configuration will closely resemble that of the Radeon RX 7900 XTX. Prominent leaker KeplerL2, who has a solid track record, speculated about the GPU's specifications in a series of recent posts on the AnandTech forums.

While the RX 9070 XT, the fastest GPU in the RDNA 4 generation, can outperform Nvidia's GeForce RTX 5070 Ti in certain scenarios, AMD did not attempt to rival the RTX 5080, let alone the RTX 5090. However, the next lineup is expected to resemble RDNA 3, featuring a halo product that outperforms Nvidia's 5080. The GPU won't compete with the hypothetical RTX 6090 but could trade blows with a 6080.

Similar to the 7900 XTX, the upcoming high-end AMD GPU will likely include 96 compute units and a 384-bit memory bus. A mid-range version is expected to offer 64 compute units and a 256-bit memory bus, resembling the 9070 XT. A mainstream option might be similar to the 9060 XT, with 32 compute units and a 128-bit bus.

According to sources familiar with AMD's hardware roadmap, Kepler previously estimated that UDNA will improve raster performance by approximately 20 percent over RDNA 4 and double its ray tracing capabilities. RDNA 4 already represents a significant leap in ray tracing over its predecessor.

Our benchmarks show that the Radeon RX 9070 XT outperforms the 7900 XTX in ray tracing despite sitting an entire weight class below it in traditional rasterization. A UDNA-based GPU with the same configuration as the 7900 XTX could become a ray tracing powerhouse and may even address Radeon's lingering disadvantage against GeForce in path tracing.

Meanwhile, AMD's UDNA architecture is also expected to power the PlayStation 6 and the next Xbox console. A recently leaked die shot suggests that Microsoft's upcoming console includes 80 compute units, potentially outperforming the RTX 5080. With a projected price exceeding $1,000 (unlikely, but that's the rumor these days), the console appears to target the pre-built PC market rather than the traditional console market.


WhatsApp is dropping its native Windows app in favor of a web-based version

Editor's take: Meta is preparing to deliver a worse WhatsApp experience on Windows 11 by discontinuing investment in its native desktop app. While there's no official confirmation of this move yet, the latest WhatsApp beta makes the situation clear.

The latest WhatsApp beta introduces an unexpected change for Windows users. The update reportedly discontinues the native UWP app, replacing it with an empty shell built around the Chromium-based Edge browser framework found in recent Windows versions.

WhatsApp launched a native Windows version in 2016, later converting it to use the Universal Windows Platform API with the WinUI framework. This native approach gave the app a performance edge over the web-based version. Now, Meta is returning to WebView2, the Edge framework that wraps apps around the Windows native browser component.

The latest WhatsApp beta essentially behaves like the web.whatsapp.com service, which users access by pairing the mobile app with a desktop browser. By wrapping a bit of web code around the WebView2 component, WhatsApp will consume more RAM and deliver reduced performance compared to previous versions. Recent tests by Windows Latest show the new beta consuming around 30 percent more RAM than the existing native (UWP/WinUI) stable version.

Like the user-facing Edge browser, Chrome, and other Chromium-based browsers, WebView2 is a native Windows component built on the Chromium layout engine. Many simple Windows apps built around HTML, CSS, JavaScript, and other non-native web technologies rely on this component.

Meta's decision to turn back the clock with an inferior messaging experience for billions of PC users may come down to money. Windows Latest speculates that a tech giant pulling in $164.5 billion a year doesn't want to spend a fraction of its vast wealth maintaining two separate codebases for the same app. Forcing users into a single UI benefits the company, while end users endure a worse experience on PC. Even Meta's documentation says a native WhatsApp app offers better performance, higher reliability, and additional teamworking features – so either the developers neglected to update the docs or they simply don't care how users feel about the UI.

Another possible explanation for this potential WhatsApp fiasco is that Meta's developers are getting lazy with some desktop apps while focusing more on the phone versions, which is exactly what they did with Facebook Messenger. The company has also dragged its feet on other platforms, releasing a native iPad version just last month – a mere 15 years after Apple launched its tablet line. This patchy approach leaves PC users stuck with a downgraded experience, raising questions about Meta's commitment to its desktop audience.

