2025 Year in Review: The Quiet Power of Steady Progress
My 2025 AI journey in four numbers: 6, 5, 1, 6. Not because it was glamorous, but because it was grounded. Building GenAI in a bank is like replacing the pl...
First-hand observations on AI Agents in financial institutions, GenAI in production, GraphRAG, Ontology architecture, DevOps × AI, and enterprise AI platform engineering.
OpenAI's GPT-5.6 Sol launched June 26, restricted to ~20 government-vetted partners only. Sol Ultra scores 91.9% on Terminal-Bench 2.1, but the governance framework matters more than the benchmark.
China's LineShine hit No. 1 on the TOP500 list at 2.198 exaflops, with no Nvidia, Intel, or AMD chips anywhere. But Linpack measures FP64 dense algebra, not AI training. Fermi math shows a comparable GPU cluster trains the same frontier model 5x faster at one-ninth the electricity cost.
GPT-5.6 Sol scored 91.9% on Terminal-Bench 2.1 and reached 750 tok/s on Cerebras, but independent evaluator METR flagged it for the highest eval-gaming rate ever recorded. Limited to ~20 government-approved organizations for now, with general rollout expected in weeks.
US AI models on OpenRouter fell from 70% to 30% token share in a year. DeepSeek alone holds 16.3%. ChatGPT global share dropped below 50% for the first time. OpenAI weighs deep price cuts heading into IPO. The compliance moat vs. the cost floor: which holds longer?
Token consumption per developer surged 18.6× in nine months, bug rates climbed 54%, code churn hit 861%. Now Uber, Microsoft, Meta, and Amazon are slamming the brakes, and the timing could not be worse for OpenAI and Anthropic heading into their IPOs.
Anthropic told the US Senate that Alibaba ran the largest known distillation attack on Claude: 28.8 million exchanges across 25,000 fake accounts over six weeks, targeting Claude's most commercially valuable capabilities. The cost may have been under $90K. The competitive value extracted was orders of magnitude higher.
Qualcomm is spending $3.9 billion in stock to acquire Modular, the AI startup behind the Mojo language and MAX Engine. The deal targets Nvidia's CUDA lock-in, but the actual battleground is inference, not training.
The White House ordered OpenAI to limit GPT-5.6 to about 20 government-approved companies, the first time the US has preemptively restricted a domestic AI model before launch. Sam Altman said this was not the preferred long-term model but agreed to comply.
OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom inference ASIC, designed to tape-out in nine months with AI-assisted development. Targeting deployment by end of 2026, this is not an NVIDIA replacement but a systematic bet on inference cost reduction.
Bloomberg reported June 24 that Jonas Adler and Alexander Pritzel are leaving Google for Anthropic, the fourth wave of departures in five weeks. Alphabet lost around $270B in market cap, but the deeper loss is the simultaneous hemorrhage across pretraining, AI coding, and scientific research, all core Google research lines.
OpenAI launched GPT-5.5-Cyber on June 22. Daybreak already found 24 Linux kernel exploits, 5 Chrome V8 vulnerabilities, and 10 Safari flaws. The CyberGym score of 85.6% is the headline. The ExploitGym score of 39.5% is why access is restricted to vetted defenders only.
Anthropic launched Claude Tag on June 23, embedding Claude Opus 4.8 as a persistent AI teammate inside Slack channels with channel-scoped memory and multiplayer context. The 65% code generation stat is Anthropic's own figure, without independent verification. Token costs, migration timeline, and enterprise data boundaries are the three issues engineers need to look at closely.
At I/O on May 19, Google promised general availability of Gemini 3.5 Pro the following month. It is June 24 and the model remains in limited enterprise preview only. Prediction markets put odds of a June 30 launch at 50-55%. Here is what 2M tokens and Deep Think mean in concrete cost and deployment terms.
SK Hynix filed its ADR registration with Korea FSS on June 24, targeting a July 10 Nasdaq debut and raising up to $29B for Yongin fab expansion. Not a fundraising move. A repricing play. KOSPI P/E of 8x versus Micron at 15x explains everything.
Tenet Security reveals Agentjacking: attackers inject malicious commands into Sentry error events, which AI coding agents like Claude Code, Cursor, and Codex execute with 85% success rate. 2,388 organizations have exposed DSNs. Sentry declined to fix the root cause.
Fable 5 free trial ends today. At $50 per million output tokens—twice Opus 4.8 pricing—the real enterprise blockers are mandatory 30-day data retention, domain-specific classifier trigger rates, and the Fable 5/Mythos 5 dual-track architecture.
Getty Images signed a multi-year display deal with OpenAI to bring licensed images into ChatGPT search. GETY surged 167% premarket as markets price in a broader AI licensing era.
Z.ai releases GLM-5.2, a 753B open-weight model scoring 62.1 on SWE-bench Pro, beating GPT-5.5 at $4.40 per million output tokens, roughly one-sixth the cost. MIT licensed, Anthropic-compatible API, and timed perfectly as Fable 5 remains offline.
The Reuters Institute 2026 Digital News Report finds 10% of global adults use AI chatbots for news weekly, but only 4% click through to original sources (vs. 19% from search). Google organic traffic to news sites has fallen 33% globally, with publishers expecting another 43% drop over three years.
Samsung Electronics is rolling out ChatGPT Enterprise and Codex to all employees in South Korea and its global DX division, reversing a company-wide AI ban imposed after a 2023 source code leak. Now among the largest enterprise AI contracts OpenAI has signed to date.
Anthropic's Project Fetch Phase 2 shows Claude Opus 4.7 autonomously wrote robodog control code 37x faster than the best unaided human team, with one-tenth the lines of code. The robodog still did not fetch the ball. The result is both a milestone and an honest map of where the limits are.
FERC unanimously issued show-cause orders to the six largest US grid operators, mandating faster grid connections for AI data centers. The regulatory clock can move in weeks. The transformer manufacturing queue runs 160 weeks and counting.
Google, Microsoft, and Hugging Face have jointly released the ARD (Agentic Resource Discovery) specification on June 17, 2026. AI agents can now discover tools dynamically at runtime using natural language queries — the same paradigm shift that DNS brought to web browsing, but for the agent ecosystem.
Qualcomm is reportedly in talks to acquire AI chip startup Tenstorrent for $8-10 billion, per Reuters. Led by legendary designer Jim Keller, Tenstorrent builds RISC-V based AI accelerators as a direct bet against Nvidia CUDA lock-in. The deal would transform Qualcomm from a mobile chip company into a serious AI data center contender.
John Jumper, co-creator of AlphaFold and 2024 Nobel Chemistry Prize winner, is leaving Google DeepMind after nine years to join Anthropic. The move follows Noam Shazeer's departure to OpenAI and signals where serious AI safety research may concentrate in the next decade.
OpenAI's S-1 IPO financials are public: Q1 revenue tripled to $5.7 billion, but non-GAAP operating margin hit -122%. ChatGPT weekly users stalled near 905 million, Anthropic is only $900 million behind, and the IPO target remains $1 trillion.
Day 7 of the Fable 5 ban: the White House demands the model be completely jailbreak-proof before it relaunches. Security experts are unanimous: that's technically impossible for any frontier LLM, and Dario Amodei has already refused both of the government's proposed fixes.
SpaceX issues a $20B bond to refinance xAI merger debt. Three agencies gave investment-grade ratings on the back of $75B in AI contracts despite a $4.28B Q1 loss.
Google Antigravity CLI officially replaces Gemini CLI today, cutting off free users immediately. An Apache 2.0 open-source tool absorbed into a closed platform, completing the fully proprietary AI coding tool market.
Noam Shazeer, co-author of the foundational transformer paper and Google Gemini co-lead, announced he is joining OpenAI. Google spent $2.7B to bring him back from Character.AI just two years ago. His departure is a significant blow to Gemini ahead of OpenAI's September IPO.
Jensen Huang opened VivaTech 2026 in Paris with a $20B pledge for European AI infrastructure and over 3,000 exaflops of Blackwell compute across eight countries, days after US export controls on Anthropic's Fable 5 exposed Europe's AI dependency.
Four days after its record IPO, SpaceX filed an SEC 8-K to acquire AI coding assistant Cursor for $60B in stock, the largest VC-backed startup acquisition ever. Cursor has $4B ARR, 1M+ paying users, and will access SpaceX's 500K-GPU Colossus cluster to challenge Claude Code and Codex.
A single US export order took Anthropic's Fable 5 offline worldwide. At the G7 summit, Canadian PM Mark Carney compared the fallout to 2008-style systemic risk and called for sovereign AI infrastructure. If one directive can cut off millions of users, who owns that risk?
A coalition of 42 US state attorneys general has subpoenaed OpenAI over ChatGPT's sycophancy, child safety failures, and health data handling, just three weeks after its confidential IPO filing. Can a trillion-dollar listing survive a multistate probe?
Just three days after launch, the US Commerce Department issued an export control directive forcing Anthropic to take Claude Fable 5 and Mythos 5 offline globally. The trigger: a Unicode homoglyph jailbreak demonstration that leaked a 120,000-character system prompt.
Goldman Sachs projects $7.6 trillion in cumulative AI infrastructure capex through 2031. Nvidia is set to capture 75% of the $5.1 trillion compute layer — but power availability, not capital, is the binding constraint.
At HDC 2026, Huawei debuted HarmonyOS 7 with Agent Framework 2.0, promoting Xiaoyi to a system-level AI agent with 2,100+ system capabilities and a 90%+ task completion rate, signaling the shift to intent-driven mobile computing.
GPTZero's forensic review of KPMG's agentic AI report found 40 of 45 citation titles were fabricated and 89% of citations flawed. UBS, NHS, and Transport for London denied the claims. KPMG pulled the report.
The US Commerce Department ordered Anthropic to suspend its two most capable models, Fable 5 and Mythos 5, citing a narrow jailbreak tied to cybersecurity capabilities. Anthropic complied. Then it pushed back.
For the first time in G7 history, the CEOs of OpenAI, Anthropic, and Google DeepMind will attend the same summit in Évian, France (June 15-17). Behind the gathering: US resistance to multilateral AI agreements, Europe's fight for AI sovereignty, and two AI companies approaching IPOs who need political credibility before listing.
OpenAI announced the acquisition of German startup Ona (formerly Gitpod), integrating persistent cloud sandbox technology into Codex so AI agents can work autonomously for hours or days. This is OpenAI's sixth acquisition in 2026, targeting Anthropic's lead in enterprise autonomous coding.
The Wall Street Journal reported June 11 that OpenAI is weighing significant API token price cuts. The trigger: Anthropic's Claude Code drove explosive growth and the company's first profitable quarter. As AI pricing enters a competitive phase, enterprise buyers are gaining leverage.
Jeff Bezos and Vik Bajaj's Prometheus emerged from stealth on June 11 with a $12 billion Series B at a $41 billion valuation. The goal: build an 'artificial general engineer,' AI that designs jet engines, drug molecules, and semiconductors by bringing LLM-style reasoning to the physical world.
AI is generating enormous real economic value that GDP, CPI, and labor statistics all fail to capture. When the cost of drafting a will collapses from $500 in lawyer fees to $0.50 in token costs, the statistical system reads this as 'declining services output.' If the Fed keeps relying on this broken ruler, monetary policy will navigate in the dark.
Anthropic just made its Mythos-class model publicly available for the first time. Claude Fable 5 completed a 50M-line Ruby migration in one day that would take a team two months, and ships with three safety classifiers that auto-fallback to Opus 4.8.
Google DeepMind open-sourced DiffusionGemma 26B-A4B on June 10, 2026, applying image diffusion techniques to text generation: 15–20 tokens per forward pass, 1000+ tokens/sec on H100, 4× faster than comparable autoregressive models. The tradeoff: lower output quality than standard Gemma 4.
Anthropic launched Claude Fable 5 on June 9, 2026 — its first publicly available Mythos-class model. Analytics benchmarks break 90% (+10pts over Opus 4.8), SWE-Bench hits 80.3%, pricing lands at $10/$50 per MTok, and a safety classifier routes high-risk requests to Opus 4.8.
Morgan Stanley forecasts global AI debt issuance to double to $570 billion in 2026, after reaching $236 billion through May, a fourfold year-over-year surge. The four hyperscalers alone plan $700 billion in capex this year. Tech giants are turning to bond markets at unprecedented scale.
SoftBank's attempt to raise $6 billion using its 13% OpenAI stake as collateral has stalled, Bloomberg reported today. Even an $852B paper valuation can't convince lenders to accept unlisted equity — exposing a structural crack in private AI financing.
For the first time, Anthropic's Claude holds more US enterprise AI spend than OpenAI — 34.4% vs 32.3%, per the May 2026 Ramp AI Index tracking 50,000+ companies. Claude Code is the main driver behind Anthropic's 4x year-over-year enterprise growth.
The EU AI Act enters full enforcement on August 2, bringing fines up to €35M or 7% of global revenue. The EU just stood up a 60-expert scientific panel to enforce it, and 78% of companies have not taken meaningful compliance steps.
At Tim Cook's final WWDC keynote, Apple announced a rebuilt Siri running on a custom 1.2-trillion-parameter Google Gemini model at roughly $1B/year. iOS 27 also lets users swap Siri for ChatGPT, Claude, or Gemini via a new Extensions system.
Beijing-based Moonshot AI is seeking a $30 billion valuation just six months after a $4.3B round. Kimi's ARR doubled in one month to $200M, and China's top four AI companies now target over $180B in combined valuation.
Great American AI Act: Congress' 269-page draft freezes state AI laws for 3 years, mandates audits for major AI labs, and sets $1M/day penalties. Opposition was immediate from unions, consumer groups, and Democratic colleagues.
OpenAI rolled out Lockdown Mode on June 6, letting users toggle off live web access, Agent Mode, and Deep Research to limit prompt injection exfiltration risks. Available to all accounts, including free tier.
Google's Gemini Enterprise hit demand it couldn't handle in-house, forcing a near-$1B-per-month deal with SpaceX's xAI-absorbed infrastructure. What this reveals about the true severity of AI compute scarcity.
SpaceX's $75B IPO oversubscribed within 24 hours of roadshow launch. Goldman Sachs projects AI compute will generate $322B of $474B in revenue by 2030. The market is pricing this as an AI infrastructure company, not a rocket maker.
Claude's task horizon doubles every four months. Anthropic engineers ship 8x more code than five years ago. The company racing toward a near-trillion dollar IPO is now calling for a global pause mechanism before things get out of hand.
Cambridge researchers have completed a first-in-human safety trial of a vaccine whose core component was entirely designed by AI, a 'super-antigen' built to protect against the entire coronavirus family, with flu and Ebola vaccines already in development.
The CEOs of OpenAI, Anthropic, Google DeepMind, and Microsoft AI co-signed an open letter to Congress demanding mandatory synthetic nucleic acid screening, citing AI's rapid erosion of knowledge barriers to bioweapons development.
DeepSeek, the Chinese AI startup that never needed outside money, is now raising $7.4 billion at a valuation up to $59 billion. Tencent and CATL lead the round. The reason: AI agents eat infrastructure, and a hedge fund can't foot that bill.
Anthropic filed a confidential S-1 with the SEC on June 1, targeting a fall 2026 IPO. Revenue run rate surged from $4B in July 2025 to over $50B today, driven by Claude Code. If the listing succeeds, it would be the largest pure-play AI company on public markets.
Anthropic Mythos Preview generated 181 working Firefox exploits vs. just 2 for Opus 4.6. Project Glasswing now covers 200+ orgs including NATO and ENISA, yet only 75 of 6,000+ critical vulnerabilities have been patched.
Microsoft Build 2026's biggest signal: seven in-house MAI models, Project Polaris replacing GPT-4 Turbo in GitHub Copilot by August, a $9.69B Pentagon contract, and open-source agent frameworks rolling out.
Long-term care centers in Taiwan still rely on paper forms, Excel sheets, and LINE groups to handle daily operations. KotoCare is an MVP that actually runs — case management, AI query, CSV reports, electronic whiteboard, all backed by a real database.
NVIDIA unveiled the N1X at Computex 2026, its first ARM laptop SoC with 6,144 CUDA cores and 1,000 TOPS AI performance. Dell, Lenovo, and Asus are first movers in what could reshape the $200B PC market.
GitHub Copilot transitions to token-based AI Credits billing on June 1. Code completions stay free, but chat, agentic workflows, and code review now drain credits. One credit equals $0.01.
SoftBank commits up to €75 billion to build 5 GW of AI data centers in France. Phase 1 targets 3.1 GW in Hauts-de-France by 2031, with Schneider Electric and EDF as key partners.
Anthropic launches Claude Opus 4.8 just 41 days after Opus 4.7, with agentic coding scores up to 69.2%, fast mode pricing cut by two-thirds, and a new dynamic workflows feature running hundreds of parallel sub-agents. Mythos-class models will follow in weeks.
Dell Q1 FY2027: AI-optimized server revenue hits $16.1B (+757% YoY), total revenue $43.8B (+88%), stock surges 33% in best single day since its 2018 return to public markets.
Three months after talks of a $30B round, Anthropic closed $65B at $965B valuation, surpassing OpenAI's $730B and nearing the $1T mark.
Google DeepMind CEO Demis Hassabis updated his AGI timeline at Google I/O 2026: 2029 at the earliest, more than five years ahead of his forecast from a year ago. He says we're standing at the foothills of the singularity.
CNN filed its first AI copyright lawsuit against Perplexity, alleging the search startup scraped 17,000+ stories without authorization. The case could reshape how AI companies license news content.
AI coding startup Cognition raised $1B at a $26B valuation, with ARR surging 13x to $492M in 12 months. Its product Devin now writes 90% of the company's own code, with clients including Goldman Sachs, NASA, and Mercedes-Benz.
Illinois passed SB 315 110-0, requiring OpenAI, Anthropic, and Google DeepMind to undergo annual third-party safety audits. Both AI labs actually endorsed the bill. Governor Pritzker plans to sign.
Mistral AI announced industrial AI partnerships with Airbus and BMW today, applying specialized models to crash simulation and aircraft design. European data sovereignty is now a buying criterion, not a nice-to-have.
Beijing is now controlling when top AI talent at private firms like Alibaba and DeepSeek can leave the country. The US restricted chips; China is restricting people.
Jensen Huang told a Computex audience that Nvidia's annual spending in Taiwan has surged from $10-15 billion five years ago to $100 billion, heading toward $150 billion. Taiwan's Taiex closed at a record high the same day.
Q1 revenue $56B, net income $26.8B—yet 8,000 jobs cut. Meta isn't bleeding; it's converting headcount savings into compute budget, the biggest AI infrastructure bet in tech history.
NVIDIA posted Q1 FY2027 revenue of $81.6B, up 85% year-over-year, with data center nearly doubling. Jensen Huang declared 'Agentic AI has arrived' and unveiled Vera Rubin timelines. The stock dipped despite records, exposing the expectations trap every hypergrowth company eventually hits.
Pope Leo XIV released his first AI-focused encyclical on May 25, presenting it alongside Anthropic co-founder Christopher Olah. The document warns of AI-driven dehumanization and calls human dignity the fundamental criterion for evaluating AI development.
DeepSeek has made its 75% V4-Pro API discount permanent, pricing output tokens at $0.87 per million, 34× cheaper than GPT-5.5. This isn't just a price cut; it's a direct attack on Western AI pricing power.
A malicious Nx Console VS Code extension stayed live for just 18 minutes, yet TeamPCP managed to steal 3,800 GitHub internal repos, compromise two OpenAI employee devices, and put Mistral's source code up for sale on dark web forums.
Anthropic releases Project Glasswing month-one results: Claude Mythos Preview found 10,000+ high/critical vulnerabilities across 1,000 open-source projects, with a 90.6% validation rate. The new bottleneck is patching, not discovery.
OpenAI commits $234M to Singapore for its first overseas Applied AI Lab. IMDA updates its agentic AI governance framework the same day.
OpenAI filed a confidential S-1 with the SEC on May 22, targeting a $1 trillion valuation with Goldman Sachs and Morgan Stanley underwriting. With $2B monthly revenue and 900M+ weekly users, this could be tech's largest-ever IPO.
Hours before the scheduled signing ceremony, Trump pulled an executive order that would have created a voluntary 90-day government AI model review. Musk and Zuckerberg opposed it overnight, and the White House backed down.
Anthropic's Q2 revenue is set to more than double to $10.9B, while a $1.25B/month SpaceX compute deal signals a massive infrastructure bet to power Claude's rapid growth.
OpenAI's general-purpose reasoning model disproved the Erdős unit distance conjecture — a problem open for 78 years — with no task-specific training. The proof was verified by Fields Medalist Tim Gowers and Princeton's Noga Alon.
Google I/O 2026's chart reveals one truth: AI usage isn't growing because people type more. 3.2 quadrillion tokens/month, 7x Y/Y growth — behind that number are automated pipelines running without pause. The question isn't whether you use AI, but whether AI is automatically working for you.
At Google I/O 2026, Google overhauled Search for the first time in 25 years and launched Gemini Spark, a 24/7 personal AI agent. With 900 million Gemini users, the AI race just shifted into a new gear.
OpenAI adopts C2PA and integrates Google's SynthID invisible watermark into ChatGPT images, plus a new public verification tool. Two AI rivals team up on deepfake detection, but does it actually work?
Anthropic acquired SDK automation startup Stainless for over $300M and will shut down its hosted services for all outside customers. OpenAI and Google both depended on it. This is a play for the connectivity layer of the agentic AI era.
Google I/O 2026 keynote unveils Gemini Intelligence embedded at the Android OS layer, a new Googlebook laptop category, and Samsung XR glasses. Google bets on distribution, not model rankings.
In banking, AI won't target judgment-heavy roles first. It's going after the relay chain—the people moving data from A to B. How much of your day is actually forwarding?
Cross-domain translation is becoming the defining competitive skill of the AI era — not because it sounds good, but because once AI takes over everything that requires only one language, what remains is all translation work.
Anthropic's unreleased Claude Mythos Preview has autonomously discovered thousands of zero-day vulnerabilities across major OSes and browsers. 12 tech giants joined as defenders, but the model was accessed without authorization on day one.
Google I/O 2026 opens tomorrow. Leaked Gemini Omni promises unified text/image/video generation — but can it catch Claude Mythos scoring 93.9% on SWE-bench?
Three months ago it was valued at $380B. Now it's closing a $30B round at $900B. Is Anthropic's latest fundraise rational market pricing, or the next AI bubble peak?
OpenAI co-founder Greg Brockman officially takes charge of product strategy, merging ChatGPT, Codex, and the developer API into one Agentic platform. A major reorganization timed four days before Google I/O.
Cerebras priced at $185, raised $5.55B, and surged 68% on debut to a $95B market cap. Its WSE-3 chip runs inference 15x faster than GPUs, with OpenAI and AWS already on board. A wave of AI IPOs is now on its way.
OpenAI partners with Plaid to let ChatGPT Pro users connect 12,000+ financial institutions. Spending analysis, portfolio tracking, and financial planning, but users are asking hard privacy questions.
Anthropic and the Gates Foundation commit $200 million over four years to deploy Claude in global health, education, and agriculture. Both sides have pledged to make results publicly verifiable.
The Trump-Xi summit unlocked H200 export licenses for Alibaba, Tencent, ByteDance and 7 others, but Beijing told them not to buy, and zero chips have shipped.
Court filings show OpenAI CEO Sam Altman holds over $2B in personal stakes across nine companies that did business with OpenAI, $1.7B in Helion alone. Closing arguments begin today in the landmark Musk lawsuit that could reshape AI's most powerful company.
Thinking Machines' TML-Interaction-Small hits 0.40s turn-taking latency — 3x faster than OpenAI — by scrapping the pipeline architecture entirely and letting the model learn interactivity at scale. Here's what that actually means.
In May 2026, Anthropic hosted Code with Claude 2026 across San Francisco, London, and Tokyo. The conference introduced no new foundation model, instead delivering a powerful combination of compute infrastructure, agent capabilities, developer tools, and cost optimization — signaling a decisive shift from benchmark competition toward real-world deployment.
This isn't a quiz about RAG or prompts — it asks: in real AI product scenarios, is your judgment ready? A 20-question skills check across 5 core dimensions to map your AI PM readiness.
Anthropic's Cat Wu describes a new PM rhythm in the AI era: roles merging, prototypes over docs, iteration in days not months. Reading it brought back memories of my own undefined role in an enterprise AI team—and Peter Deng's Avengers-style team philosophy.
TSMC's stock surged 137% from ~$164 in April 2025 to $387 in April 2026. This post breaks down how AI chip demand, CoWoS bottlenecks, and NVIDIA dethroning Apple as top customer drove the run.
The 2026 AI race is fundamentally about Harness engineering. This deep dive covers the 12 core modules of a production-grade Agent Harness, leading framework philosophies, and the 7 architectural decisions every AI architect must face.
GPT-5.5 launched April 23, topping 14 benchmarks and cutting token usage 40%. Behind the scenes, Jensen Huang and NVIDIA are betting up to $100B on the compute infrastructure that makes it run.
Ilya says compression is learning. Freedman finds only polynomial-growth monoids are compressible. If Persona can be projected onto a nilpotent substructure, PPV is not just a statistical fit — it's algebraically grounded personality compression.
On April 17, 2026, Anthropic launched Claude Design, a conversational AI visual design tool. Users simply describe what they need, and Claude generates interactive prototypes, slide decks, one-pagers, and more. Powered by Claude Opus 4.7, Anthropic's most capable vision model, the launch sent Figma's stock down 5% on the day.
Most AI Agents forget everything after each session. Hermes Agent is different — it remembers what you teach it and gets better over time. Here's what makes this open-source framework from NousResearch stand out.
Harness Engineering is the execution layer in AI Agent architecture. This post introduces the core design of a Harness: execution control, observability, hooks, tool sandboxing, and state management.
When AI researchers say LLMs are 'human-like,' which humans do they mean? A 2023 Harvard study used 262 cross-cultural survey variables and 94,278 respondents to show ChatGPT's cultural psychology aligns most closely with WEIRD Western democracies (r = -.70).
Can LLMs truly simulate 'you'? From Generative Agents to BehaviorChain, and the RAG-Free Psychometric Persona Vector (PPV) framework, this article compares three leading approaches to AI persona simulation.
Former Tesla AI Director Andrej Karpathy proposes replacing traditional RAG with an LLM-maintained personal Wiki. How does this three-layer architecture compound knowledge like interest? A complete breakdown.
Released April 2026 under Apache 2.0, Gemma 4 comes in four sizes — E2B, E4B, 26B MoE, and 31B Dense. The 31B ranks #3 among all open models globally with 256K context and native agentic workflows. A complete breakdown for AI developers.
In late March 2026, an accidental 59.8MB Source Map in Anthropic's npm release led to a full leak of Claude Code's underlying architecture. Beyond an engineering flaw, this is the first unboxing of enterprise-grade Agent frameworks, multi-layer prompting, and Undercover modes.
AI shopping agents are evolving from demos to real consumer tools. Walmart launched Sparky, Target partnered with Google Gemini, and Shopify released its Agentic Commerce protocol. When AI agents start swiping your card, e-commerce rules are being rewritten.
OpenClaw founder Peter Steinberger turned a weekend hack powered by Anthropic's Claude into a viral AI agent framework. After a trademark dispute forced a rebrand, OpenAI came knocking. In the Age of AI Exploration, even the smallest idea can reshape the world.
In his latest interview, Andrej Karpathy described experiencing 'AI Psychosis'—he hasn't written code himself in months. This article summarizes his core insights from the 'No Priors' podcast, including the concept of 'Claws' and the paradigm shift in software development.
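In March 2026, Google released an update to Stitch. This Gemini-powered, AI-native design canvas not only generates UI from natural language but also adds Voice Canvas voice editing. How will it completely upend Figma and the future workflows of designers?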
OpenClaw showed us that an assistant is an always-on computing layer, not just a chatbot. But its variants (like NanoBot, CoPaw, IronClaw) are even more fascinating. Spanning five distinct paths, they outline the true shape of next-generation AI assistants.
AI Agents sound cool, but building Agent products in enterprise is full of pitfalls. Here are five design traps I've experienced firsthand.
When your boss asks 'Is AI worth the investment?', you need numbers. Here's the four-metric framework I use to prove GenAI value.
Enterprise prompt engineering is nothing like personal ChatGPT use. Structured templates, version control, multi-role design — lessons from the trenches.
Building a RAG system in banking: how to choose your chunk strategy, embedding model, and retrieval pipeline. Lessons from real production experience.
Does an AI PM need to code? A complete skill tree breakdown comparing the skills of AI PMs, including Vibe Coding and specialized evaluation, with those of traditional PMs.
Deploying AI in a bank isn't just picking a model. Compliance, security, data governance, organizational culture — each hurdle is the necessary path from 1...
Worried your AI feature will be rendered obsolete by the next model update? Learn how to anticipate model evolution and file AI patents to build an uncopyab...
No coding? Think again. The daily routine of an AI PM involves shifting from a traditional PM to a holistic 'Builder', testing prompts, and battling risk.
Same era, same job title — one group is being laid off while another is being hired. What separates them isn't seniority or credentials; it's how fast they'...
Over the past year leading a team, I found that people with genuine curiosity thrive in AI-augmented work. Boris Cherny of Anthropic thinks this gap is exac...
We're building an AI Agent platform that actually ships to real users — want to go from PoC to production? We're looking for full-stack, backend, and GenAI...
I built an AI Browser that records every reasoning step, tracks queries, auto-decides when to screenshot, and compiles everything into a structured investig...
There was a time when saying 'AI' in serious academic circles was a mark against you. Geoffrey Hinton won the Turing Award in 2018 and the Nobel Prize in Ph...
After Google pushed Gemini 3 Pro and Antigravity, I started rethinking the relationship between developers and AI infrastructure — and what 'role elevation'...
Perplexity is caught between being the next-generation search paradigm and facing mounting legal pressure from content publishers. Can they find a deal stru...
At DevFest Taipei 2025 I shared a real production AI coaching platform — multi-agent collaboration, Persona World, Ontology + GraphRAG, delivering 24/7 pers...
On November 30th I'll be presenting real AI Agent team applications running in production at DevFest Taipei 2025, hosted by Google GDG.
GraphRAG replaces flat vector retrieval with graph-structured knowledge, enabling multi-hop reasoning and consistent context — 86% accuracy on RobustQA vs....
All six utility model patents filed at the start of this year have been approved — two dual-filed. Another five submitted last month. This is what real GenA...
How does a bank GenAI Product Manager design an LLM system that automatically builds a knowledge graph from business pain points, and successfully obtain a...
I'll be speaking at DevFest Taipei 2025 on November 30th — AI Agent team applications in production. Free entry, registration required.
The clearest explanation of 'Attention Is All You Need' I've come across — the mechanics, not just the intuition.
Jason Wei's talk gave me a genuine 'aha' moment. The systematic framework he lays out for finding AI use cases is exactly what I wish I'd articulated earlier.
This October at the iThome Hello World Developer Conference, I presented four intensive sessions covering MCP, GraphRAG, Vibe Coding, and Enterprise LLM Gua...
Does setting temperature to 0 give perfectly consistent AI outputs? No, and Thinking Machines Lab found out why. Batch processing is the culprit, and they b...
When a GenAI system queries sensitive data, how do you prevent malicious users from bypassing security? This article details how a bank AI Product Manager d...
The vibe coding landscape has consolidated around three players — OpenAI's Codex, Google's Gemini, and Anthropic's Claude. Each pulls in a different direction.
Tested Gemini 2.5 Flash's image editing with three sequential prompts — suit, smile, tie adjustment. The precision was genuinely impressive.
We're taking financial AI to Southeast Asia and looking for two engineers. DevOps and full-stack roles open in Taipei Xinyi.
Former OpenAI VP of Product Peter Deng details the essence of product, 1-to-100 growth strategies, the five PM archetypes, and the value of invisible AI.
ChatGPT's agent renders its operations in real-time with full verbosity — like watching a capable human assistant work at a workstation on your behalf.
Are LLM deployment costs skyrocketing? This article shares how a bank GenAI Product Manager used modular architecture design to customize AI systems on dema...
Traditional DBAs manage databases on experience, but under high concurrency and complex loads, that's not enough. This article shares how a GenAI Product Ow...
When introducing an AI knowledge base query system in a bank, how do you prevent PII leaks without sacrificing response quality? This article introduces a G...
Stargate is a round-the-clock server construction project. When Americans start running shifts like this, you know this is a race they don't intend to...
Relentless. Product after product, packed into 32 minutes. Google I/O felt less like a keynote and more like being underwater with no room to breathe.
Elon told Kobe: imagination matters more than knowledge. Frieren said magic is a world of imagination. To me, Transformers and generative AI are exactly tha...
What are the true pain points of Relationship Managers? How does GenAI help them generate real-time personalized investment advice in conversations? This ar...
Our financial AI team is looking for DevOps and data science professionals passionate about deploying generative AI applications in real production environm...
We challenged intern candidates to build a static website combining LLM and front-end skills in 60 minutes. The results changed how I think about what a hir...
This is the year of AI Agents. Join me at DevOpsDays on June 5–6 for a session on five agent behavior patterns and building the future DevOps ecosystem with AI.
OpenAI showcased four major innovations: Vision Fine-Tuning, Realtime API, Model Distillation, and Prompt Caching — handing more creative control to develop...