Weekly AI News #002 - Google I/O 2026 and the agent-first era
The second weekly AI news roundup. This week was almost entirely Google I/O 2026. From Gemini 3.5 Flash to Antigravity 2.0, Managed Agents, Gemini Spark, and Omni, everything pointed toward AI that runs as an agent, while supply chain attacks and trust questions grew right alongside it.
Looking back at the week of Monday, May 18 through Sunday, May 24, 2026, it was almost entirely a Google I/O week. The direction was clear: AI is moving from something that answers questions to something that switches itself on and runs as an agent.
🔬 The bigger pattern this week
If last week was "AI moving into your work environment," this week went a step further. The announcements pouring out of Google I/O 2026 all pointed the same way: agent-first.
Models got faster and platforms were rebuilt around agents
Gemini 3.5 Flash, Antigravity 2.0, and the Managed Agents API all landed on the same day.
Agents are moving toward always-on
Gemini Spark is a personal assistant that keeps running in the cloud even after you close your laptop.
Trust and security concerns grew in parallel
The GitHub internal leak, a PHP ecosystem compromise, and the fight over SynthID watermarks. The stronger agents get, the more "what can you actually trust them with" matters.
Last week I wrote that the next edge comes from trust design. This week delivered both sides of that: more powerful agents and proportionally larger risks.
🔒 Supply chain attacks again: GitHub leak and a PHP compromise
After last week's npm incident, this week it was GitHub itself and the PHP ecosystem.
GitHub internal repository leak
It started when a GitHub employee installed a malicious VS Code extension impersonating NX Console, and about 3,800 internal repositories were accessed. Fortunately, the exposure was limited to GitHub's own internal code, with no impact reported on regular users or customer data. The technique: impersonate a well-known extension to lure installs, then exfiltrate data via DNS tunneling.
LaravelLang admin takeover
The admin rights to LaravelLang, a PHP localization library, were hijacked, affecting 700+ repositories. The targets were secrets such as AWS, GCP, and Azure access keys.
Last week it was npm package installs; this week, VS Code extensions and PHP packages. "Installing something" is increasingly the risk itself.
When one person runs several services, as I do at Park Labs, your dependencies and toolchain are the attack surface. The practical defense is tedious: pause auto-updates, keep only extensions from trusted publishers, and rotate the API keys and tokens that attackers target most.
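As a minimal sketch of the "trusted publishers only" rule, the Python below flags installed VS Code extensions whose publisher is not on an allowlist. It assumes the input comes from `code --list-extensions`, which prints one `publisher.extension` ID per line. `TRUSTED_PUBLISHERS` and the function names are hypothetical placeholders, not part of any tool.

```python
import subprocess

# Hypothetical allowlist: replace with the publishers you actually trust.
TRUSTED_PUBLISHERS = {"ms-python", "golang", "rust-lang"}


def untrusted_extensions(extension_ids: list[str], trusted: set[str]) -> list[str]:
    """Return extension IDs whose publisher prefix is not in `trusted`."""
    trusted_lower = {p.lower() for p in trusted}
    flagged = []
    for ext_id in extension_ids:
        ext_id = ext_id.strip()
        if not ext_id:
            continue  # skip blank lines from CLI output
        # IDs look like "publisher.extension"; compare publishers case-insensitively.
        publisher = ext_id.split(".", 1)[0].lower()
        if publisher not in trusted_lower:
            flagged.append(ext_id)
    return flagged


def installed_extensions() -> list[str]:
    """Read installed extension IDs via the VS Code CLI (requires `code` on PATH)."""
    result = subprocess.run(
        ["code", "--list-extensions"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Usage (not run here):
# for ext_id in untrusted_extensions(installed_extensions(), TRUSTED_PUBLISHERS):
#     print("review:", ext_id)
```

An exact-match allowlist catches lookalike publisher names, but it does nothing if a trusted publisher's account is itself compromised, which is why rotating keys still matters.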
🤖 Google I/O ① models and the dev platform
Gemini 3.5 Flash
Praised for multimodal work, especially video and chart analysis. It scored 84.2% on CharXiv Reasoning and 83.6% on MMMU-Pro, with some analyses putting it ahead of Claude Opus 4.7 and GPT-5.5 on those benchmarks. But thinking tokens are billed at the output rate, and 3.5 Flash burns roughly twice the average number of thinking tokens, so by Artificial Analysis's measure the actual cost of running its evaluation was $1,551, more than Gemini 3.1 Pro. That cast doubt on its "Flash-like value." Gemini 3.5 Pro, due in June, looks more interesting to me.
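To make the billing point concrete, here is a back-of-the-envelope Python sketch. Because thinking tokens are billed at the output rate, a model that thinks twice as much can cost nearly twice as much even at identical prices. Every price and token count below is a hypothetical placeholder, not a published Gemini rate.

```python
def run_cost(input_tokens: int, thinking_tokens: int, answer_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a run, billing thinking tokens at the output rate."""
    billed_output_tokens = thinking_tokens + answer_tokens
    return (input_tokens * input_price_per_m
            + billed_output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
    baseline = run_cost(10_000_000, 100_000_000, 5_000_000, 1.0, 4.0)
    heavy_thinker = run_cost(10_000_000, 200_000_000, 5_000_000, 1.0, 4.0)
    print(f"baseline: ${baseline:,.0f}, 2x thinking: ${heavy_thinker:,.0f}")
```

With these made-up numbers, doubling the thinking tokens takes the run from $430 to $830, because the thinking share dominates the bill.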
Antigravity 2.0
An agent-first development platform made up of a desktop app (parallel multi-agent execution and scheduled tasks), a CLI rewritten in Go, and an SDK for defining custom agents. The old Gemini CLI's Agent Skills, Hooks, and Subagents carry over as "Antigravity plugins." In the I/O live demo, 93 subagents running in parallel processed 2.6B tokens to build a working OS for under $1,000, then ran DOOM on it; when DOOM failed because of a missing driver, the agents generated the driver on the spot. The abrupt UI change drew heavy criticism, so a patch will add an "Open IDE" button in the top-right corner to switch back to the traditional IDE layout. Gemini CLI also shuts down on June 18, so migrating to the Antigravity CLI is mandatory.
Managed Agents API
The Gemini API gained a preview for spinning up autonomous agents in the cloud. A single API call launches an agent in an isolated Linux sandbox, where it autonomously reasons, calls tools, runs code, browses the web, and manages files. You define the agent in markdown files such as AGENTS.md and SKILL.md, and it deploys with no separate orchestration code. In short, it's serverless for agent runtimes.
✨ Google I/O ② always-on agents and a world model
Gemini Spark
A personal AI agent that lives in the cloud 24/7. It keeps running in the background even after you close your laptop or lock your phone. Deeply integrated with Workspace (Gmail, Docs, Slides), it can "tidy last week's email into a prioritized to-do list every Monday at 9am," "spot hidden subscriptions in your card statement," or "log receipts to a spreadsheet automatically." High-risk actions such as payments or sending email require your confirmation first. It's in beta for US AI Ultra subscribers. It's essentially Google's take on the personal assistant I've been discussing through OpenClaw and Hermes Agent, built on Google's own ecosystem.
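Spark's "confirm before high-risk actions" rule is a pattern worth copying in your own agents. Here is a minimal Python sketch, assuming a hypothetical set of action kinds (`payment`, `send_email`) and caller-supplied `execute` and `confirm` callbacks. It illustrates the gating idea only, not how Spark is implemented.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds that always need a human in the loop.
HIGH_RISK_KINDS = {"payment", "send_email"}


@dataclass
class Action:
    kind: str          # e.g. "read_email", "payment" (hypothetical labels)
    description: str   # human-readable summary shown when asking for confirmation


def run_action(action: Action,
               execute: Callable[[Action], str],
               confirm: Callable[[Action], bool]) -> str:
    """Run low-risk actions directly; run high-risk ones only if `confirm` approves."""
    if action.kind in HIGH_RISK_KINDS and not confirm(action):
        return f"skipped: {action.description}"
    return execute(action)
```

Keeping the high-risk set as explicit data makes the permission boundary easy to review and to tighten later.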
Gemini Omni
Not just a video generator but what Demis Hassabis called a world model: a system that understands the world and reasons about what happens next. It has an intuitive grasp of physics such as gravity, kinetic energy, and fluid dynamics. The first model, Omni Flash, generates 10-second clips with synchronized audio from a single prompt and lets you revise the style, camera angle, and background through chat. SynthID watermarks are inserted automatically. Hassabis's line that "to get to AGI you have to become anything-to-anything" stuck with me.
Search becomes agentic
AI Mode's base model moved up to Gemini 3.5 Flash, and the search box was redesigned for the first time in 25 years. You can build agents directly inside search, and an "information agent" tracks the web 24/7 and notifies you. Personal Intelligence expanded to ~200 countries and 98 languages with no subscription required. The era of "type a question, get a list of links" really is ending.
Native Android apps in AI Studio
No installs: from just a browser you can go from prompt to development, emulator preview, installation on a real device, and deployment to a Play internal test track. With native access to hardware such as GPS, NFC, and the camera, these are real native apps.
🛡️ Anthropic: a powerful model and its dilemma
Project Glasswing initial results
Results from the first month of the security project announced in April were published. Using the unreleased frontier model Claude Mythos Preview, partners found 10,000+ high-risk and critical vulnerabilities. Cloudflare found ~2,000 (400 of them high-risk or critical), Mozilla found and fixed 271 in Firefox 150, and a scan of 1,000+ open-source projects turned up an estimated 6,202 high-risk vulnerabilities. Partners include AWS, Apple, Google, Microsoft, NVIDIA, and JPMorganChase. Yet Anthropic itself says its safeguards against misuse aren't sufficient yet, so it isn't releasing the model publicly and limits access to trusted partners. As striking as 10,000 in a month is, what stands out more is that the "dilemma of a too-powerful model" has become real.
Karpathy joins Anthropic
Andrej Karpathy, who coined "vibe coding," joined Anthropic's pre-training team. He's an OpenAI co-founder who led Tesla Autopilot. He plans to build a new team that uses Claude to accelerate pre-training research.
🧮 OpenAI: an AI disproves a math conjecture on its own
An 80-year-old Erdős conjecture, disproved
An OpenAI internal reasoning model disproved the long-standing belief that "the square grid is optimal" for the planar unit-distance problem (posed by Paul Erdős in 1946). Using algebraic number theory, it found an infinite family of constructions that beats the square grid by a polynomial factor, and an outside mathematician (Princeton's Will Sawin) verified and refined the result. The key point is that a general reasoning model, not a math-specific one, came up with a new idea autonomously. It's being called the first case of "an AI actually doing math," and honestly I'm not sure whether to be thrilled or a little scared.
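For context on the baseline being beaten: Erdős's classical lower bound comes from a scaled square grid. You pick a squared distance m that has many representations as a sum of two squares, count the grid point pairs that are exactly that far apart, then shrink the grid by 1/√m so those pairs become unit distances. Chosen carefully, this gives n^(1 + c/log log n) unit distances among n points. The brute-force Python sketch below only counts such pairs for small grids. It says nothing about the new algebraic-number-theory construction, whose details aren't covered here.

```python
from itertools import combinations


def grid_pairs_at_distance(k: int, m: int) -> int:
    """Count unordered point pairs in a k x k integer grid with squared distance m.

    Scaling the grid by 1/sqrt(m) turns every such pair into a unit-distance pair.
    Brute force over all pairs, so only practical for small k.
    """
    points = [(x, y) for x in range(k) for y in range(k)]
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == m
    )


if __name__ == "__main__":
    # m = 1 counts nearest neighbours; m = 5 = 1^2 + 2^2 picks up knight-move pairs.
    for m in (1, 5):
        print(m, grid_pairs_at_distance(3, m))
```

On a 3×3 grid this counts 12 pairs for m = 1 and 8 for m = 5. The classical construction gains by choosing m with many more sum-of-two-squares representations as the grid grows.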
Codex updates
It's evolving from a "coding agent" into an "agent that drives your whole Mac." New features include Appshots (double-tap Command to send the current app's screen as context), Remote Computer Use, which works even while the Mac is locked, and the general release of Goal Mode, which works autonomously for hours to days once given a goal. Codex has been getting noticeably better lately, so I'm running it alongside Claude Code and comparing the two.
💡 What I took away
The strongest thought this week: agent-first is no longer one company's experiment, it's becoming the industry default.
Google rebuilt everything around agents, from the model (Gemini 3.5) to dev tools (Antigravity), the API (Managed Agents), a personal assistant (Spark), and search. OpenAI's Codex started driving the whole Mac, and Anthropic's model found 10,000+ vulnerabilities. The direction is the same: AI is moving past executing what people tell it, toward operating the environment itself and running autonomously for long stretches.
So the questions for a solo builder shift too.
Powerful agents are clearly leverage for solo operations. But as this week's security incidents show, the bigger the leverage, the bigger the cost of a small mistake. Park Labs will use agents aggressively, but keep the boundaries of permission, cost, and trust narrow.
🔗 Sources
This issue is based on the official announcements and reporting below.
🎯 Next
Last week I said that going forward I'd dig deep into only 2-3 stories instead of treating all ten equally. This week Google I/O was so big that it naturally became two deep sections on I/O plus shorter notes on the rest.
Next week I want to slot in a hands-on piece comparing the Antigravity CLI with Claude Code and Codex. The goal of this series isn't just to collect news, but to record what I actually turn on and off in Park Labs operations.