howtonotcode.com

Skills

Term

Skills are the abilities needed to perform tasks effectively.

15 stories · First seen: 2026-02-12 · Last seen: 2026-03-03 · Website · Wikipedia

Resources

Links to check for updates: homepage, feed, or git repo.

Homepage

Stories


Endor Labs launches AURI: free security intelligence for AI coding agents

Endor Labs launched AURI, a free security intelligence layer for AI coding agents that scans code and dependencies for vulnerabilities, secrets, and malware, and helps remediate what it finds. [AURI by Endor Labs](https://www.endorlabs.com/learn/introducing-auri-security-intelligence-for-ai-coding-agents-and-developers) is now available to everyone, with its Skills plugin, MCP, and CLI offered free for developers. The tools let teams detect vulnerabilities and exposed secrets in first-party code and open source dependencies, block malware attacks, and fix security bugs. The pitch is to embed security into the architecture of agentic coding across editors, CI pipelines, and cloud environments, keeping pace with AI-written and AI-reviewed code. For backend and data teams trialing agents in the SDLC, [AURI](https://www.endorlabs.com/learn/introducing-auri-security-intelligence-for-ai-coding-agents-and-developers) offers a standard way to gate risky changes and automate remediation early in the pipeline.
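For teams wiring such a gate into CI, the shape is simple. Here is a minimal, hypothetical sketch in Python; the JSON report format and field names are invented for illustration and are not AURI's actual output:

```python
import json

# Hypothetical CI gate: block the pipeline when a scanner report
# (a made-up JSON shape for illustration, not AURI's real format)
# contains findings at or above a severity threshold.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def should_block(report_json, threshold="high"):
    """True if any finding meets or exceeds the threshold severity."""
    findings = json.loads(report_json).get("findings", [])
    limit = SEVERITY_RANK[threshold]
    return any(SEVERITY_RANK.get(f.get("severity", "low"), 0) >= limit
               for f in findings)

report = json.dumps({"findings": [
    {"id": "SECRET-001", "severity": "critical"},
    {"id": "DEP-042", "severity": "medium"},
]})
print(should_block(report))  # True: the critical finding trips the gate
```

A real integration would consume the scanner's actual report format and exit nonzero from the CI step when `should_block` returns `True`.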

2026-03-03
endor-labs auri auri-skills-plugin mcp auri-cli

AI IDEs go mainstream: vibe coding gains speed, but add guardrails

AI-first dev tools are pushing 'vibe coding' into production, but teams should add guardrails for model choice, verify Windows 11 25H2 compatibility, and stay ahead of IP risks. A detailed [Medium piece](https://medium.com/@designo038/ai-doesnt-need-your-figma-file-and-that-s-going-to-kill-your-job-96b9f834a162) argues tools like V0, Bolt, Lovable, Cursor, and Replit are already shipping full SaaS from prompts, citing aggressive adoption stats (e.g., 10M+ projects on Lovable, 90% of Fortune 100 using GitHub Copilot, 41% AI-written code in 2024) alongside real case studies. Operationally, Windsurf users can add repeatability with an [auto-model-switcher skill](https://lobehub.com/skills/karstenheld3-openai-backendtools-windsurf-auto-model-switcher) that screenshot-verifies the active model—useful for CI-style experiments and consistent comparisons across LLMs. Caveats are emerging: a [Stack Overflow thread](https://stackoverflow.com/questions/79899821/windsurf-and-antigravity-installers-freeze-on-extracting-files-after-upgrad) reports installer freezes for Windsurf/Antigravity after Windows 11 25H2, and an ABA newsletter flags IP pitfalls when blending AI-generated artifacts with human code in vibe coding workflows ([overview](https://www.americanbar.org/groups/intellectual_property_law/resources/newsletters/vibe-coding-intellectual-property/)).

2026-03-03
lovable windsurf github-copilot v0 bolt

AI coding stack converges (OpenSpec, ECC, Kiro) as CI-targeting npm worm raises guardrails stakes

AI coding tools are consolidating around config-as-code and multi-agent support (OpenSpec, ECC, AWS Kiro) while a new npm worm targeting CI and AI toolchains demands tighter supply-chain controls. OpenSpec’s latest release adds profile-based installs, auto-detection of existing AI tools, and first-class support for Pi and AWS Kiro, streamlining how teams standardize assistant skills across repos ([v1.2.0 notes](https://github.com/Fission-AI/OpenSpec/releases/tag/v1.2.0)). In parallel, Everything Claude Code’s “Codex Edition” unifies Claude Code, Cursor, OpenCode, and OpenAI Codex from a single config, ships 7 new repo-analysis skills, and bakes in AgentShield security tests, plus a GitHub app for org-wide rollout ([v1.6.0 notes](https://github.com/affaan-m/everything-claude-code/releases/tag/v1.6.0)). AWS is pushing Kiro’s agentic coding further to improve code quality ([DevOps.com](https://devops.com/aws-extends-agentic-ai-capabilities-of-kiro-developer-tool-to-improve-code-quality/)), with practitioners showing Kiro CLI working alongside Xcode MCP to ship an iOS app in hours—an example of assistant+IDE workflows entering the mainstream ([DEV post](https://dev.to/aws-heroes/i-promised-an-ios-app-kiro-cli-and-xcode-mcp-built-it-in-hours-519l)). Against this momentum, researchers warn of a new npm worm that can harvest secrets and weaponize CI while spreading via AI coding tools, reinforcing the need for deterministic builds, scoped tokens, and pre-commit/CI policy gates ([InfoWorld](https://www.infoworld.com/article/4136478/new-npm-worm-hits-ci-pipelines-and-ai-coding-tools.html)).
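One concrete policy gate against install-time worm propagation is to flag lockfile entries that declare install scripts. A minimal sketch, assuming the `hasInstallScript` field of npm's lockfile v2+ `packages` map:

```python
import json

# Pre-commit/CI policy-gate sketch over package-lock.json (npm
# lockfile v2+): flag dependencies that declare install scripts,
# a common vector for worm-style supply-chain attacks.
def packages_with_install_scripts(lockfile_text):
    lock = json.loads(lockfile_text)
    return sorted(
        path for path, meta in lock.get("packages", {}).items()
        if path and meta.get("hasInstallScript")  # skip the root "" entry
    )

lock = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "app"},
    "node_modules/left-pad": {"version": "1.3.0"},
    "node_modules/evil-pkg": {"version": "0.1.0", "hasInstallScript": True},
}})
print(packages_with_install_scripts(lock))  # ['node_modules/evil-pkg']
```

Pairing a check like this with `--ignore-scripts` installs and scoped tokens covers the attack surface the InfoWorld report describes.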

2026-02-24
openspec fission-ai everything-claude-code agentshield claude-code

OpenAI speeds up agent backends with Responses API WebSockets and gpt‑realtime‑1.5

OpenAI shipped a faster path for real-time, tool-calling agents by adding WebSockets to the Responses API and upgrading its voice model to gpt-realtime-1.5. OpenAI reports the new [gpt-realtime-1.5](https://the-decoder.com/openai-ships-api-upgrades-targeting-voice-reliability-and-agent-speed-for-developers/) improves number/letter transcription (~10%), logical audio tasks (~5%), and instruction following (~7%), while the Responses API now supports [WebSockets](https://the-decoder.com/openai-ships-api-upgrades-targeting-voice-reliability-and-agent-speed-for-developers/) so agents stream state and tool calls without resending full context, yielding a claimed 20–40% speedup on complex graphs. For productionization, OpenAI’s docs emphasize hardened patterns—capability encapsulation via [Skills](https://developers.openai.com/api/docs/guides/tools-skills/) and secure prompting/tooling per [Cybersecurity checks](https://developers.openai.com/api/docs/guides/safety-checks/cybersecurity)—while the cookbook on [long‑horizon Codex tasks](https://developers.openai.com/cookbook/examples/codex/long_horizon_tasks/) remains relevant for workflows that still need multi‑hour execution. Ecosystem notes: the Python SDK [v2.24.0](https://github.com/openai/openai-python/releases/tag/v2.24.0) adds a new API “phase” enum; community threads flag rough edges like fine‑tune inconsistencies between Chat vs. Responses with GPT‑4o, transient 401s on vector store creation, and disappearing service‑account keys (linkable via the OpenAI forum).
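The claimed speedup is easy to see with back-of-envelope arithmetic. This illustrative sketch (not OpenAI's API) compares cumulative tokens sent when each turn resends the full context versus streaming only the delta over a persistent connection:

```python
# Illustration of why a persistent streaming connection beats
# resending full context each turn; token counts are hypothetical.
def resend_total(turn_tokens):
    """Stateless HTTP style: each call resends all prior turns."""
    total = 0
    for i in range(len(turn_tokens)):
        total += sum(turn_tokens[: i + 1])  # full prefix every call
    return total

def stream_total(turn_tokens):
    """Persistent WebSocket style: each turn sends only its delta."""
    return sum(turn_tokens)

turns = [800, 300, 300, 300, 300]  # hypothetical tokens per tool-call turn
print(resend_total(turns), stream_total(turns))  # 7000 2000
```

With five turns the stateless pattern transmits 7,000 tokens against 2,000 streamed, and the gap widens quadratically as the conversation grows, which is where the claimed 20-40% wall-clock savings on complex agent graphs comes from.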

2026-02-24
openai gpt-realtime-15 responses-api realtime-api openai-python

Copilot CLI locks down MCP; Skills mature; watch VS Code and licensing gotchas

GitHub Copilot’s latest CLI releases tighten Model Context Protocol access and add workflow polish, while teams see editor and licensing edge cases worth planning for. Copilot CLI v0.0.416 adds enforcement to block third‑party MCP servers when policy disallows them and improves help, streaming counters, terminal status layout, and undo confirmations, while v0.0.415 brought agent model selection, a plan approval menu with curated actions, an env loader, a show_file tool, and quality fixes like UTF‑8 BOM handling and MCP UI polish ([0.0.416](https://github.com/github/copilot-cli/releases/tag/v0.0.416), [0.0.415](https://github.com/github/copilot-cli/releases/tag/v0.0.415), [all releases](https://github.com/github/copilot-cli/releases)). For security‑minded orgs, this pairs with growing scrutiny of what MCP unlocks inside enterprises, from querying internal systems to chaining multi‑step actions—governance and allowlists now matter in practice ([Scalekit’s analysis](https://www.scalekit.com/blog/github-copilot-mcp-enterprise-security-governance)). On the usability front, VS Code Insiders is iterating on a model picker with search, context‑window details, and contextual quick‑pick dialogs, while Copilot in VS Code is adding deeper C++/CMake awareness for richer assistance ([Insiders discussion](https://www.reddit.com/r/GithubCopilot/comments/1rct0g9/new_in_vs_code_insiders_model_picker_and/), [InfoWorld coverage](https://www.infoworld.com/article/4136164/microsoft-brings-c-plus-plus-smarts-to-github-copilot-in-visual-studio-code.html)). Teams should also track known rough edges like Copilot chat sessions not updating without reinstall and license entitlement desync between business and personal seats ([VS Code issue](https://github.com/microsoft/vscode/issues/297226), [GitHub community thread](https://github.com/orgs/community/discussions/187874)). 
For repeatable DevOps/SRE workflows, “Skills” provide on‑demand, reusable AI runbooks that load progressively and bundle scripts/templates, making it easier to standardize safe automation alongside MCP‑backed tools ([Skills walkthrough](https://dev.to/pwd9000/github-copilot-skills-reusable-ai-workflows-for-devops-and-sres-caf)).
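Allowlist enforcement of the kind v0.0.416 introduces can be reasoned about as a simple filter. In this hypothetical sketch the config names and shapes are illustrative, not Copilot's actual policy format:

```python
# Hypothetical org-policy check: permit only approved MCP servers.
ALLOWED_MCP_SERVERS = {"github", "internal-search"}

def enforce_allowlist(configured):
    """Drop any configured MCP server not on the org allowlist."""
    return {name: cfg for name, cfg in configured.items()
            if name in ALLOWED_MCP_SERVERS}

servers = {
    "github": {"url": "https://example.com/mcp/github"},
    "random-3p": {"url": "https://example.com/mcp/unvetted"},
}
print(sorted(enforce_allowlist(servers)))  # ['github']
```

The governance point is that the filter runs before any server is reachable by the agent, so a disallowed third-party MCP server is never invoked rather than merely audited after the fact.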

2026-02-24
github-copilot copilot-cli github visual-studio-code microsoft

E2E perception + scaled data push real-time physical AI (YOLO26, EgoScale, Uni-Flow, AR1)

End-to-end perception and scaled human/simulation datasets are converging to deliver real-time, reasoning-capable models for robots and autonomous systems. [Ultralytics YOLO26](https://blog.dailydoseofds.com/p/researchers-solved-a-decade-old-problem) removes the Non-Maximum Suppression post-processing step via a dual-head design, producing one-box-per-object predictions in a single pass for faster, simpler, and more portable deployments (AGPL for research, enterprise licensing for commercial use). [NVIDIA/UCB/UMD’s EgoScale](https://quantumzeitgeist.com/robots-learn-skills-20-854-hours-human-video/) shows that 20,854 hours of egocentric, action-labeled video predictably improve a Vision-Language-Action model’s real-world dexterity and enable one-shot task adaptation, establishing large-scale human data as reusable supervision for manipulation. For long-horizon, fine-detail dynamics, [Uni-Flow](https://quantumzeitgeist.com/model-captures-complex-flows-long-timescales/) separates temporal rollout from spatial refinement to achieve faster-than-real-time flow inference, while NVIDIA’s [AlpamayoR1](https://towardsdatascience.com/alpamayor1-large-causal-reasoning-models-for-autonomous-driving/) integrates a VLM reasoning backbone for autonomous driving with reported 99ms latency on a single Blackwell GPU, highlighting on-device, reasoning-first E2E stacks.
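For context on what YOLO26's dual-head design eliminates, here is a minimal sketch of the classic IoU-based non-maximum suppression step in plain Python:

```python
# Minimal non-maximum suppression (NMS), the post-processing step
# that a one-box-per-object detector makes unnecessary.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedily keep the highest-scoring box, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```

Because this greedy loop is sequential and threshold-sensitive, folding its effect into the network itself is what buys the simpler, more portable deployments the summary describes.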

2026-02-20
nvidia ultralytics ultralytics-yolo26 egoscale uni-flow

Outcome-centric AI testing and state-verified LLM outputs

Researchers and practitioners are converging on outcome-centric testing and verifiable state to make LLM systems more reliable and auditable in production. A new testing paradigm, reverse n-wise output testing, flips traditional input coverage to target coverage over behavioral outputs like calibration, fairness partitions, and distributional properties, promising stronger guarantees for AI/ML and even quantum systems; see the summary of this approach in [AI Testing Focuses On Outcomes, Not Inputs](https://quantumzeitgeist.com/ai-testing-focuses-outcomes-not-inputs/). In parallel, interpretability researchers urge rigorous causal-inference standards to avoid overstated claims and improve generalization of insights, outlined in [AI Insights Need Proof To Stay Reliable](https://quantumzeitgeist.com/ai-insights-need-proof-stay-reliable/). Complementing these, a community proposal on the OpenAI forum advocates a protocol layer for state-verified LLM outputs—think explicit, verifiable run state attached to responses—to improve traceability and trust; see [From Capability to Lucidity: Proposing a Protocol Layer for State-Verified LLM Output](https://community.openai.com/t/from-capability-to-lucidity-proposing-a-protocol-layer-for-state-verified-llm-output/1374578). Together, these ideas push AI in the SDLC toward testable behaviors, causal evidence, and auditable artifacts that backend and data teams can wire into CI/CD and governance.
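The summary does not give the paper's exact formulation, but the flavor of output-coverage testing can be sketched: instead of covering input combinations, require that observed predictions cover every cell of a behavioral partition, here a hypothetical (class, confidence-bin) grid:

```python
from itertools import product

# Illustrative output-coverage check (not the paper's exact method):
# coverage is measured over behaviors, not inputs -- the test fails
# if some (class, confidence bin) cell is never exercised.
def output_coverage(preds, classes, n_bins=2):
    """preds: list of (predicted_class, confidence). Returns missed cells."""
    cells = set(product(classes, range(n_bins)))
    hit = {(c, min(int(p * n_bins), n_bins - 1)) for c, p in preds}
    return sorted(cells - hit)

preds = [("cat", 0.95), ("dog", 0.30), ("dog", 0.85)]
missing = output_coverage(preds, classes=["cat", "dog"])
print(missing)  # [('cat', 0)]: no low-confidence 'cat' prediction observed
```

An empty return means the test suite has witnessed every output behavior in the partition, which is the kind of guarantee traditional input-coverage metrics cannot express.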

2026-02-20
openai sap mlops cicd ai-testing

Windsurf ships new models, Linux ARM64, and enterprise hooks

Windsurf rolled out new frontier coding models, full Linux ARM64 support, and enterprise-grade Cascade Hooks while community feedback spotlights its transparent crediting versus rivals' opaque limits. Windsurf’s latest updates add Gemini 3.1 Pro, Claude Sonnet 4.6, GLM-5, Minimax M2.5, and GPT-5.3-Codex-Spark with time-limited credit multipliers, plus quality-of-life fixes and features like automatic Plan→Code switching, skills loading from .agents/skills, tracked rules in post_cascade_response, and diff zones auto-closing on commit; importantly, it now provides full Linux ARM64 deb/rpm packages and enterprise cloud config for Cascade Hooks with Devin service key auth, as detailed in the [Windsurf changelog](https://windsurf.com/changelog). A power user’s comparison underscores cost control and predictability: they favored Windsurf’s clear credit model over Cursor/Claude Code’s rate-limit surprises, keeping GitHub Copilot Pro+ for predictable premium requests while continuing to code primarily in Windsurf, per this [Reddit write-up](https://www.reddit.com/r/windsurf/comments/1r9b58e/i_almost_left_windsurf/).

2026-02-20
windsurf gemini-31-pro claude-sonnet-46 glm-5 minimax-m25

DeepMind’s delegation framework meets practical Agent Skills for safer, cheaper coding agents

DeepMind outlined a principled framework for safely delegating work across AI agents while developers show that SKILL.md-based agent skills and tooling make coding agents more efficient and dependable. Google DeepMind’s [Intelligent AI Delegation](https://arxiv.org/abs/2602.11865) proposes an adaptive task-allocation framework—covering role boundaries, transfer of authority, accountability, and trust—for delegating work across AI agents and humans, with explicit mechanisms for recovery from failures. On the ground, a hands-on walkthrough of Agent Skills shows how a SKILL.md plus progressive disclosure architecture can reduce context bloat and improve code consistency in tools like Claude Code, with clear patterns for discovery, on-demand instruction loading, and resource access ([guide](https://levelup.gitconnected.com/why-do-my-ai-agents-perform-better-than-yours-eb6a93369366)). For observability and reproducibility, Simon Willison adds [Chartroom and datasette-showboat](https://simonwillison.net/2026/Feb/17/chartroom-and-datasette-showboat/#atom-everything), a CLI-driven approach for agents to emit runnable Markdown artifacts that demonstrate code and data outputs—useful for audits, PR reviews, and postmortems.
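The progressive-disclosure pattern is straightforward to sketch: parse only the frontmatter at discovery time and defer the instruction body until invocation. The frontmatter layout below follows the common SKILL.md convention but is illustrative, not a formal spec:

```python
# Sketch of progressive disclosure for SKILL.md-style agent skills:
# expose only name/description at discovery time, and load the full
# instruction body into context only when the skill is invoked.
def parse_skill(text):
    _, frontmatter, body = text.split("---", 2)
    meta = dict(line.split(":", 1) for line in frontmatter.strip().splitlines())
    meta = {k.strip(): v.strip() for k, v in meta.items()}
    return {"meta": meta, "body": body.strip()}

SKILL_MD = """---
name: release-notes
description: Draft release notes from merged PRs
---
Full multi-page instructions live here and are loaded on demand...
"""

skill = parse_skill(SKILL_MD)
print(skill["meta"]["name"])   # discovery: cheap metadata only
# ...later, only when the agent invokes the skill:
print(len(skill["body"]) > 0)  # on-demand: full body enters context
```

Keeping only the metadata resident is what reduces the context bloat the walkthrough describes: the agent can scan dozens of skills for pennies and pay the token cost of a body only when it actually uses one.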

2026-02-17
deepmind anthropic claude-code showboat agent-skills

Custom Copilot agents, IDE arenas, and terminal control planes

AI agent tooling for developers is maturing with customizable Copilot skills, IDE-based model comparisons, and terminal-first control planes, while new research warns multi-agent setups often hurt results. GitHub now documents how to tailor the Copilot CLI and coding agent with project-specific instructions, hooks, and skills, enabling targeted automation for repo chores, build/test flows, and shell tasks directly from your terminal or VS Code Insiders agent mode ([customize Copilot CLI](https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot), [create agent skills](https://docs.github.com/copilot/how-tos/use-copilot-agents/coding-agent/create-skills)). In parallel, IDE workflows are adding native model evaluation and task skills: Windsurf’s terminal and test-generation capabilities are backed by docs and guides, and its recent “Arena Mode” for side-by-side model comparisons surfaced in industry coverage ([terminal guide](https://docs.windsurf.ai/features/terminal), [AI command assistance](https://docs.windsurf.ai/cascade/terminal), [test generation](https://docs.windsurf.ai/features/test-generation), [InfoQ LLMs page](https://www.infoq.com/llms/news/)). Agent orchestration is shifting to the command line as well: Cline CLI 2.0 positions the terminal as an AI agent control plane for multi-file refactors and scripted operations ([DevOps.com](https://devops.com/cline-cli-2-0-turns-your-terminal-into-an-ai-agent-control-plane/)). But a new Google Research study summarized by InfoQ reports that scaling to multiple cooperating agents does not reliably improve outcomes and can reduce performance, so start with single-agent flows and measure before adding complexity ([InfoQ LLMs page](https://www.infoq.com/llms/news/)). 
Early experiments like xAI’s Grok Build with parallel agents and arena-style evaluation point to where this is heading, but details remain in flux ([TestingCatalog](https://www.testingcatalog.com/xai-tests-parralel-agents-and-arena-mode-for-grok-build/)).

2026-02-17
github-copilot github-copilot-cli visual-studio-code-insiders windsurf cascade

Anthropic’s Claude Code pushes into regulated enterprises as devs demand more agent transparency

Anthropic is expanding Claude Code from internal-heavy code generation to regulated enterprise use while shipping updates and fielding developer concerns about opaque agent behavior. Anthropic says its AI systems now generate nearly all of the company’s internal code, reframing engineers’ roles toward system design and review as described in this report from Moneycontrol ([source](https://www.moneycontrol.com/news/business/information-technology/why-anthropic-says-engineers-matter-more-than-ever-even-as-ai-writes-the-code-13830811.html)). Building on that, Anthropic announced a collaboration with Infosys to deliver agentic AI for telecom, financial services, and manufacturing via Infosys Topaz and the Claude Agent SDK, targeting persistent, multi-step workflows with governance needs ([announcement](https://www.anthropic.com/news/anthropic-infosys)). AWS also outlined how to run Claude Code in compliance-sensitive environments on Amazon Bedrock, aimed at aligning AI-assisted dev work with strict controls ([AWS blog](https://aws.amazon.com/blogs/machine-learning/supercharge-regulated-workloads-with-claude-code-and-amazon-bedrock/)). On the ground, developers called out visibility gaps around what agents do to their codebases in a widely discussed Hacker News thread ([discussion](https://news.ycombinator.com/item?id=47033622)), even as Anthropic continues frequent incremental fixes such as auth refresh repairs and improved error messaging in recent Claude Code releases ([release notes](https://github.com/anthropics/claude-code/releases)). Community demos show evolving workflows—like Plan Mode and multi-agent patterns in Opus 4.6—that promise more autonomous execution but heighten the need for auditability ([Plan Mode walkthrough](https://www.youtube.com/watch?v=fxj82iBWypA&pp=ygUSQ2xhdWRlIENvZGUgdXBkYXRl), [Agent Teams demo](https://www.youtube.com/watch?v=6UKUQNcRk2k&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)).

2026-02-17
anthropic claude claude-code claude-agent-sdk infosys

Early signals on OpenAI Codex: agent workflows, throughput tips, and hype to filter

OpenAI's Codex is surfacing in community posts as an agent-oriented coding tool for building and running code, with early demos and throughput tips alongside hype about a 'GPT-5.3 Codex'. Builders are sharing hands-on experiences, including a zero-code 2D game built with Codex agent skills and CLI, which hints at agentic patterns and composable skills for programming tasks ([demo thread](https://community.openai.com/t/show-2d-game-built-using-codex-and-agent-skills-zero-code/1374319)). For heavier usage, a discussion on throughput scaling covers considerations for parallelism and high-volume AI builder workloads ([throughput thread](https://community.openai.com/t/codex-throughput-scaling-for-heavy-ai-builder-workloads/1374316)), and another thread explores orchestrating subagents for subtasks to mitigate model fatigue ([subagent thread](https://community.openai.com/t/model-fatigue-how-to-ask-codex-to-run-a-subagent-for-a-subtask/1374247)). Sentiment is mixed: an OpenAI community post voices strong skepticism about LLMs and Codex reliability ([skeptic thread](https://community.openai.com/t/codex-and-llms-in-general-are-a-big-fat-lie/1374390)), while viral chatter on Reddit and X touts a "GPT-5.3 Codex" replacing developers—claims that are unverified and likely overstated ([Reddit](https://www.reddit.com/r/AISEOInsider/comments/1r6c0zq/gpt53_codex_ai_coding_model_just_replaced_half_of/), [X post](https://x.com/elmd_/status/2023473911728611425)).

2026-02-17
openai codex gpt-53-codex agents code-generation

OpenAI Skills + Shell for long‑running agents: patterns and pitfalls

OpenAI’s new Skills and Shell tooling make it easier to ship capability‑scoped, long‑running agents for real backend work, but early adopters report reliability gaps you should engineer around. OpenAI’s cookbook shows how to turn discrete capabilities into reusable Skills that your agent invokes via tool calls, enabling least‑privilege execution and clearer observability ([Skills in API](https://developers.openai.com/cookbook/examples/skills_in_api/)); paired with the “tool‑call render” pattern, this turns a chatty bot into a doer with predictable handoffs ([render pattern explainer](https://dev.to/programmingcentral/the-tool-call-render-pattern-turning-your-ai-from-a-chatty-bot-into-a-doer-4cb2)). For workloads that run minutes to hours, OpenAI’s guidance combines Shell, Skills, and compaction to manage state bloat, retry long steps, and keep transcripts affordable and debuggable ([Shell + Skills + Compaction tips](https://developers.openai.com/blog/skills-shell-tips/)). Plan for rough edges reported by developers: an embedding outage returned all‑zero vectors in text‑embedding‑3‑small, some Assistants API file uploads expired immediately, GPT‑5.2 extended‑thinking had very low tokens/sec for some, and Apps SDK toolInvocation status UI required a widget workaround ([embedding outage](https://community.openai.com/t/embedding-model-outage-text-embedding-3-small-api-ev3-model-name-with-all-0-values/1374079#post_10), [files expiring](https://community.openai.com/t/files-instantly-expiring-upon-upload/1366339#post_5), [slow generation](https://community.openai.com/t/gpt-5-2-extended-thinking-webchat-has-unworkably-slow-token-4-tps-generation/1373185?page=3#post_49), [toolInvocation UI bug](https://community.openai.com/t/bug-meta-openai-toolinvocation-invoking-and-meta-openai-toolinvocation-invoked-not-shown-unless-the-tool-registers-a-widget/1374087#post_1)).
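Compaction itself is a simple pattern. This illustrative sketch (a rough heuristic, not OpenAI's implementation) swaps older transcript messages for a summary stub once a token budget is exceeded:

```python
# Illustrative transcript compaction for long-running agents:
# when the running token estimate exceeds a budget, replace older
# messages with a single summary stub to keep transcripts affordable.
def estimate_tokens(msg):
    return max(1, len(msg) // 4)  # rough 4-chars-per-token heuristic

def compact(transcript, budget, keep_last=2):
    if sum(map(estimate_tokens, transcript)) <= budget:
        return transcript
    head, tail = transcript[:-keep_last], transcript[-keep_last:]
    summary = f"[summary of {len(head)} earlier messages]"
    return [summary] + tail

log = ["step %d: %s" % (i, "x" * 400) for i in range(10)]
compacted = compact(log, budget=500)
print(len(compacted))  # 3: one summary stub plus the last two messages
```

In production the stub would be an actual model-written summary rather than a placeholder, and the recent tail is kept verbatim so retried long steps still see their immediate context.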

2026-02-12
openai chatgpt assistants-api agents-sdk chatgpt-apps-sdk