Tag

Step 3.7 Flash

2026

stepfun-ai

Processes images and text to produce structured, reasoning-rich text outputs for high-throughput agentic workflows. Uses a sparse MoE design (198B total parameters, ~11B active per token) with a 256k context window and selectable reasoning levels, optimized for single-pass parsing, verification, and multi-step automation.

multimodal llm transformers vllm ai-inference (+4 more)

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

2026

Tianyi Zhou, Dongrui Liu +3

Automates distillation of heterogeneous traces from a target person or role into versioned, inspectable skill packages for LLM agents. Produces separate capability and bounded-behavior tracks that support natural-language corrections, rollback, and cross-host installation. Ships with an open system and a skills gallery.

agent-skills skillkit LLM nlp paper (+3 more)

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

2026

Junqi Liu, Salena Song +13

Workflow-aware benchmark for autonomous medical-AI research that splits agent execution into five stages (Plan, Setup, Validate, Inference, Submit) and evaluates long-horizon runs across segmentation, image enhancement, VQA, report generation, and lesion detection with stage-level scoring.

vision multimodal ai-agent agent-skills ai-workflow (+2 more)

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

2026

Pu Ning, Quan Chen +8

Guides LLM-based agents to decompose long-horizon research problems and delegate subtasks to constrained subagents, then fine-tunes models on harness-generated trajectories so that delegation decisions are internalized. Reports that SearchSwarm-30B-A3B achieves top BrowseComp scores for its scale.

ai-agent agent-skills llm paper ai-train (+1 more)

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

2026

Jiajie Jin, Yuyang Hu +16

Lets an AI agent propose, run, and evaluate multi-step research experiments using a persistent Hypothesis Tree that links hypotheses, artifacts, evidence, and distilled insights. Combines a long-lived coordinator with short-lived executors to carry lessons across time; evaluated on six ML tasks.

paper ai-agent agent-skills ai-workflow ai-train (+2 more)

AFTER

2026

Julia Belikova, Rauf Parchiev +5

Benchmark for evaluating procedural skill evolution in LLM agents. Isolates reusable skill bodies, role-specific work surfaces, and hidden oracle assets to measure whether skill refinements transfer across tasks, roles, and model backbones. Includes 382 workplace tasks, 22 skills, and a controlled evaluation protocol.

evaluation agent-skills huggingface llm ai-agent (+2 more)