Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-boosting

grok

groq

huggingface

image

ios

java

javascript

json

LLM

llm

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

nvidia

ocr

ollama

openai

opencode

pandas

paper

physics

pi

plugin

polars

postgres

privacy

prompt-engineering

pwa

python

pytorch

qwen

RL

robotics

rust

science

security

shodan

skillkit

sora

speech

sqlite

ssh

stt

swe

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

xAI

xai

DeepSeek-V4-Pro-DSpark

2026

DeepSeek-AI

Mixture-of-Experts LLM designed for million-token contexts, combining hybrid compressed attention, FP4/FP8 quantization-aware training for MoE experts, and multi-mode 'thinking' (Non-think/Think High/Think Max); includes a speculative-decoding extension for faster inference.

deepseek llm transformers huggingface ai-inference +2

Huihui-GLM-5.2-abliterated-GGUF

2026

huihui-ai, zai-org +1

An uncensored GGUF build of GLM-5.2 that applies weight “abliteration” to remove refusal filters and produce a locally runnable text-generation model; includes quantization conversions and shard-merge instructions, intended for experimental research rather than production use.

foundation-model llm transformers huggingface ai-inference +1

Qwopus-3.6-35B-A3B-Coder-MTP-GGUF

2026

Hugging Face, Alibaba Cloud (Qwen) +1

Jackrong

Thinking-off fine-tune for coding-agent workflows that prioritizes fast next-step decisions, lower token usage and stable multi-turn tool calling. Highlights: MoE 35B base, MTP speculative decoding, SWE-bench 62.4% (300 cases). Best for local agent loops and automated debug cycles; requires disciplined harnessing and schema consistency.

qwen llm ai-coding ai-agent multimodal +5

LongCat-2.0

2026

Meituan

A large-scale MoE language model for agentic coding and long-context tasks, natively supporting 1M-token context and dynamically activating tens of billions of parameters per token. Uses sparse attention and zero-computation experts to allocate compute per-token; model weights planned for release.

foundation-model llm ai-coding code agent-skills +2

ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving

2026

KAIST, Daejeon, Korea; Microsoft Research, Beijing, China +2

Sangjin Choi, Sukmin Cho +4

Predicts per-request MoE expert footprints from prefill activations and routes decode requests to workers that maximize expert-locality, lowering decode latency by combining offline K-means partitioning with online locality-band routing and a KV-block–coindexed signature cache.

vllm llm ai-serving ai-inference paper +2