LMCache is an open-source, high-performance KV (key-value) cache layer that accelerates LLM serving and inference, especially in long-context scenarios. By storing reusable KV caches across GPU memory, CPU DRAM, and local disk, and by sharing them across serving instances, LMCache reduces time-to-first-token (TTFT) and saves GPU compute. It integrates tightly with vLLM, supports peer-to-peer (P2P) cache sharing, non-prefix caching, and multiple storage backends (CPU, disk, NIXL), and is distributed under the Apache-2.0 license.
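To make the vLLM integration concrete, here is a minimal sketch of wiring LMCache into vLLM as a KV-transfer connector, following the pattern in the LMCache quickstart. The connector name (`LMCacheConnectorV1`), the `LMCACHE_MAX_LOCAL_CPU_SIZE` environment variable, and the model name are assumptions that may differ across versions; check the current LMCache and vLLM documentation before relying on them.

```python
# Sketch: offload vLLM's KV cache to LMCache's CPU-DRAM tier.
# Connector/env-var names follow the LMCache quickstart and may change
# between releases; treat this as a configuration example, not a spec.
import os

# Cap the CPU-DRAM cache tier (in GiB); KV blocks evicted from GPU
# memory spill over into this tier instead of being recomputed.
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"

from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, swap as needed
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # route KV blocks through LMCache
        kv_role="kv_both",                  # this instance both stores and loads
    ),
)

# Repeated prompts sharing a long prefix now hit the LMCache tiers,
# lowering TTFT on the second and later requests.
outputs = llm.generate(["Summarize the following document: ..."])
```

Because the connector sits below vLLM's scheduler, application code is unchanged: the same `generate` calls transparently benefit from cache hits.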