LogoAIAny

Tag

Explore by tags

  • All

  • 30u30

  • ASR

  • ChatGPT

  • GNN

  • IDE

  • RAG

  • ai-agent

  • ai-api

  • ai-api-management

  • ai-client

  • ai-coding

  • ai-demos

  • ai-development

  • ai-framework

  • ai-image

  • ai-image-demos

  • ai-inference

  • ai-leaderboard

  • ai-library

  • ai-rank

  • ai-serving

  • ai-tools

  • ai-train

  • ai-video

  • ai-workflow

  • AIGC

  • alibaba

  • amazon

  • anthropic

  • audio

  • blog

  • book

  • bytedance

  • chatbot

  • chemistry

  • claude

  • claude-code

  • course

  • deepmind

  • deepseek

  • engineering

  • foundation

  • foundation-model

  • gemini

  • github

  • google

  • gradient-booting

  • grok

  • huggingface

  • LLM

  • llm

  • math

  • mcp

  • mcp-client

  • mcp-server

  • meta-ai

  • microsoft

  • mlops

  • NLP

  • nvidia

  • ocr

  • ollama

  • openai

  • paper

  • physics

  • plugin

  • pytorch

  • RL

  • science

  • sora

  • translation

  • tutorial

  • vibe-coding

  • video

  • vision

  • xAI

  • xai

NeuTTS

2025
Neuphonic

NeuTTS is an open-source collection of on-device text-to-speech (TTS) models by Neuphonic, designed for real-time, high-quality voice synthesis and instant speaker cloning. It targets deployment on phones, laptops, and embedded devices using GGML/GGUF model formats, small LLM backbones, and a compact neural audio codec (NeuCodec). Generated audio is watermarked to support responsible use, and the models are available on Hugging Face.

audio, github, llm, ai-tools, ai-inference

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

2015
Dario Amodei, Rishita Anubhai +32

This paper presents Deep Speech 2, an end-to-end deep learning system for automatic speech recognition (ASR) that works across two very different languages, English and Mandarin. It replaces traditional hand-engineered ASR pipelines with neural networks and reaches transcription accuracy competitive with human transcribers on several standard benchmarks. High-performance computing (HPC) techniques give a 7x speedup over the authors' previous system, which makes experimentation faster. Key innovations include Batch Normalization for RNNs, a curriculum learning strategy (SortaGrad), and a GPU deployment optimization (Batch Dispatch). The approach shows that end-to-end learning can handle diverse speech conditions, including noise, accents, and different languages, and it is a significant step toward universal speech recognition systems.

30u30, paper, audio, ASR
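
The SortaGrad curriculum named in the Deep Speech 2 summary can be illustrated with a minimal sketch: in the first epoch, minibatches are visited from shortest to longest utterances, and later epochs visit minibatches in random order. The function name `sortagrad_batches`, the `(utterance_id, length)` input format, and the choice to keep length-bucketed batches after the first epoch are assumptions made for illustration, not the paper's code.

```python
import random


def sortagrad_batches(utterances, batch_size, epoch, seed=0):
    """Return the list of minibatches for one epoch.

    utterances: list of (utterance_id, length) pairs (hypothetical format).
    epoch: 0-based epoch index; epoch 0 uses the length-sorted curriculum.
    """
    # Sort by length so each minibatch groups utterances of similar length.
    ordered = sorted(utterances, key=lambda u: u[1])
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    if epoch > 0:
        # Assumption: after the first epoch, shuffle the order of the
        # minibatches while keeping their contents length-bucketed.
        random.Random(seed + epoch).shuffle(batches)
    return batches
```

Calling `sortagrad_batches(data, batch_size, epoch=0)` gives the easy-to-hard ordering; any later epoch returns the same minibatches in a shuffled order.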