Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-booting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

Order Matters: Sequence to sequence for sets

2015
Oriol Vinyals, Samy Bengio +1

This paper explores how the order of inputs and outputs affects the performance of sequence-to-sequence (seq2seq) models, even when the data is unordered (e.g., sets). It introduces architectural extensions such as the Read-Process-Write model and proposes a training approach that searches over output permutations to improve learning. The paper shows that optimal ordering significantly impacts tasks like language modeling, parsing, and combinatorial problems. This work highlights the importance of considering input/output ordering in model design and has influenced further research in permutation-invariant architectures.

foundation · 30u30 · paper
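
The "Process" step is what makes the model's summary order-independent. Below is a minimal PyTorch sketch of that idea under simplifying assumptions (the ProcessBlock name, the plain dot-product attention, and the fixed step count are illustrative choices, not the paper's exact formulation): an input-less LSTM repeatedly attends over the set's element embeddings, so the resulting summary does not depend on how the elements were ordered.

import torch
import torch.nn as nn

class ProcessBlock(nn.Module):
    """Sketch of the 'Process' step of Read-Process-Write: an LSTM with no
    external input attends over the set's element embeddings for a fixed
    number of steps, producing an order-invariant summary (details simplified)."""
    def __init__(self, dim: int, steps: int = 5):
        super().__init__()
        self.steps = steps
        self.lstm = nn.LSTMCell(dim, dim)

    def forward(self, memories):
        # memories: (batch, set_size, dim) -- element order should not matter
        b, _, d = memories.shape
        h = memories.new_zeros(b, d)
        c = memories.new_zeros(b, d)
        for _ in range(self.steps):
            attn = torch.softmax((memories @ h.unsqueeze(-1)).squeeze(-1), dim=-1)  # (b, set_size)
            read = (attn.unsqueeze(-1) * memories).sum(dim=1)                        # attention readout
            h, c = self.lstm(read, (h, c))
        return h  # summary of the set

summary = ProcessBlock(dim=32)(torch.randn(2, 7, 32))  # (batch, set_size, dim) -> (batch, dim)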

Multi-Scale Context Aggregation by Dilated Convolutions

2015
Fisher Yu, Vladlen Koltun

This paper introduces a novel module for semantic segmentation using dilated convolutions, which enables exponential expansion of the receptive field without losing resolution. By aggregating multi-scale contextual information efficiently, the proposed context module significantly improves dense prediction accuracy when integrated into existing architectures. The work has had a lasting impact on dense prediction and semantic segmentation, laying the foundation for many modern segmentation models.

30u30 · paper · vision
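
A minimal PyTorch sketch of the dilated-convolution idea described above (the ContextBlock name and channel count are illustrative, not the paper's exact context module): stacking 3x3 convolutions with dilation rates 1, 2, 4 grows the receptive field exponentially, while padding keeps the spatial resolution unchanged.

import torch
import torch.nn as nn

class ContextBlock(nn.Module):
    """Multi-scale context aggregation sketch with dilated 3x3 convolutions
    (dilations 1, 2, 4). Padding equals the dilation, so resolution is
    preserved while the receptive field grows exponentially."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4), nn.ReLU(),
        )

    def forward(self, x):
        return self.layers(x)

feature_map = torch.randn(1, 64, 128, 128)   # e.g. features from a segmentation backbone
out = ContextBlock()(feature_map)
print(out.shape)  # torch.Size([1, 64, 128, 128]) -- same resolution, larger receptive field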

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

2015
Dario Amodei, Rishita Anubhai +32

This paper presents Deep Speech 2, an end-to-end deep learning system for automatic speech recognition that works across vastly different languages (English and Mandarin). It replaces traditional hand-engineered ASR pipelines with neural networks, achieving human-competitive transcription accuracy on standard datasets. The system uses high-performance computing techniques for a 7x speedup, enabling faster experimentation. Key innovations include Batch Normalization for RNNs, curriculum learning (SortaGrad), and GPU deployment optimization (Batch Dispatch). The approach demonstrates that end-to-end learning can handle diverse speech conditions including noise, accents, and different languages, representing a significant step toward universal speech recognition systems.

30u30 · paper · audio · ASR
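
As one concrete piece of the system, here is a small sketch of the SortaGrad curriculum under stated assumptions (the function name and the (features, transcript) pair format are mine, not the paper's code): the first epoch presents utterances in order of increasing length, and later epochs shuffle as usual.

import random

def sortagrad_batches(utterances, batch_size, epoch):
    """SortaGrad-style curriculum (sketch): epoch 0 sorts utterances by length
    so early updates see short, easier examples; later epochs shuffle as usual.
    `utterances` is assumed to be a list of (audio_features, transcript) pairs."""
    if epoch == 0:
        ordered = sorted(utterances, key=lambda u: len(u[0]))
    else:
        ordered = list(utterances)
        random.shuffle(ordered)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# batches = sortagrad_batches(dataset, batch_size=32, epoch=0)  -- epoch 0 is length-sorted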

Deep Residual Learning for Image Recognition

2015
Kaiming He, Xiangyu Zhang +2

The paper “Deep Residual Learning for Image Recognition” (ResNet, 2015) introduced residual networks with shortcut connections, allowing very deep neural networks (over 100 layers) to be effectively trained by reformulating the learning task into residual functions (F(x) = H(x) − x). This innovation solved the degradation problem in deep models, achieving state-of-the-art results on ImageNet (winning ILSVRC 2015) and COCO challenges. Its impact reshaped the design of deep learning architectures across vision and non-vision tasks, becoming a foundational backbone in modern AI systems.

foundation · 30u30 · paper · vision
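
A minimal PyTorch sketch of the residual reformulation described above (the ResidualBlock name and layer sizes are illustrative): the stacked layers learn the residual F(x), and an identity shortcut adds x back, so the block outputs F(x) + x.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block sketch: the stacked layers learn the residual
    F(x) = H(x) - x, and the identity shortcut adds x back, so the block
    outputs F(x) + x."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)                                       # F(x) + x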

Identity Mappings in Deep Residual Networks

2016
Kaiming He, Xiangyu Zhang +2

This paper shows that using identity mappings for skip connections together with pre-activation inside residual blocks lets signals flow unimpeded in both directions, making very deep networks easier to train. Through theoretical analysis and ablation studies, the authors introduce a pre-activation residual unit that enables successful training of 1000-layer ResNets and improves CIFAR-10/100 and ImageNet accuracy; this design, commonly known as ResNet-v2, has influenced numerous later deep vision models.

foundation · 30u30 · paper · vision
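
A companion sketch of the pre-activation unit, again with illustrative names and sizes: BatchNorm and ReLU move in front of each convolution and the shortcut stays a pure identity, so additions are the only operations on the skip path.

import torch.nn as nn

class PreActResidualUnit(nn.Module):
    """Pre-activation residual unit sketch: BatchNorm and ReLU come *before*
    each convolution, and the shortcut is a pure identity, so both forward
    and backward signals pass through unmodified additions."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.branch(x)  # identity shortcut + pre-activated residual branch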

Pointer Networks

2015
Oriol Vinyals, Meire Fortunato +1

This paper introduces the Pointer Network (Ptr-Net), a neural architecture that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in the input sequence. Standard sequence-to-sequence models and Neural Turing Machines cannot trivially handle this setting, because the number of target classes at each output step depends on the variable length of the input; sorting variable-sized sequences and many combinatorial optimization problems fall into this class. Instead of using attention to blend encoder hidden states into a context vector at each decoder step, the Ptr-Net uses attention as a pointer that selects a member of the input sequence as the output. Trained from examples alone, Ptr-Nets learn approximate solutions to three challenging geometric problems (finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem), improve over sequence-to-sequence models with input attention, and generalize to output dictionaries and sequence lengths beyond those seen during training.

foundation · 30u30 · paper
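
A minimal sketch of the attention-as-pointer step in PyTorch (the tensor shapes and parameter names W1, W2, v are my own framing of the paper's additive attention): the softmax over input positions is itself the output distribution, rather than being used to blend encoder states into a context vector.

import torch

def pointer_distribution(decoder_state, encoder_states, W1, W2, v):
    """Attention-as-pointer sketch: additive attention scores
    u_j = v^T tanh(W1 e_j + W2 d) are softmaxed over input positions and used
    directly as the output distribution, i.e. the decoder 'points' at one
    element of the input sequence.
    Shapes: decoder_state (B, H), encoder_states (B, T, H), W1/W2 (H, H), v (H,)."""
    proj_enc = encoder_states @ W1.T                      # (B, T, H)
    proj_dec = (decoder_state @ W2.T).unsqueeze(1)        # (B, 1, H)
    scores = torch.tanh(proj_enc + proj_dec) @ v          # (B, T)
    return torch.softmax(scores, dim=-1)                  # distribution over input positions

B, T, H = 2, 5, 16
dist = pointer_distribution(torch.randn(B, H), torch.randn(B, T, H),
                            torch.randn(H, H), torch.randn(H, H), torch.randn(H))
print(dist.shape, dist.sum(dim=-1))  # (2, 5); each row sums to 1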

Variational Lossy Autoencoder

2016
Xi Chen, Diederik P. Kingma +6

This paper proposes the Variational Lossy Autoencoder (VLAE), a VAE that uses autoregressive priors and decoders to deliberately discard local detail while retaining global structure. By limiting the receptive field of the PixelCNN decoder and employing autoregressive flows as the prior, the model forces the latent code to capture only high-level information, yielding controllable lossy representations. Experiments on MNIST, Omniglot, Caltech-101 Silhouettes and CIFAR-10 set new likelihood records for VAEs and demonstrate faithful global reconstructions with replaced textures. VLAE influenced research on representation bottlenecks, pixel-VAE hybrids, and state-of-the-art compression and generation benchmarks.

30u30 · paper · vision

Neural Message Passing for Quantum Chemistry

2017
Justin Gilmer, Samuel S. Schoenholz +3

This paper introduces Message Passing Neural Networks (MPNNs), a unifying framework for graph-based deep learning, and applies it to quantum-chemistry property prediction, achieving state-of-the-art accuracy on the QM9 benchmark and approaching chemical accuracy on most targets. Its impact includes popularising graph neural networks, influencing subsequent work in cheminformatics, materials discovery, and the broader machine-learning community by demonstrating how learned message passing can replace hand-engineered molecular descriptors.

foundation · 30u30 · paper · science · chemistry
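
A minimal sketch of one message-passing step under the MPNN framework (the MessagePassingStep name and the specific MLP message function are illustrative; the GRU-style node update follows the paper's general recipe): messages from neighbouring nodes are computed per edge, summed at each target node, and used to update the node states.

import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """One MPNN step (sketch): m_v = sum over neighbours w of M(h_v, h_w, e_vw),
    then h_v' = U(h_v, m_v). M is a small MLP and U a GRU cell, for illustration."""
    def __init__(self, node_dim: int, edge_dim: int):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, node_dim), nn.ReLU())
        self.update = nn.GRUCell(node_dim, node_dim)

    def forward(self, h, edge_index, edge_attr):
        # h: (num_nodes, node_dim); edge_index: (2, num_edges) with rows (source, target)
        src, dst = edge_index
        msg = self.message(torch.cat([h[dst], h[src], edge_attr], dim=-1))  # per-edge messages
        agg = torch.zeros_like(h).index_add_(0, dst, msg)                   # sum messages at targets
        return self.update(agg, h)                                          # GRU update of node states

h = torch.randn(4, 32)                              # 4 atoms with 32-dim states
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])   # bonds: 0->1, 1->2, 2->3
edge_attr = torch.randn(3, 8)
h = MessagePassingStep(32, 8)(h, edge_index, edge_attr)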

A simple neural network module for relational reasoning

2017
Adam Santoro, David Raposo +5

This paper introduces Relation Networks, a plug-and-play neural module that explicitly computes pairwise object relations. When appended to standard CNN/LSTM encoders, the module reaches superhuman 95.5% accuracy on CLEVR, solves 18 of 20 bAbI tasks, and infers hidden links in dynamic physical systems, inspiring later work on relational reasoning across vision, language, and RL.

foundation · 30u30 · paper
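
A minimal PyTorch sketch of the module's core computation, RN(O) = f(sum over object pairs of g(o_i, o_j)), with illustrative MLP sizes: g scores every ordered pair of objects, the pair outputs are summed, and f maps the aggregate to the prediction.

import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Relation Network sketch: RN(O) = f_phi( sum over all object pairs (i, j)
    of g_theta(o_i, o_j) ). g scores each pair, the pairwise outputs are summed,
    and f maps the aggregate to the final prediction."""
    def __init__(self, obj_dim: int, hidden: int = 128, out_dim: int = 10):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, objects):
        # objects: (batch, num_objects, obj_dim), e.g. CNN feature-map cells
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([o_i, o_j], dim=-1)            # all ordered object pairs
        return self.f(self.g(pairs).sum(dim=(1, 2)))     # sum over pairs, then f

objects = torch.randn(2, 25, 64)   # e.g. a 5x5 CNN feature map treated as 25 objects
print(RelationNetwork(64)(objects).shape)  # torch.Size([2, 10])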

Attention Is All You Need

2017
Ashish Vaswani, Noam Shazeer +6

The paper “Attention Is All You Need” (2017) introduced the Transformer — a novel neural architecture relying solely on self-attention, removing recurrence and convolutions. It revolutionized machine translation by dramatically improving training speed and translation quality (e.g., achieving 28.4 BLEU on English-German tasks), setting new state-of-the-art benchmarks. Its modular, parallelizable design opened the door to large-scale pretraining and fine-tuning, ultimately laying the foundation for modern large language models like BERT and GPT. This paper reshaped the landscape of NLP and deep learning, making attention-based models the dominant paradigm across many tasks.

NLP · LLM · AIGC · 30u30 · paper +1
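
A minimal sketch of the scaled dot-product attention at the heart of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V (the function below is a simplified single call, not the paper's full multi-head module):

import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Core Transformer operation (sketch): softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V: (batch, heads, seq_len, d_k)."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)          # pairwise similarity
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # optional causal/pad mask
    return torch.softmax(scores, dim=-1) @ V                   # weighted sum of values

Q = K = V = torch.randn(1, 8, 10, 64)   # 1 sequence, 8 heads, length 10, d_k = 64
print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([1, 8, 10, 64])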

Relational recurrent neural networks

2018
Adam Santoro, Ryan Faulkner +8

This paper introduces a Relational Memory Core that embeds multi-head dot-product attention into recurrent memory to enable explicit relational reasoning. Evaluated on synthetic distance-sorting, program execution, partially-observable reinforcement learning and large-scale language-modeling benchmarks, it consistently outperforms LSTM and memory-augmented baselines, setting state-of-the-art results on WikiText-103, Project Gutenberg and GigaWord. By letting memories interact rather than merely store information, the approach substantially boosts sequential relational reasoning and downstream task performance.

foundation · 30u30 · paper · NLP · LLM
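
A rough sketch of one relational-memory update, with the gating details simplified away (the class name, slot sizes, and residual/MLP arrangement are my own approximation of the idea, not the paper's exact core): the memory slots attend over themselves plus the new input, so memories interact rather than merely store information.

import torch
import torch.nn as nn

class RelationalMemoryStep(nn.Module):
    """Relational memory sketch: memory slots attend over themselves plus the
    new input via multi-head dot-product attention, letting memories interact.
    The paper's LSTM-style gating is omitted for brevity."""
    def __init__(self, slot_dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(slot_dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(slot_dim, slot_dim), nn.ReLU(),
                                 nn.Linear(slot_dim, slot_dim))

    def forward(self, memory, x):
        # memory: (batch, mem_slots, slot_dim); x: (batch, 1, slot_dim) new input
        keys_values = torch.cat([memory, x], dim=1)            # memory attends over memory + input
        attended, _ = self.attn(memory, keys_values, keys_values)
        memory = memory + attended                             # residual attention update
        return memory + self.mlp(memory)                       # position-wise MLP (gating omitted)

mem, x = torch.randn(2, 8, 64), torch.randn(2, 1, 64)
print(RelationalMemoryStep()(mem, x).shape)  # torch.Size([2, 8, 64])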

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

2018
Yanping Huang, Youlong Cheng +9

This paper introduces GPipe, a model-parallelism library designed to train large neural networks efficiently using pipeline parallelism. It partitions models across accelerators, processes micro-batches in parallel, and supports synchronous gradient updates. GPipe enables near-linear scaling with the number of devices while maintaining model quality and training stability. It achieves state-of-the-art performance in large-scale image classification (AmoebaNet) and multilingual machine translation (6B parameter Transformer), demonstrating flexibility across tasks. Its impact lies in making massive model training more practical and accessible across diverse architectures without relying on high-speed interconnects or custom model designs.

foundation · 30u30 · paper · engineering
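
A single-device sketch of the micro-batching half of the idea, under stated assumptions (the toy three-stage model and helper below are illustrative, and the cross-accelerator pipelining schedule itself is not shown): the mini-batch is split into micro-batches, gradients are accumulated across them, and one synchronous update is applied. In GPipe proper, each stage lives on its own accelerator and the micro-batches overlap across stages.

import torch
import torch.nn as nn

# Illustrative model partitioned into sequential "stages"; GPipe would place
# each stage on a different accelerator, here everything stays on one device.
stages = nn.Sequential(
    nn.Sequential(nn.Linear(256, 512), nn.ReLU()),   # stage 0
    nn.Sequential(nn.Linear(512, 512), nn.ReLU()),   # stage 1
    nn.Sequential(nn.Linear(512, 10)),               # stage 2
)
optimizer = torch.optim.SGD(stages.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, num_micro_batches: int = 4):
    """Split the mini-batch into micro-batches, accumulate gradients across
    them, and apply one synchronous update; the pipelined schedule that
    overlaps micro-batches across devices is omitted from this sketch."""
    optimizer.zero_grad()
    for xb, yb in zip(x.chunk(num_micro_batches), y.chunk(num_micro_batches)):
        loss = loss_fn(stages(xb), yb) / num_micro_batches
        loss.backward()
    optimizer.step()

train_step(torch.randn(64, 256), torch.randint(0, 10, (64,)))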