Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-booting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

Order Matters: Sequence to sequence for sets

2015
Oriol Vinyals, Samy Bengio +1

This paper explores how the order of inputs and outputs affects the performance of sequence-to-sequence (seq2seq) models, even when the data is unordered (e.g., sets). It introduces architectural extensions such as the Read-Process-Write model and proposes a training approach that searches over output permutations to improve learning. The paper shows that optimal ordering significantly impacts tasks like language modeling, parsing, and combinatorial problems. This work highlights the importance of considering input/output ordering in model design and has influenced further research in permutation-invariant architectures.

foundation · 30u30 · paper
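
The "Process" step is what makes the model's summary order-independent. Below is a minimal PyTorch sketch of that idea under simplifying assumptions (the ProcessBlock name, the plain dot-product attention, and the fixed step count are illustrative choices, not the paper's exact formulation): an input-less LSTM repeatedly attends over the set's element embeddings, so the resulting summary does not depend on how the elements were ordered.

import torch
import torch.nn as nn

class ProcessBlock(nn.Module):
    """Sketch of the 'Process' step of Read-Process-Write: an LSTM with no
    external input attends over the set's element embeddings for a fixed
    number of steps, producing an order-invariant summary (details simplified)."""
    def __init__(self, dim: int, steps: int = 5):
        super().__init__()
        self.steps = steps
        self.lstm = nn.LSTMCell(dim, dim)

    def forward(self, memories):
        # memories: (batch, set_size, dim) -- element order should not matter
        b, _, d = memories.shape
        h = memories.new_zeros(b, d)
        c = memories.new_zeros(b, d)
        for _ in range(self.steps):
            attn = torch.softmax((memories @ h.unsqueeze(-1)).squeeze(-1), dim=-1)  # (b, set_size)
            read = (attn.unsqueeze(-1) * memories).sum(dim=1)                        # attention readout
            h, c = self.lstm(read, (h, c))
        return h  # summary of the set

summary = ProcessBlock(dim=32)(torch.randn(2, 7, 32))  # (batch, set_size, dim) -> (batch, dim)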

Multi-Scale Context Aggregation by Dilated Convolutions

2015
Fisher Yu, Vladlen Koltun

This paper introduces a novel module for semantic segmentation using dilated convolutions, which enables exponential expansion of the receptive field without losing resolution. By aggregating multi-scale contextual information efficiently, the proposed context module significantly improves dense prediction accuracy when integrated into existing architectures. The work has had a lasting impact on dense prediction and semantic segmentation, laying the foundation for many modern segmentation models.

30u30 · paper · vision
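
A minimal PyTorch sketch of the dilated-convolution idea described above (the ContextBlock name and channel count are illustrative, not the paper's exact context module): stacking 3x3 convolutions with dilation rates 1, 2, 4 grows the receptive field exponentially, while padding keeps the spatial resolution unchanged.

import torch
import torch.nn as nn

class ContextBlock(nn.Module):
    """Multi-scale context aggregation sketch with dilated 3x3 convolutions
    (dilations 1, 2, 4). Padding equals the dilation, so resolution is
    preserved while the receptive field grows exponentially."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4), nn.ReLU(),
        )

    def forward(self, x):
        return self.layers(x)

feature_map = torch.randn(1, 64, 128, 128)   # e.g. features from a segmentation backbone
out = ContextBlock()(feature_map)
print(out.shape)  # torch.Size([1, 64, 128, 128]) -- same resolution, larger receptive field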

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

2015
Dario Amodei, Rishita Anubhai +32

This paper presents Deep Speech 2, an end-to-end deep learning system for automatic speech recognition that works across vastly different languages (English and Mandarin). It replaces traditional hand-engineered ASR pipelines with neural networks, achieving human-competitive transcription accuracy on standard datasets. The system uses high-performance computing techniques for a 7x speedup, enabling faster experimentation. Key innovations include Batch Normalization for RNNs, curriculum learning (SortaGrad), and GPU deployment optimization (Batch Dispatch). The approach demonstrates that end-to-end learning can handle diverse speech conditions including noise, accents, and different languages, representing a significant step toward universal speech recognition systems.

30u30 · paper · audio · ASR
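
As one concrete piece of the system, here is a small sketch of the SortaGrad curriculum under stated assumptions (the function name and the (features, transcript) pair format are mine, not the paper's code): the first epoch presents utterances in order of increasing length, and later epochs shuffle as usual.

import random

def sortagrad_batches(utterances, batch_size, epoch):
    """SortaGrad-style curriculum (sketch): epoch 0 sorts utterances by length
    so early updates see short, easier examples; later epochs shuffle as usual.
    `utterances` is assumed to be a list of (audio_features, transcript) pairs."""
    if epoch == 0:
        ordered = sorted(utterances, key=lambda u: len(u[0]))
    else:
        ordered = list(utterances)
        random.shuffle(ordered)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# batches = sortagrad_batches(dataset, batch_size=32, epoch=0)  -- epoch 0 is length-sorted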

Deep Residual Learning for Image Recognition

2015
Kaiming He, Xiangyu Zhang +2

The paper “Deep Residual Learning for Image Recognition” (ResNet, 2015) introduced residual networks with shortcut connections, allowing very deep neural networks (over 100 layers) to be effectively trained by reformulating the learning task into residual functions (F(x) = H(x) − x). This innovation solved the degradation problem in deep models, achieving state-of-the-art results on ImageNet (winning ILSVRC 2015) and COCO challenges. Its impact reshaped the design of deep learning architectures across vision and non-vision tasks, becoming a foundational backbone in modern AI systems.

foundation · 30u30 · paper · vision
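
A minimal PyTorch sketch of the residual reformulation described above (the ResidualBlock name and layer sizes are illustrative): the stacked layers learn the residual F(x), and an identity shortcut adds x back, so the block outputs F(x) + x.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block sketch: the stacked layers learn the residual
    F(x) = H(x) - x, and the identity shortcut adds x back, so the block
    outputs F(x) + x."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)                                       # F(x) + x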

Identity Mappings in Deep Residual Networks

2016
Kaiming He, Xiangyu Zhang +2

This paper shows that using identity mappings for skip connections together with pre-activation inside residual blocks lets signals flow unimpeded in both directions, making very deep networks easier to train. Through theoretical analysis and ablation studies, the authors introduce a pre-activation residual unit that enables successful training of 1000-layer ResNets and improves CIFAR-10/100 and ImageNet accuracy; this design, commonly known as ResNet-v2, has influenced numerous later deep vision models.

foundation · 30u30 · paper · vision
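
A companion sketch of the pre-activation unit, again with illustrative names and sizes: BatchNorm and ReLU move in front of each convolution and the shortcut stays a pure identity, so additions are the only operations on the skip path.

import torch.nn as nn

class PreActResidualUnit(nn.Module):
    """Pre-activation residual unit sketch: BatchNorm and ReLU come *before*
    each convolution, and the shortcut is a pure identity, so both forward
    and backward signals pass through unmodified additions."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.branch(x)  # identity shortcut + pre-activated residual branch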

Pointer Networks

2015
Oriol Vinyals, Meire Fortunato +1

This paper introduces the Pointer Network (Ptr-Net), a neural architecture that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in the input sequence. Standard sequence-to-sequence models and Neural Turing Machines cannot trivially handle this setting, because the number of target classes at each output step depends on the variable length of the input; sorting variable-sized sequences and many combinatorial optimization problems fall into this class. Instead of using attention to blend encoder hidden states into a context vector at each decoder step, the Ptr-Net uses attention as a pointer that selects a member of the input sequence as the output. Trained from examples alone, Ptr-Nets learn approximate solutions to three challenging geometric problems (finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem), improve over sequence-to-sequence models with input attention, and generalize to output dictionaries and sequence lengths beyond those seen during training.

foundation · 30u30 · paper
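
A minimal sketch of the attention-as-pointer step in PyTorch (the tensor shapes and parameter names W1, W2, v are my own framing of the paper's additive attention): the softmax over input positions is itself the output distribution, rather than being used to blend encoder states into a context vector.

import torch

def pointer_distribution(decoder_state, encoder_states, W1, W2, v):
    """Attention-as-pointer sketch: additive attention scores
    u_j = v^T tanh(W1 e_j + W2 d) are softmaxed over input positions and used
    directly as the output distribution, i.e. the decoder 'points' at one
    element of the input sequence.
    Shapes: decoder_state (B, H), encoder_states (B, T, H), W1/W2 (H, H), v (H,)."""
    proj_enc = encoder_states @ W1.T                      # (B, T, H)
    proj_dec = (decoder_state @ W2.T).unsqueeze(1)        # (B, 1, H)
    scores = torch.tanh(proj_enc + proj_dec) @ v          # (B, T)
    return torch.softmax(scores, dim=-1)                  # distribution over input positions

B, T, H = 2, 5, 16
dist = pointer_distribution(torch.randn(B, H), torch.randn(B, T, H),
                            torch.randn(H, H), torch.randn(H, H), torch.randn(H))
print(dist.shape, dist.sum(dim=-1))  # (2, 5); each row sums to 1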

Variational Lossy Autoencoder

2016
Xi Chen, Diederik P. Kingma +6

This paper proposes the Variational Lossy Autoencoder (VLAE), a VAE that uses autoregressive priors and decoders to deliberately discard local detail while retaining global structure. By limiting the receptive field of the PixelCNN decoder and employing autoregressive flows as the prior, the model forces the latent code to capture only high-level information, yielding controllable lossy representations. Experiments on MNIST, Omniglot, Caltech-101 Silhouettes and CIFAR-10 set new likelihood records for VAEs and demonstrate faithful global reconstructions with replaced textures. VLAE influenced research on representation bottlenecks, pixel-VAE hybrids, and state-of-the-art compression and generation benchmarks.

30u30 · paper · vision

Neural Message Passing for Quantum Chemistry

2017
Justin Gilmer, Samuel S. Schoenholz +3

This paper introduces Message Passing Neural Networks (MPNNs), a unifying framework for graph-based deep learning, and applies it to quantum-chemistry property prediction, achieving state-of-the-art accuracy on the QM9 benchmark and approaching chemical accuracy on most targets. Its impact includes popularising graph neural networks, influencing subsequent work in cheminformatics, materials discovery, and the broader machine-learning community by demonstrating how learned message passing can replace hand-engineered molecular descriptors.

foundation · 30u30 · paper · science · chemistry
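
A minimal sketch of one message-passing step under the MPNN framework (the MessagePassingStep name and the specific MLP message function are illustrative; the GRU-style node update follows the paper's general recipe): messages from neighbouring nodes are computed per edge, summed at each target node, and used to update the node states.

import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """One MPNN step (sketch): m_v = sum over neighbours w of M(h_v, h_w, e_vw),
    then h_v' = U(h_v, m_v). M is a small MLP and U a GRU cell, for illustration."""
    def __init__(self, node_dim: int, edge_dim: int):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, node_dim), nn.ReLU())
        self.update = nn.GRUCell(node_dim, node_dim)

    def forward(self, h, edge_index, edge_attr):
        # h: (num_nodes, node_dim); edge_index: (2, num_edges) with rows (source, target)
        src, dst = edge_index
        msg = self.message(torch.cat([h[dst], h[src], edge_attr], dim=-1))  # per-edge messages
        agg = torch.zeros_like(h).index_add_(0, dst, msg)                   # sum messages at targets
        return self.update(agg, h)                                          # GRU update of node states

h = torch.randn(4, 32)                              # 4 atoms with 32-dim states
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])   # bonds: 0->1, 1->2, 2->3
edge_attr = torch.randn(3, 8)
h = MessagePassingStep(32, 8)(h, edge_index, edge_attr)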

A simple neural network module for relational reasoning

2017
Adam Santoro, David Raposo +5

This paper introduces Relation Networks, a plug-and-play neural module that explicitly computes pairwise object relations. When appended to standard CNN/LSTM encoders, the module reaches superhuman 95.5% accuracy on CLEVR, solves 18 of 20 bAbI tasks, and infers hidden links in dynamic physical systems, inspiring later work on relational reasoning across vision, language, and RL.

foundation · 30u30 · paper
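
A minimal PyTorch sketch of the module's core computation, RN(O) = f(sum over object pairs of g(o_i, o_j)), with illustrative MLP sizes: g scores every ordered pair of objects, the pair outputs are summed, and f maps the aggregate to the prediction.

import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Relation Network sketch: RN(O) = f_phi( sum over all object pairs (i, j)
    of g_theta(o_i, o_j) ). g scores each pair, the pairwise outputs are summed,
    and f maps the aggregate to the final prediction."""
    def __init__(self, obj_dim: int, hidden: int = 128, out_dim: int = 10):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, objects):
        # objects: (batch, num_objects, obj_dim), e.g. CNN feature-map cells
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([o_i, o_j], dim=-1)            # all ordered object pairs
        return self.f(self.g(pairs).sum(dim=(1, 2)))     # sum over pairs, then f

objects = torch.randn(2, 25, 64)   # e.g. a 5x5 CNN feature map treated as 25 objects
print(RelationNetwork(64)(objects).shape)  # torch.Size([2, 10])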

Attention Is All You Need

2017
Ashish Vaswani, Noam Shazeer +6

The paper “Attention Is All You Need” (2017) introduced the Transformer — a novel neural architecture relying solely on self-attention, removing recurrence and convolutions. It revolutionized machine translation by dramatically improving training speed and translation quality (e.g., achieving 28.4 BLEU on English-German tasks), setting new state-of-the-art benchmarks. Its modular, parallelizable design opened the door to large-scale pretraining and fine-tuning, ultimately laying the foundation for modern large language models like BERT and GPT. This paper reshaped the landscape of NLP and deep learning, making attention-based models the dominant paradigm across many tasks.

NLP · LLM · AIGC · 30u30 · paper +1
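
A minimal sketch of the scaled dot-product attention at the heart of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V (the function below is a simplified single call, not the paper's full multi-head module):

import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Core Transformer operation (sketch): softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V: (batch, heads, seq_len, d_k)."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)          # pairwise similarity
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # optional causal/pad mask
    return torch.softmax(scores, dim=-1) @ V                   # weighted sum of values

Q = K = V = torch.randn(1, 8, 10, 64)   # 1 sequence, 8 heads, length 10, d_k = 64
print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([1, 8, 10, 64])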

Relational recurrent neural networks

2018
Adam Santoro, Ryan Faulkner +8

This paper introduces a Relational Memory Core that embeds multi-head dot-product attention into recurrent memory to enable explicit relational reasoning. Evaluated on synthetic distance-sorting, program execution, partially-observable reinforcement learning and large-scale language-modeling benchmarks, it consistently outperforms LSTM and memory-augmented baselines, setting state-of-the-art results on WikiText-103, Project Gutenberg and GigaWord. By letting memories interact rather than merely store information, the approach substantially boosts sequential relational reasoning and downstream task performance.

foundation · 30u30 · paper · NLP · LLM
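
A rough sketch of one relational-memory update, with the gating details simplified away (the class name, slot sizes, and residual/MLP arrangement are my own approximation of the idea, not the paper's exact core): the memory slots attend over themselves plus the new input, so memories interact rather than merely store information.

import torch
import torch.nn as nn

class RelationalMemoryStep(nn.Module):
    """Relational memory sketch: memory slots attend over themselves plus the
    new input via multi-head dot-product attention, letting memories interact.
    The paper's LSTM-style gating is omitted for brevity."""
    def __init__(self, slot_dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(slot_dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(slot_dim, slot_dim), nn.ReLU(),
                                 nn.Linear(slot_dim, slot_dim))

    def forward(self, memory, x):
        # memory: (batch, mem_slots, slot_dim); x: (batch, 1, slot_dim) new input
        keys_values = torch.cat([memory, x], dim=1)            # memory attends over memory + input
        attended, _ = self.attn(memory, keys_values, keys_values)
        memory = memory + attended                             # residual attention update
        return memory + self.mlp(memory)                       # position-wise MLP (gating omitted)

mem, x = torch.randn(2, 8, 64), torch.randn(2, 1, 64)
print(RelationalMemoryStep()(mem, x).shape)  # torch.Size([2, 8, 64])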

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

2018
Yanping Huang, Youlong Cheng +9

This paper introduces GPipe, a model-parallelism library designed to train large neural networks efficiently using pipeline parallelism. It partitions models across accelerators, processes micro-batches in parallel, and supports synchronous gradient updates. GPipe enables near-linear scaling with the number of devices while maintaining model quality and training stability. It achieves state-of-the-art performance in large-scale image classification (AmoebaNet) and multilingual machine translation (6B parameter Transformer), demonstrating flexibility across tasks. Its impact lies in making massive model training more practical and accessible across diverse architectures without relying on high-speed interconnects or custom model designs.

foundation · 30u30 · paper · engineering
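
A single-device sketch of the micro-batching half of the idea, under stated assumptions (the toy three-stage model and helper below are illustrative, and the cross-accelerator pipelining schedule itself is not shown): the mini-batch is split into micro-batches, gradients are accumulated across them, and one synchronous update is applied. In GPipe proper, each stage lives on its own accelerator and the micro-batches overlap across stages.

import torch
import torch.nn as nn

# Illustrative model partitioned into sequential "stages"; GPipe would place
# each stage on a different accelerator, here everything stays on one device.
stages = nn.Sequential(
    nn.Sequential(nn.Linear(256, 512), nn.ReLU()),   # stage 0
    nn.Sequential(nn.Linear(512, 512), nn.ReLU()),   # stage 1
    nn.Sequential(nn.Linear(512, 10)),               # stage 2
)
optimizer = torch.optim.SGD(stages.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, num_micro_batches: int = 4):
    """Split the mini-batch into micro-batches, accumulate gradients across
    them, and apply one synchronous update; the pipelined schedule that
    overlaps micro-batches across devices is omitted from this sketch."""
    optimizer.zero_grad()
    for xb, yb in zip(x.chunk(num_micro_batches), y.chunk(num_micro_batches)):
        loss = loss_fn(stages(xb), yb) / num_micro_batches
        loss.backward()
    optimizer.step()

train_step(torch.randn(64, 256), torch.randint(0, 10, (64,)))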