LogoAIAny

Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • openai
  • paper
  • physics
  • plugin
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

Nano Banana

2025
Google DeepMind, Google AI Studio

Nano Banana (aka Gemini 2.5 Flash Image) is a state-of-the-art image generation and editing model from Google. It enables users to blend multiple images, maintain character consistency, perform targeted transformations via natural language, use world knowledge in image editing, and includes invisible SynthID watermarking to mark AI-generated or edited images.

ai-tools, ai-image, vision

Seedream

2025
ByteDance Seed

As a new-generation image creation model, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture. It supports multimodal inputs, reference images, and can produce high-definition images up to 4K with fast inference speed.

ai-tools, ai-image, vision

Veo

2024
Google DeepMind

Veo is a state-of-the-art video generation model developed by Google DeepMind, designed to empower filmmakers and storytellers.

ai-tools, ai-video, vision

FLUX.1

2024
Black Forest Labs

Amazing AI models from the Black Forest.

ai-tools, ai-image, vision

Midjourney

2022
Midjourney, Inc.

An independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.

ai-tools, ai-image, vision

Runway

2023
Runway AI, Inc.

With Runway Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame. Then, regenerate those elements from multiple perspectives and positions within your scenes.

ai-tools, ai-video, vision

KlingAI

2024
Kuaishou Technology

Kling AI offers tools for creating imaginative images and videos, based on state-of-the-art generative AI methods.

ai-tools, ai-image, ai-video, vision

Sora

2024
OpenAI

OpenAI's video generation model. Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

ai-tools, ai-video, vision

ImageNet Classification with Deep Convolutional Neural Networks

2012
Alex Krizhevsky, Ilya Sutskever +1

The 2012 paper “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, Sutskever, and Hinton introduced AlexNet, a deep CNN that dramatically improved image classification accuracy on ImageNet, halving the top-5 error rate from ~26% to ~15%. Its innovations, such as ReLU activations, dropout, GPU training, and data augmentation, sparked the deep learning revolution, laying the foundation for modern computer vision and advancing AI across industries.

vision, 30u30, paper, foundation

Generative Adversarial Networks

2014
Ian J. Goodfellow, Jean Pouget-Abadie +6

The 2014 paper “Generative Adversarial Nets” (GAN) by Ian Goodfellow et al. introduced a groundbreaking framework where two neural networks — a generator and a discriminator — compete in a minimax game: the generator tries to produce realistic data, while the discriminator tries to distinguish real from fake. This approach avoids Markov chains and approximate inference, relying solely on backpropagation. GANs revolutionized generative modeling, enabling realistic image, text, and audio generation, sparking massive advances in AI creativity, deepfake technology, and research on adversarial training and robustness.

vision, AIGC, paper, foundation
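The minimax game described above is captured by the paper's value function, which the discriminator D maximizes while the generator G minimizes:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the optimum, the generator's distribution matches the data distribution and the discriminator can do no better than guessing, outputting 1/2 everywhere.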

CS231n: Deep Learning for Computer Vision

2015
Fei-Fei Li

Stanford’s 10-week CS231n dives from first principles to state-of-the-art vision research, starting with image-classification basics, loss functions and optimization, then building from fully-connected nets to modern CNNs, residual and vision-transformer architectures. Lectures span training tricks, regularization, visualization, transfer learning, detection, segmentation, video, 3-D and generative models. Three hands-on PyTorch assignments guide students from k-NN/SVM through deep CNNs and network visualization, and a capstone project lets teams train large-scale models on a vision task of their choice, graduating with the skills to design, debug and deploy real-world deep-learning pipelines.

foundation, vision, 30u30, course, tutorial

Multi-Scale Context Aggregation by Dilated Convolutions

2015
Fisher Yu, Vladlen Koltun

This paper introduces a novel module for semantic segmentation using dilated convolutions, which enables exponential expansion of the receptive field without losing resolution. By aggregating multi-scale contextual information efficiently, the proposed context module significantly improves dense prediction accuracy when integrated into existing architectures. The work has had a lasting impact on dense prediction and semantic segmentation, laying the foundation for many modern segmentation models.

30u30, paper, vision
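The exponential receptive-field growth the paper describes can be sketched in a few lines; the helper name below is illustrative, not from the paper, and assumes stride-1 convolutions with no pooling:

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field of a stack of stride-1 dilated convolutions.

    Each layer with kernel size k and dilation d extends the
    receptive field of the stack so far by (k - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Doubling the dilation each layer (1, 2, 4, ...) grows the
# receptive field exponentially in depth, with no loss of resolution:
print(receptive_field([1]))        # 3
print(receptive_field([1, 2]))     # 7
print(receptive_field([1, 2, 4]))  # 15
```

With n such layers the receptive field is 2^(n+1) - 1 pixels, whereas stacking ordinary kernel-3 convolutions grows it only linearly (2n + 1).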