CV | Chengsong Zhang

General Information

Full Name	Chengsong Zhang
Languages	Chinese, English

Education

2023 - 2025

M.S., Computer Science

University of Illinois, Urbana-Champaign
2021 - 2023

B.S.E., Computer Science and Engineering

University of Michigan, Ann Arbor
2019 - 2023

B.S.E., Electrical and Computer Engineering

Shanghai Jiao Tong University

Academic Interests

Visual Generation
- Image Generation
- Video Generation
Systems for Machine Learning
- Distributed Training
- Parallel Serving

Stable Diffusion Open Source Projects

07/2023 - Now
AnimateDiff for Stable Diffusion WebUI
- AnimateDiff is a state-of-the-art open-source AI video generator. By plugging motion modules into the Stable Diffusion UNet at runtime, it turns any Stable Diffusion checkpoint into a video generator.
- This extension is the most popular and easiest-to-use user interface for open-source video generation. Its implementation is clean, yet it offers several powerful features. I decoupled AnimateDiff from diffusers into a plug-and-play extension for A1111 WebUI.
- Thanks to the A1111 LoRA system and my patch to its LoRA loader, motion LoRAs can be applied without affecting any other LoRA or LyCORIS models.
- By interpolating prompt conditions, users can achieve smooth scene transitions from one prompt to another.
- By rewriting the ControlNet main entry point, this extension supports video-to-video transfer with ControlNet. It performs strongly on 3D-to-2D and video style transfer when several ControlNets are applied together with AnimateDiff.
- Attention optimizations, including xformers and scaled dot-product attention, significantly improve speed and reduce VRAM usage by 3x. Native FP8 support lets users run 1024x1024 high-resolution video-to-video transfer with only 18GB of VRAM. Native LCM samplers let users generate reasonable videos within 8 steps.
04/2023 - Now
Segment Anything for Stable Diffusion WebUI
- This extension can automatically create bounding boxes and masks by clicking on images or entering text prompts in A1111 WebUI, both in single images and in batch, with the help of GroundingDINO (a powerful text-to-bounding-box model) and Segment Anything.
- It can automatically send masks to Stable Diffusion or ControlNet for inpainting.
- It can segment humans or any other objects from source videos, either for video style transfer with ControlNet and AnimateDiff or for creating a better training dataset for LoRA or LyCORIS.
- It can improve semantic segmentation and automatically send the semantic control map to ControlNet for regionally controlled image generation.

ML Systems Research Projects

08/2023 - Now
ddkang/aidb

Advised by Daniel Kang
- AIDB is a machine-learning analytics framework that uses machine learning to analyze unstructured data quickly and in a structured way.
- Integrate cloud inference APIs from OpenAI / HuggingFace / GoogleVision and local inference with PyTorch (GroundingDINO object detection) and Detectron2 (document segmentation and OCR).
- Investigate and experiment with vector databases (Faiss / ChromaDB / Weaviate) for querying embeddings to support approximate selection and aggregation.
- Design and implement a Function-as-a-Service ML service, a configuration schema, and a command-line user interface. Implement several examples, including NSFW detection and legal analysis.
- Improve query speed by adding batching to the cached bound inference service.
- Design and implement query-your-video (https://github.com/continue-revolution/query-your-video), a downstream application for AIDB that selects frames containing desired objects from videos. It converts WebUI inputs to SQL queries and uses those queries to chain ffmpeg frame extraction, GroundingDINO object detection, image classification, Segment Anything instance segmentation, and WD14 image tagging.
05/2022 - 02/2023
SymbioticLab/FedScale

Advised by Fan Lai and Mosharaf Chowdhury
- FedScale is a scalable and extensible open-source federated learning (FL) engine and benchmark.
- Design a distributed, hierarchical, and serverless protocol to efficiently check in clients and aggregate models.
- Implement on-device training on various edge devices, such as clusters, PCs, and Android devices. It supports several state-of-the-art execution frameworks, such as PyTorch, Alibaba MNN, and TensorFlow Lite.

Programming Skills

Languages	Python, C/C++, Go, CUDA, JavaScript, Java, OCaml, Rust, Dafny, R, MATLAB
Frameworks	PyTorch, TensorFlow, LLVM, Flask, React.js

Teaching

Spring 2025
- CS598 Systems for Generative AI
Fall 2024
- CS598 Systems for Generative AI
Spring 2024
- CS511 Advanced Data Management
Summer 2021
- VV285 Honors Mathematics III
- VE280 Programming and Elementary Data Structures
Spring 2021
- VV214 Linear Algebra
Fall 2020
- VV186 Honors Mathematics II
