CV
General Information
Full Name | Chengsong Zhang |
Languages | Chinese, English |
Education
- 2023 - 2025
M.S., Computer Science
University of Illinois, Urbana-Champaign
- 2021 - 2023
B.S.E., Computer Science and Engineering
University of Michigan, Ann Arbor
- 2019 - 2023
B.S.E., Electrical and Computer Engineering
Shanghai Jiao Tong University
Academic Interests
- Stable Diffusion
- Image generation and inpainting
- Controllable video generation
- Machine Learning Systems
- Distributed training
- Database for machine learning
Stable Diffusion Open Source Projects
- 07/2023 - Now
AnimateDiff for Stable Diffusion WebUI
- AnimateDiff is a state-of-the-art open-source AI video generator. By plugging motion modules into the Stable Diffusion UNet at runtime, it turns any Stable Diffusion checkpoint into a video generator.
- This extension is the most popular and easiest-to-use interface for open-source video generation, combining a clean implementation with several powerful features. I decoupled AnimateDiff from diffusers into a plug-and-play extension for the A1111 WebUI.
- By building on the A1111 LoRA system and patching its LoRA loader, I enabled motion LoRAs to be applied without affecting other LoRA or LyCORIS models.
- By interpolating prompt conditions, users can achieve smooth scene transitions from one prompt to another.
- By rewriting the ControlNet main entry point, this extension supports video-to-video transfer with ControlNet. Applying several ControlNets together with AnimateDiff has shown strong performance in 3D-to-2D conversion and video style transfer.
- Attention optimizations, including xFormers and scaled dot-product attention, significantly improve speed and reduce VRAM usage by 3x. Native FP8 support lets users run 1024x1024 high-resolution video-to-video transfer with only 18 GB of VRAM. Native LCM samplers let users generate reasonable videos in 8 steps.
- 04/2023 - Now
Segment Anything for Stable Diffusion WebUI
- Using GroundingDINO (a text-to-bounding-box model) and Segment Anything, this extension automatically creates bounding boxes and masks in the A1111 WebUI from clicks on images or text prompts, for single images or in batch.
- It can automatically send masks to Stable Diffusion or ControlNet for inpainting.
- It can segment humans or any other objects from source videos, both for video style transfer with ControlNet and AnimateDiff and for building better training datasets for LoRA or LyCORIS.
- It can improve semantic segmentation and automatically send the semantic control map to ControlNet for region-controlled image generation.
ML Systems Research Projects
- 08/2023 - Now
ddkang/aidb
Advised by Daniel Kang
- AIDB is a machine-learning analytics framework that uses ML models to analyze unstructured data quickly and query the results in a structured way.
- Integrate cloud inference APIs from OpenAI, Hugging Face, and Google Vision, as well as local inference with PyTorch (GroundingDINO object detection) and Detectron2 (document segmentation and OCR).
- Investigate and experiment with vector databases (Faiss, ChromaDB, Weaviate) for querying embeddings in approximate selection and aggregation.
- Design and implement a Function-as-a-Service ML service, its configuration schema, and a command-line interface. Implement several examples, including NSFW detection and legal analysis.
- Improve query speed by batching requests to the cached bound inference service.
- Design and implement query-your-video (https://github.com/continue-revolution/query-your-video), a downstream application of AIDB that selects video frames containing desired objects. It converts WebUI inputs into SQL queries that automatically chain ffmpeg frame extraction, GroundingDINO object detection, image classification, Segment Anything instance segmentation, and WD14 image tagging.
- 05/2022 - 02/2023
SymbioticLab/FedScale
Advised by Fan Lai and Mosharaf Chowdhury
- FedScale is a scalable and extensible open-source federated learning (FL) engine and benchmark.
- Design a distributed, hierarchical, and serverless protocol to efficiently check in clients and aggregate models.
- Implement on-device training across various devices, including clusters, PCs, and Android devices, with support for state-of-the-art execution frameworks such as PyTorch, Alibaba MNN, and TensorFlow Lite.
Programming Skills
Languages | Python, C/C++, Go, CUDA, JavaScript, Java, OCaml, Rust, Dafny, R, MATLAB |
Frameworks | PyTorch, TensorFlow, LLVM, Flask, React.js |
Teaching
- Spring 2024
- CS511 Advanced Data Management
- Summer 2021
- VV285 Honors Mathematics III
- VE280 Programming and Elementary Data Structures
- Spring 2021
- VV214 Linear Algebra
- Fall 2020
- VV186 Honors Mathematics II