TRELLIS.2-4B: The Complete Guide to Microsoft's Revolutionary 3D Generation Model (2025)

🎯 Core Highlights (TL;DR)

TRELLIS.2-4B is Microsoft's state-of-the-art 4 billion parameter model for high-fidelity image-to-3D generation
Introduces O-Voxel (Omni-Voxel), a breakthrough "field-free" representation handling arbitrary topologies including open surfaces and non-manifold geometry
Achieves ultra-fast generation: 3 seconds for 512³ resolution, 17 seconds for 1024³ on NVIDIA H100
Supports full PBR materials (Base Color, Metallic, Roughness, Opacity) for photorealistic rendering
Compresses 1024³ assets into only ~9.6K latent tokens with negligible quality loss
Open-source under MIT License with complete training code and 500K dataset

Table of Contents

What is TRELLIS.2-4B?
Evolution from TRELLIS to TRELLIS.2
Technical Innovations
Key Features and Capabilities
Performance Benchmarks
Installation and Setup
How to Use TRELLIS.2-4B
Comparison with Other 3D Generation Models
Training Your Own Model
Limitations and Considerations
Frequently Asked Questions
Conclusion and Next Steps

What is TRELLIS.2-4B?

TRELLIS.2-4B is Microsoft Research's latest image-to-3D generative model. With 4 billion parameters, it turns a single 2D image into a fully textured, high-resolution 3D asset, taking roughly 3 to 60 seconds on an NVIDIA H100 depending on the target resolution.

Core Capabilities

Input: Single RGB image
Output: Fully textured 3D mesh with PBR materials
Resolution: Supports 512³ to 1536³ voxel grid resolution
Speed: 3-60 seconds depending on resolution (NVIDIA H100)
License: MIT License (open-source)

💡 Key Innovation
Unlike traditional methods that rely on implicit fields (SDF, NeRF) or iso-surface representations (Flexicubes), TRELLIS.2 uses a novel "field-free" approach that natively handles complex geometries without lossy conversions.

Research Background

Developed by a collaborative team from Tsinghua University and Microsoft Research, TRELLIS.2 builds upon the original TRELLIS model (CVPR'25 Spotlight) with fundamental architectural improvements. The research paper is available at arXiv:2512.14692.

Evolution from TRELLIS to TRELLIS.2

TRELLIS (First Generation)

The original TRELLIS introduced the concept of Structured LATent (SLAT) representation, enabling:

Multiple output formats (Radiance Fields, 3D Gaussians, Meshes)
Models up to 2B parameters
Training on 500K diverse 3D objects

Model	Parameters	Key Feature
TRELLIS-image-large	1.2B	Image-to-3D generation
TRELLIS-text-base	342M	Text-to-3D (base)
TRELLIS-text-large	1.1B	Text-to-3D (large)
TRELLIS-text-xlarge	2.0B	Text-to-3D (extra-large)

TRELLIS.2 Breakthrough

TRELLIS.2 represents a paradigm shift with:

✅ Native topology handling - No conversion artifacts
✅ Compact latent space - 16× spatial compression
✅ Instant processing - Rendering-free, optimization-free
✅ Full PBR support - Including transparency/translucency
✅ Higher resolution - Up to 1536³ voxel grids

Technical Innovations

1. O-Voxel: Omni-Voxel Representation

O-Voxel is the cornerstone innovation of TRELLIS.2, representing a "field-free" sparse voxel structure that simultaneously encodes geometry and appearance.

Geometry Component (f_shape)

Flexible Dual Grids: Handles arbitrary topologies
Sharp Edge Preservation: Maintains geometric details
Topology Freedom: Supports open surfaces, non-manifold geometry, internal structures

Appearance Component (f_mat)

Base Color: RGB texture information
Metallic: Whether the surface behaves as a metal or a dielectric
Roughness: Microsurface roughness, which controls how sharp or blurry reflections appear
Alpha: Transparency/translucency support

⚠️ Technical Advantage
Traditional iso-surface methods (SDF, Flexicubes) struggle with:

Open surfaces (e.g., cloth, hair)
Non-manifold geometry (e.g., intersecting surfaces)
Internal structures (e.g., hollow objects)

O-Voxel handles all these cases natively without conversion artifacts.
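To make the idea of a sparse voxel structure that stores geometry and appearance together more concrete, here is a minimal Python sketch. It is not the O-Voxel format itself: the `VoxelAttrs` and `SparseVoxelGrid` names, the field layout, and the point-quantization step are illustrative assumptions. It only shows that occupied cells are stored by integer coordinate, each carrying the four material attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


@dataclass(frozen=True)
class VoxelAttrs:
    """Per-voxel appearance (hypothetical layout, not the real O-Voxel schema)."""
    base_color: Tuple[float, float, float]
    metallic: float
    roughness: float
    alpha: float


@dataclass
class SparseVoxelGrid:
    """Stores only the occupied cells of a resolution^3 grid over [-0.5, 0.5]^3."""
    resolution: int
    cells: Dict[Coord, VoxelAttrs] = field(default_factory=dict)

    def quantize(self, x: float, y: float, z: float) -> Coord:
        # Map a point in [-0.5, 0.5]^3 to an integer cell index, clamped to the grid.
        def q(v: float) -> int:
            return min(self.resolution - 1, max(0, int((v + 0.5) * self.resolution)))
        return (q(x), q(y), q(z))

    def add_point(self, x: float, y: float, z: float, attrs: VoxelAttrs) -> None:
        # Empty space is never stored, so memory scales with the surface, not the volume.
        self.cells[self.quantize(x, y, z)] = attrs

    def occupancy(self) -> float:
        # Fraction of the dense grid that is actually occupied.
        return len(self.cells) / self.resolution ** 3
```

Because only occupied cells are kept, an open sheet of cloth or a thin interior wall is just another set of occupied coordinates; nothing in the structure requires the surface to be closed.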

2. SC-VAE: Sparse Compression VAE

The Sparse Compression 3D VAE (SC-VAE) employs a Sparse Residual Autoencoding scheme to compress voxelized assets directly into a compact latent space with 16× spatial downsampling.

Resolution	Latent Tokens	Spatial Compression
512³	~2.4K	16×
1024³	~9.6K	16×
1536³	~21.6K	16×

Key Features:

Negligible perceptual degradation
Efficient large-scale generative modeling
Direct voxel compression without intermediate representations
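The token counts quoted above can be sanity-checked with a little arithmetic. The sketch below assumes the 16× spatial downsampling stated in this guide and treats each latent token as one occupied cell of the downsampled grid; both the helper names and that interpretation are assumptions made for illustration. It also shows that the quoted counts grow with the square of the resolution, as you would expect for surface-like content.

```python
def latent_grid_side(resolution: int, downsample: int = 16) -> int:
    """Side length of the latent grid after spatial downsampling."""
    if resolution % downsample != 0:
        raise ValueError("resolution must be a multiple of the downsample factor")
    return resolution // downsample


def scaled_tokens(resolution: int, ref_resolution: int = 1024, ref_tokens: int = 9600) -> int:
    """Scale the 1024^3 token count quadratically (surface area grows as r^2)."""
    return round(ref_tokens * (resolution / ref_resolution) ** 2)


def occupancy(resolution: int, tokens: int, downsample: int = 16) -> float:
    """Fraction of the dense latent grid that the tokens would occupy."""
    side = latent_grid_side(resolution, downsample)
    return tokens / side ** 3


if __name__ == "__main__":
    for res in (512, 1024, 1536):
        side = latent_grid_side(res)
        tokens = scaled_tokens(res)
        print(f"{res}^3 -> {side}^3 latent grid, ~{tokens} tokens, "
              f"{occupancy(res, tokens):.2%} of dense cells")
```

At 1024³ the latent grid is 64³ = 262,144 cells, so ~9.6K tokens correspond to under 4% of the dense grid, which is why sparsity matters for the transformer's sequence length.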

3. Flow-Matching Transformer Architecture

TRELLIS.2-4B uses a vanilla DiT (Diffusion Transformer) architecture with:

4 billion parameters
Flow-matching training objective
Efficient attention mechanisms for sparse data
Multi-resolution training strategy
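For readers unfamiliar with flow matching, the toy sketch below shows the objective in its common straight-path form. It is not taken from the TRELLIS.2 code: the convention that t=0 is data and t=1 is noise, the function names, and the use of plain Python lists instead of latent tensors are all assumptions. `velocity_fn` stands in for the transformer.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(x0: Sequence[float], noise: Sequence[float], t: float) -> Vector:
    """Point on the straight path: t=0 is data, t=1 is noise (assumed convention)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def target_velocity(x0: Sequence[float], noise: Sequence[float]) -> Vector:
    """Regression target d x_t / d t = noise - x0 for the straight path."""
    return [b - a for a, b in zip(x0, noise)]


def fm_loss(pred: Sequence[float], x0: Sequence[float], noise: Sequence[float]) -> float:
    """Mean squared error between predicted and target velocity."""
    target = target_velocity(x0, noise)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def euler_sample(velocity_fn: Callable[[Vector, float], Vector],
                 noise: Sequence[float], steps: int = 10) -> Vector:
    """Integrate from t=1 (noise) back to t=0 (data) with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x
```

With a perfect velocity predictor the path is a straight line, so even a handful of Euler steps lands on the data point; in practice the number of sampling steps trades speed against quality.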

4. Instant Bidirectional Conversion

One of TRELLIS.2's most practical innovations is the ability to convert between meshes and O-Voxels instantly:

Direction	Time	Hardware
Mesh → O-Voxel	< 10 seconds	Single CPU
O-Voxel → Mesh	< 100ms	CUDA

This enables:

Rendering-free processing: No need for multi-view rendering
Optimization-free workflow: Direct conversion without iterative refinement
Minimalist pipeline: Simplified data preparation and post-processing

Key Features and Capabilities

High Quality and Resolution

TRELLIS.2-4B generates assets with exceptional fidelity across multiple resolutions:

📊 Generation Quality Metrics

Resolution: 512³
- Generation Time: 3 seconds (2s shape + 1s material)
- Detail Level: High
- Use Case: Rapid prototyping, real-time applications

Resolution: 1024³
- Generation Time: 17 seconds (10s shape + 7s material)
- Detail Level: Very High
- Use Case: Production assets, game development

Resolution: 1536³
- Generation Time: 60 seconds (35s shape + 25s material)
- Detail Level: Ultra High
- Use Case: Film production, high-end visualization

Arbitrary Topology Handling

Unlike traditional methods constrained by iso-surface representations, TRELLIS.2 robustly handles:

✔ Open Surfaces

Cloth, curtains, flags
Hair and fur
Thin structures

✔ Non-manifold Geometry

Intersecting surfaces
Self-intersections
Complex architectural elements

✔ Internal Structures

Hollow objects
Multi-layer constructions
Enclosed cavities

Rich Texture Modeling with PBR

Full Physically Based Rendering (PBR) support enables photorealistic relighting:

Material Property	Description	Use Case
Base Color	RGB albedo texture	Surface appearance
Metallic	Metal vs. dielectric	Material type classification
Roughness	Microsurface roughness	Specular reflection control
Opacity (Alpha)	Transparency level	Glass, water, translucent materials

✅ Best Practice
The PBR material system is compatible with standard game engines (Unity, Unreal Engine) and 3D software (Blender, Maya), enabling seamless integration into production pipelines.
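The four properties in the table map directly onto the glTF 2.0 metallic-roughness material model, which is what engines read from a GLB file. The sketch below builds such a material as a plain dictionary using constant factors; a real export would normally reference texture images instead, and the `gltf_material` helper is a hypothetical name used only for illustration.

```python
import json
from typing import Dict, Sequence


def gltf_material(name: str, base_color: Sequence[float], metallic: float,
                  roughness: float, alpha: float = 1.0) -> Dict:
    """Build a glTF 2.0 material dict from constant PBR factors in [0, 1]."""
    values = [*base_color, metallic, roughness, alpha]
    if len(base_color) != 3 or any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("expected RGB base color and factors in [0, 1]")
    return {
        "name": name,
        "pbrMetallicRoughness": {
            # glTF stores opacity as the fourth component of the base color.
            "baseColorFactor": [*base_color, alpha],
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
        # BLEND enables alpha blending; OPAQUE ignores the alpha component.
        "alphaMode": "OPAQUE" if alpha >= 1.0 else "BLEND",
    }


if __name__ == "__main__":
    glass = gltf_material("glass", (0.9, 0.95, 1.0), metallic=0.0, roughness=0.05, alpha=0.3)
    print(json.dumps(glass, indent=2))
```

Note that glTF has no separate opacity slot: alpha travels with the base color, and `alphaMode` tells the renderer how to interpret it.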

Shape-Conditioned Texture Generation

TRELLIS.2 supports two generation modes:

Image-to-3D: Generate a complete 3D asset from a single image
Texture Generation: Generate textures for an existing 3D mesh from a reference image

This flexibility allows:

Re-texturing existing assets
Style transfer to 3D models
Texture variation generation

Performance Benchmarks

Generation Speed (NVIDIA H100)

Resolution	Shape Generation	Material Generation	Total Time
512³	2 seconds	1 second	3 seconds
1024³	10 seconds	7 seconds	17 seconds
1536³	35 seconds	25 seconds	60 seconds

Latent Space Efficiency

TRELLIS.2 achieves state-of-the-art compression while maintaining quality:

Reconstruction Accuracy vs. Latent Compactness

TRELLIS.2: ████████████████████ (Highest)
Method A:  ████████████░░░░░░░░
Method B:  ██████████░░░░░░░░░░
Method C:  ████████░░░░░░░░░░░░

Compactness (tokens per 1024³):
TRELLIS.2: ~9.6K
Method A:  ~25K
Method B:  ~40K
Method C:  ~60K

Hardware Requirements

Component	Minimum	Recommended
GPU Memory	24GB	48GB+
GPU Model	NVIDIA A100	NVIDIA H100
System RAM	32GB	64GB+
CUDA Version	12.4+	12.4+
OS	Linux	Linux (Ubuntu 20.04+)

Installation and Setup

Prerequisites

Before installing TRELLIS.2, ensure your system meets these requirements:

Operating System: Linux (tested on Ubuntu 20.04+)
GPU: NVIDIA GPU with 24GB+ VRAM
CUDA Toolkit: Version 12.4 or higher
Python: Version 3.8 or higher
Conda: For dependency management
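Before installing, it can help to confirm that the GPU actually meets the 24GB requirement. The sketch below shells out to `nvidia-smi` and parses its CSV output; the helper names and the 24,000 MiB threshold are choices made here for illustration (cards marketed as 24GB often report slightly less than 24,576 MiB).

```python
import subprocess
from typing import List, Tuple

MIN_VRAM_MIB = 24_000  # assumed threshold for a "24GB" card, see note above


def parse_gpu_csv(text: str) -> List[Tuple[str, int]]:
    """Parse lines like 'NVIDIA H100 80GB HBM3, 81559' into (name, MiB) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> List[Tuple[str, int]]:
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def meets_minimum(gpus: List[Tuple[str, int]], min_mib: int = MIN_VRAM_MIB) -> bool:
    """True if at least one GPU has enough memory."""
    return any(mem >= min_mib for _, mem in gpus)


if __name__ == "__main__":
    found = query_gpus()
    for name, mem in found:
        print(f"{name}: {mem} MiB")
    print("OK" if meets_minimum(found) else "No GPU with 24GB+ VRAM found")
```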

⚠️ Windows Users
TRELLIS.2 is primarily tested on Linux. Windows setup is possible but not officially supported; refer to community discussions for Windows-specific configurations.

Step-by-Step Installation

1. Clone the Repository

git clone --recurse-submodules https://github.com/microsoft/TRELLIS.2.git
cd TRELLIS.2

2. Create Conda Environment

# Create new environment
conda create -n trellis2 python=3.10
conda activate trellis2

3. Install Dependencies

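The installation script installs dependencies modularly, one flag per component. Note that `--new-env` creates the conda environment itself, so omit it if you already created and activated the environment in step 2. Run `. ./setup.sh --help` to see the flags available in your checkout: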

# Install all dependencies for inference
. ./setup.sh --new-env --basic --xformers --flash-attn \
  --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast

Installation Flags Explained:

Flag	Purpose
`--new-env`	Create new conda environment named 'trellis2'
`--basic`	Install core dependencies
`--xformers`	Memory-efficient attention (for GPUs without flash-attn)
`--flash-attn`	Fast attention implementation (recommended)
`--diffoctreerast`	Differentiable octree rasterizer
`--spconv`	Sparse convolution operations
`--mipgaussian`	Mip-splatting for Gaussian rendering
`--kaolin`	NVIDIA's 3D deep learning library
`--nvdiffrast`	Differentiable rasterizer

4. Environment Configuration

Set environment variables for optimal performance:

export OPENCV_IO_ENABLE_OPENEXR=1
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

# For GPUs without flash-attn support (e.g., V100)
# export ATTN_BACKEND=xformers

# SPCONV algorithm selection
export SPCONV_ALGO=native  # Use 'auto' for benchmarking (slower first run)

5. Download Pre-trained Models

Models are downloaded automatically from Hugging Face on first use and cached locally, so no manual download step is needed:

# Models will be cached in ~/.cache/huggingface/
# No manual download required for basic usage
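If you want to know where the weights end up, or pre-fetch them on a machine that will later run offline, the sketch below may help. It resolves the hub cache directory from the standard Hugging Face environment variables and wraps `huggingface_hub.snapshot_download`. The helper names are hypothetical, and it assumes the pipeline's `from_pretrained` reads from the regular hub cache.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

REPO_ID = "microsoft/TRELLIS.2-4B"  # repo id used with from_pretrained in this guide


def hf_hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the hub cache directory the same way the Hugging Face docs describe."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "huggingface" / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def prefetch(repo_id: str = REPO_ID) -> str:
    """Download the whole model snapshot ahead of time (needs network access)."""
    # Imported lazily so the cache helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    print("Models will be cached under:", hf_hub_cache_dir())
```

Setting `HF_HOME` to a path on a large disk before the first run is usually the simplest way to keep multi-gigabyte checkpoints off a small home partition.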

Troubleshooting Installation

Issue: CUDA version mismatch

# Check CUDA version
nvcc --version

# Set correct CUDA path
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

Issue: Out of memory during compilation

# Limit parallel compilation jobs
export MAX_JOBS=4

How to Use TRELLIS.2-4B

Basic Image-to-3D Generation

Here's a minimal example to generate a 3D asset from an image:

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map for PBR rendering
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), 
                 cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run Generation
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]

# 4. Simplify mesh (nvdiffrast has 16M triangle limit)
mesh.simplify(16777216)

# 5. Render Video Preview
video = render_utils.make_pbr_vis_frames(
    render_utils.render_video(mesh, envmap=envmap)
)
imageio.mimsave("output.mp4", video, fps=15)

# 6. Export to GLB format
glb = o_voxel.postprocess.to_glb(
    vertices            = mesh.vertices,
    faces               = mesh.faces,
    attr_volume         = mesh.attrs,
    coords              = mesh.coords,
    attr_layout         = mesh.layout,
    voxel_size          = mesh.voxel_size,
    aabb                = [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target   = 1000000,
    texture_size        = 4096,
    remesh              = True,
    remesh_band         = 1,
    remesh_project      = 0,
    verbose             = True
)
glb.export("output.glb", extension_webp=True)

Advanced Usage: Multi-Resolution Generation

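Generate assets at different resolutions based on your needs. The `resolution` and `seed` keyword names are shown as used in this guide; confirm them against the signature of `pipeline.run` in your installed version: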

# High-speed generation (512³)
mesh_fast = pipeline.run(
    image,
    resolution=512,
    seed=42
)[0]

# Balanced quality (1024³) - Default
mesh_balanced = pipeline.run(
    image,
    resolution=1024,
    seed=42
)[0]

# Maximum quality (1536³)
mesh_ultra = pipeline.run(
    image,
    resolution=1536,
    seed=42
)[0]

Shape-Conditioned Texture Generation

Generate textures for existing 3D meshes:

from trellis2.pipelines import Trellis2TextureGenerationPipeline

# Load texture generation pipeline
texture_pipeline = Trellis2TextureGenerationPipeline.from_pretrained(
    "microsoft/TRELLIS.2-4B"
)
texture_pipeline.cuda()

# Load existing mesh and reference image
input_mesh = o_voxel.io.load_mesh("input_model.obj")
reference_image = Image.open("texture_reference.png")

# Generate texture
textured_mesh = texture_pipeline.run(
    mesh=input_mesh,
    image=reference_image,
    seed=42
)[0]

Batch Processing

Process multiple images efficiently:

import glob
import os
from pathlib import Path

from PIL import Image
from trellis2.pipelines import Trellis2ImageTo3DPipeline
import o_voxel

# Load the pipeline once and reuse it for every image
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# Make sure the output directory exists before exporting
os.makedirs("output", exist_ok=True)

# Process all PNG images in the input directory, in a stable order
image_paths = sorted(glob.glob("input_images/*.png"))

for img_path in image_paths:
    image = Image.open(img_path)
    mesh = pipeline.run(image)[0]

    # Save with the same base filename as the input image
    output_name = Path(img_path).stem
    glb = o_voxel.postprocess.to_glb(
        vertices=mesh.vertices,
        faces=mesh.faces,
        attr_volume=mesh.attrs,
        coords=mesh.coords,
        attr_layout=mesh.layout,
        voxel_size=mesh.voxel_size,
        aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
        decimation_target=1000000,
        texture_size=4096
    )
    glb.export(f"output/{output_name}.glb", extension_webp=True)
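For long batch jobs it is usually worth making the loop resumable and tolerant of individual failures. The sketch below is one way to do that, under stated assumptions: `run_batch` and the `generate_and_export` callback are hypothetical names, and the callback is expected to wrap the `pipeline.run` and `to_glb` calls shown above. The `demo` function uses a stand-in exporter so the control flow can be tried without a GPU.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_batch(image_paths: Iterable[str], output_dir: str,
              generate_and_export: Callable[[Path, Path], None]) -> Dict[str, str]:
    """Process each image once, skipping finished outputs and recording failures."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    status: Dict[str, str] = {}
    for raw in sorted(image_paths):
        src = Path(raw)
        dst = out_root / f"{src.stem}.glb"
        if dst.exists():
            status[src.name] = "skipped"  # already exported by an earlier run
            continue
        try:
            generate_and_export(src, dst)
            status[src.name] = "ok"
        except Exception as exc:  # keep going; one bad input should not stop the batch
            status[src.name] = f"failed: {exc}"
    return status


def demo() -> Dict[str, Dict[str, str]]:
    """Exercise run_batch twice with a stand-in exporter that fails on one input."""
    def fake_export(src: Path, dst: Path) -> None:
        if src.stem == "bad":
            raise ValueError("unreadable image")
        dst.write_bytes(b"glb")

    with tempfile.TemporaryDirectory() as tmp:
        first = run_batch(["a.png", "bad.png"], tmp, fake_export)
        second = run_batch(["a.png", "bad.png"], tmp, fake_export)
    return {"first": first, "second": second}
```

On the second pass the finished asset is skipped and only the failed input is retried, which is the behavior you want after fixing or replacing a problematic image.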

Output Formats

TRELLIS.2 supports multiple output formats:

Format	Extension	Use Case
GLB	.glb	Web, game engines, general 3D software
PLY	.ply	Point cloud, Gaussian splatting
OBJ	.obj	Traditional 3D modeling software
GLTF	.gltf	Web applications, AR/VR

# Export to different formats
mesh.save_ply("output.ply")  # Gaussian representation
mesh.save_obj("output.obj")  # Traditional mesh
glb.export("output.glb")     # Optimized for web/games

Comparison with Other 3D Generation Models

TRELLIS.2 vs. Original TRELLIS

Feature	TRELLIS (v1)	TRELLIS.2
Representation	SLAT (Structured Latent)	O-Voxel (Omni-Voxel)
Topology Support	Limited (iso-surface based)	Arbitrary (field-free)
Max Resolution	1024³	1536³
Latent Tokens (1024³)	~25K	~9.6K
PBR Materials	Partial	Full (including alpha)
Processing Pipeline	Multi-stage rendering	Instant conversion
Open Surfaces	❌	✅
Non-manifold Geometry	❌	✅
Internal Structures	❌	✅

TRELLIS.2 vs. Other State-of-the-Art Models

Model	Parameters	Generation Time	Topology	PBR Support
TRELLIS.2-4B	4B	17s (1024³, H100)	Arbitrary	Full
Shap-E	300M	~30s	Limited	Partial
Point-E	1B	~45s	Limited	No
DreamFusion	-	~2 hours	Limited	Partial
Magic3D	-	~40 min	Limited	Partial
Instant3D	2B	~25s	Limited	Partial

✅ Competitive Advantage
TRELLIS.2's combination of speed, quality, and topology flexibility makes it a strong candidate for production-ready 3D asset generation, particularly for assets with open surfaces, non-manifold geometry, or transparent materials.

When to Use TRELLIS.2

Best Use Cases:

Production asset creation for games and films
Rapid prototyping and concept visualization
E-commerce 3D product visualization
AR/VR content creation
Architectural visualization
Digital twin creation

Consider Alternatives When:

You need text-only input (use TRELLIS-text models)
You require real-time generation on mobile devices
You need extremely high polygon counts (>10M triangles)
You're working with specific artistic styles (may need fine-tuning)

Training Your Own Model

TRELLIS.2 provides complete training code for researchers and developers who want to:

Fine-tune on custom datasets
Experiment with architecture modifications
Train domain-specific models

Training Dataset: TRELLIS-500K

Microsoft provides TRELLIS-500K, a curated dataset of roughly 500,000 high-quality 3D assets drawn from four public sources, with Toys4k held out for evaluation:

Source	Assets	Description
ObjaverseXL (Sketchfab)	~168K	Diverse everyday objects
ObjaverseXL (GitHub)	~312K	Diverse everyday objects
ABO	~4.5K	Amazon product catalog
3D-FUTURE	~9.5K	Furniture and interior design
HSSD	~6.7K	Habitat synthetic scenes
Toys4k	~3.2K	Toy objects (evaluation set only)

All assets are filtered based on aesthetic scores and quality metrics.

Training Pipeline Overview

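The training process follows a two-stage approach. The stage names and config files shown in this section follow the repository layout of the original TRELLIS release: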

Stage 1: VAE Training
├── Sparse Structure VAE (ss_vae)
└── SLat VAE with Decoders (slat_vae)
    ├── Gaussian Decoder
    ├── Radiance Field Decoder
    └── Mesh Decoder

Stage 2: Flow Model Training
├── Sparse Structure Flow (ss_flow)
└── SLat Flow (slat_flow)

Training Configuration

Example configurations are provided in the configs/ directory:

VAE Training:

# Train Sparse Structure VAE
python train.py \
  --config configs/vae/ss_vae_conv3d_16l8_fp16.json \
  --output_dir outputs/ss_vae \
  --data_dir /path/to/TRELLIS-500K \
  --num_gpus 8

# Train SLat VAE with Gaussian Decoder
python train.py \
  --config configs/vae/slat_vae_enc_dec_gs_swin8_B_64l8_fp16.json \
  --output_dir outputs/slat_vae_gs \
  --data_dir /path/to/TRELLIS-500K \
  --num_gpus 8

Flow Model Training:

# Train Image-conditioned Flow Model
python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/slat_flow_img \
  --data_dir /path/to/TRELLIS-500K \
  --num_nodes 4 \
  --num_gpus 8 \
  --master_addr $MASTER_ADDR \
  --master_port $MASTER_PORT

Multi-Node Distributed Training

For large-scale training across multiple machines:

# Node 0 (Master)
python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/slat_flow_img_distributed \
  --data_dir /path/to/TRELLIS-500K \
  --num_nodes 4 \
  --node_rank 0 \
  --num_gpus 8 \
  --master_addr 192.168.1.100 \
  --master_port 29500

# Node 1
python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/slat_flow_img_distributed \
  --data_dir /path/to/TRELLIS-500K \
  --num_nodes 4 \
  --node_rank 1 \
  --num_gpus 8 \
  --master_addr 192.168.1.100 \
  --master_port 29500

# Repeat for nodes 2 and 3...
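Since the per-node commands differ only in `--node_rank`, they are easy to generate rather than copy by hand. The sketch below builds one command string per node from the flags used in this section; `node_commands` is a hypothetical helper, and how you dispatch the commands (SSH, a job scheduler, and so on) is left out.

```python
import shlex
from typing import List


def node_commands(config: str, output_dir: str, data_dir: str, num_nodes: int,
                  gpus_per_node: int, master_addr: str,
                  master_port: int = 29500) -> List[str]:
    """Return one train.py command line per node, identical except for --node_rank."""
    commands = []
    for rank in range(num_nodes):
        args = [
            "python", "train.py",
            "--config", config,
            "--output_dir", output_dir,
            "--data_dir", data_dir,
            "--num_nodes", str(num_nodes),
            "--node_rank", str(rank),
            "--num_gpus", str(gpus_per_node),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
        ]
        # shlex.join quotes any argument containing spaces or shell metacharacters.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in node_commands(
        "configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json",
        "outputs/slat_flow_img_distributed",
        "/path/to/TRELLIS-500K",
        num_nodes=4, gpus_per_node=8, master_addr="192.168.1.100",
    ):
        print(cmd)
```

Every node must use the same master address, port, and output directory; only the rank changes.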

Fine-tuning on Custom Data

To fine-tune TRELLIS.2 on your own dataset:

Prepare Data: Convert your 3D assets to O-Voxel format
Configure Training: Modify config files for your dataset
Resume from Checkpoint: Load pre-trained weights

python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/custom_finetuned \
  --data_dir /path/to/custom/dataset \
  --load_dir microsoft/TRELLIS.2-4B \
  --num_gpus 8

Training Hardware Requirements

Model Size	Recommended GPUs	Training Time (500K dataset)
Base (342M)	8× A100 (40GB)	~1 week
Large (1.1B)	16× A100 (40GB)	~2 weeks
XLarge (4B)	32× A100 (80GB)	~4 weeks
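To compare these configurations on a single axis, it helps to convert them into GPU-hours. The short calculation below uses only the GPU counts and durations from the table above; the figures are rough planning numbers, not measured costs.

```python
from typing import List, Tuple

# (model size, number of GPUs, training weeks) taken from the table above
TRAINING_ROWS: List[Tuple[str, int, int]] = [
    ("Base (342M)", 8, 1),
    ("Large (1.1B)", 16, 2),
    ("XLarge (4B)", 32, 4),
]


def gpu_hours(num_gpus: int, weeks: int) -> int:
    """Total GPU-hours assuming all GPUs run continuously for the whole period."""
    return num_gpus * weeks * 7 * 24


if __name__ == "__main__":
    for name, gpus, weeks in TRAINING_ROWS:
        print(f"{name}: {gpus_hours}" if False else f"{name}: {gpu_hours(gpus, weeks):,} GPU-hours")
```

Under these assumptions the 4B configuration needs sixteen times the GPU-hours of the base model (21,504 versus 1,344), which is the number to multiply by your hourly GPU price when budgeting.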

Limitations and Considerations

Known Limitations

1. Geometric Artifacts

Issue: Generated meshes may occasionally contain small holes or minor topological discontinuities.

Impact:

Affects applications requiring watertight geometry (3D printing, simulation)
More common in high-complexity models with intricate details

Mitigation:

# Use provided post-processing scripts
from trellis2.utils import mesh_repair

cleaned_mesh = mesh_repair.fill_holes(mesh, max_hole_size=100)
cleaned_mesh = mesh_repair.remove_degenerate_faces(cleaned_mesh)
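Before repairing a mesh it is useful to know whether it has holes at all. A standard check, independent of any particular library, is to count how many triangles share each edge: an edge used by exactly one triangle lies on a hole boundary, and an edge used by more than two is non-manifold. The sketch below implements that check on a plain list of vertex-index triples; the function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Face = Tuple[int, int, int]
Edge = Tuple[int, int]


def edge_counts(faces: Sequence[Face]) -> Counter:
    """Count how many triangles reference each undirected edge."""
    counts: Counter = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def boundary_edges(faces: Sequence[Face]) -> List[Edge]:
    """Edges used by exactly one triangle (they outline holes or open borders)."""
    return sorted(e for e, n in edge_counts(faces).items() if n == 1)


def non_manifold_edges(faces: Sequence[Face]) -> List[Edge]:
    """Edges shared by more than two triangles."""
    return sorted(e for e, n in edge_counts(faces).items() if n > 2)


def is_watertight(faces: Sequence[Face]) -> bool:
    """True if the mesh is non-empty and every edge is shared by exactly two triangles."""
    counts = edge_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())
```

For 3D printing or simulation, `is_watertight` should return True after repair. For assets that are intentionally open, such as cloth, a non-empty `boundary_edges` result is expected rather than a defect.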

2. Base Model Without Alignment

Issue: TRELLIS.2-4B is a pre-trained foundation model that has not been aligned to human preferences (for example, through RLHF-style tuning).

Impact:

Output style reflects training data distribution
May require multiple generations to achieve desired aesthetic
Not optimized for specific artistic styles

Recommendations:

Generate multiple variants with different seeds
Use post-processing for style refinement
Consider fine-tuning for specific aesthetic requirements
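The first recommendation, generating several variants and keeping the best, can be wrapped in a small helper. In the sketch below, `generate` is any callable that maps a seed to an asset and `score` is whatever quality measure you choose (a manual rating works too). Both helper names are hypothetical; with the pipeline from this guide, `generate` could be `lambda s: pipeline.run(image, seed=s)[0]`, assuming the `seed` keyword shown earlier.

```python
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def sweep_seeds(generate: Callable[[int], T], seeds: Iterable[int]) -> Dict[int, T]:
    """Run the generator once per seed and keep every variant, keyed by seed."""
    return {seed: generate(seed) for seed in seeds}


def pick_best(variants: Dict[int, T], score: Callable[[T], float]) -> Tuple[int, T]:
    """Return the (seed, variant) pair with the highest score."""
    if not variants:
        raise ValueError("no variants to choose from")
    best_seed = max(variants, key=lambda s: score(variants[s]))
    return best_seed, variants[best_seed]
```

Keeping the winning seed alongside the asset makes the result reproducible later. For high-resolution runs, consider exporting each variant to disk instead of holding all meshes in memory.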

3. Input Image Quality Dependency

Issue: Output quality heavily depends on input image characteristics.

Best Practices:

Use high-resolution images (512×512 minimum, 1024×1024 recommended)
Ensure clear object visibility with minimal occlusion
Prefer images with good lighting and contrast
Avoid heavily compressed or noisy images
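The resolution guideline is easy to enforce automatically before a batch run. The sketch below reads the width and height straight from a PNG file's header using only the standard library and classifies the image against the 512 and 1024 pixel thresholds quoted above. The helper names are illustrative, and other formats such as JPEG would need a different reader (for example Pillow).

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR stores width and height as big-endian 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def classify(width: int, height: int, minimum: int = 512, recommended: int = 1024) -> str:
    """Rate an image by its shorter side against the thresholds in this guide."""
    short_side = min(width, height)
    if short_side < minimum:
        return "too small"
    if short_side < recommended:
        return "acceptable"
    return "recommended"


def check_png(path: str) -> str:
    """Classify a PNG file on disk without decoding the whole image."""
    with open(path, "rb") as fh:
        return classify(*png_size(fh.read(24)))
```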

4. Memory Requirements

Issue: High-resolution generation requires substantial GPU memory.

Resolution	Minimum VRAM	Recommended VRAM
512³	16GB	24GB
1024³	24GB	40GB
1536³	40GB	80GB
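The table can be turned into a small lookup that picks the highest resolution a given GPU can handle. The sketch below encodes the figures from the table as-is; the `pick_resolution` helper is a hypothetical convenience, and actual memory use will vary with the asset.

```python
from typing import Dict, Optional, Tuple

# resolution -> (minimum VRAM in GB, recommended VRAM in GB), from the table above
VRAM_TABLE_GB: Dict[int, Tuple[int, int]] = {
    512: (16, 24),
    1024: (24, 40),
    1536: (40, 80),
}


def pick_resolution(vram_gb: float, prefer_recommended: bool = False) -> Optional[int]:
    """Return the highest resolution whose VRAM requirement fits, or None if none does."""
    column = 1 if prefer_recommended else 0
    fitting = [res for res, req in VRAM_TABLE_GB.items() if vram_gb >= req[column]]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for vram in (12, 16, 24, 48, 80):
        print(f"{vram} GB -> min-fit {pick_resolution(vram)}, "
              f"comfortable {pick_resolution(vram, prefer_recommended=True)}")
```

Using the recommended column leaves headroom for mesh post-processing and rendering, which also consume GPU memory after generation finishes.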

Memory Optimization:

# Enable memory-efficient allocator settings (set this before torch initializes CUDA)
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Switch the pipeline to memory-efficient attention
pipeline.enable_memory_efficient_attention()

Responsible AI Considerations

⚠️ Important Notice
TRELLIS.2 is a research project. Responsible AI considerations were factored into all stages:

Dataset Curation: Public datasets reviewed for harmful content and PII
Potential Bias: Internet-sourced data may contain inherent biases
Intended Use: Academic and research purposes only
Commercial Use: Requires careful evaluation of generated content

Ethical Guidelines:

Do not generate content that infringes intellectual property rights
Avoid creating misleading or deceptive 3D representations
Respect privacy and consent when generating assets based on real objects
Consider cultural sensitivity in generated content

Performance Considerations

Factors Affecting Generation Quality:

Input Image Characteristics
- Resolution and clarity
- Lighting conditions
- Object visibility and occlusion
- Background complexity
Generation Parameters
- Resolution setting (512³ vs 1024³ vs 1536³)
- Random seed selection
- Sampling steps (if configurable)
Hardware Configuration
- GPU model and memory
- CUDA version compatibility
- Driver version

Frequently Asked Questions

Q: What is the difference between TRELLIS and TRELLIS.2?

A: TRELLIS.2 represents a fundamental architectural upgrade from the original TRELLIS. The key differences are:

Representation: TRELLIS uses SLAT (Structured Latent), while TRELLIS.2 uses O-Voxel (Omni-Voxel), a "field-free" approach
Topology: TRELLIS.2 natively handles arbitrary topologies including open surfaces and non-manifold geometry, which TRELLIS cannot
Efficiency: TRELLIS.2 compresses 1024³ assets into ~9.6K tokens vs. ~25K in TRELLIS
Materials: TRELLIS.2 supports full PBR including transparency, while TRELLIS has partial support
Processing: TRELLIS.2 offers instant mesh conversion (<100ms with CUDA) vs. multi-stage rendering in TRELLIS

Q: Can I use TRELLIS.2 for commercial projects?

A: Yes, TRELLIS.2 is released under the MIT License, which permits commercial use. However:

Verify that generated assets don't infringe on existing intellectual property
The model is a base model without alignment, so output quality may vary
Some submodules may have different licenses (check the LICENSE file)
Consider the ethical implications of AI-generated content in your use case

Q: What GPU do I need to run TRELLIS.2?

A: Minimum requirements:

GPU: NVIDIA GPU with 24GB VRAM (e.g., RTX 3090, A5000, A100)
Resolution: 512³ requires 16GB, 1024³ requires 24GB, 1536³ requires 40GB+
Tested on: NVIDIA A100 and H100 GPUs
Not supported: AMD GPUs, Apple Silicon (MPS), CPU-only inference

For optimal performance, use NVIDIA H100 or A100 (80GB) GPUs.

Q: How do I improve generation quality?

A: Follow these best practices:

Input Image Quality:
- Use high-resolution images (1024×1024 recommended)
- Ensure good lighting and clear object visibility
- Remove complex backgrounds if possible
- Avoid heavily compressed or noisy images
Generation Settings:
- Use higher resolution (1024³ or 1536³) for detailed assets
- Try different random seeds (generate 3-5 variants)
- Experiment with different input angles if available
Post-Processing:
- Use mesh repair tools for geometric artifacts
- Apply texture enhancement in 3D software
- Optimize topology for your specific use case

Q: Can TRELLIS.2 generate 3D assets from text prompts?

A: TRELLIS.2-4B is specifically designed for image-to-3D generation. For text-to-3D, you have two options:

Two-stage approach (Recommended):
- Use a text-to-image model (DALL-E, Midjourney, Stable Diffusion)
- Feed generated image to TRELLIS.2-4B
- This typically produces better results
Use TRELLIS text models:
- TRELLIS-text-base (342M)
- TRELLIS-text-large (1.1B)
- TRELLIS-text-xlarge (2.0B)
- Note: These are from TRELLIS v1, not TRELLIS.2

Q: How long does generation take?

A: Generation time depends on resolution and hardware:

On NVIDIA H100:

512³: ~3 seconds (2s shape + 1s material)
1024³: ~17 seconds (10s shape + 7s material)
1536³: ~60 seconds (35s shape + 25s material)

On NVIDIA A100 (40GB):

512³: ~5 seconds
1024³: ~30 seconds
1536³: ~120 seconds

Older GPUs (RTX 3090, A6000) will be proportionally slower.

Q: What output formats are supported?

A: TRELLIS.2 supports multiple industry-standard formats:

GLB/GLTF: Optimized for web, game engines (Unity, Unreal), and AR/VR
PLY: Point cloud format, useful for Gaussian splatting
OBJ: Traditional mesh format for 3D modeling software
Mesh with PBR: Full material properties (Base Color, Metallic, Roughness, Alpha)

PBR material information is carried by the formats that support it, such as GLB/GLTF.

Q: Can I train TRELLIS.2 on my own dataset?

A: Yes, the complete training code is provided. You can:

Fine-tune the pre-trained model on your custom dataset
Train from scratch if you have sufficient data (100K+ assets recommended)
Modify architecture for research purposes

Requirements:

Convert your 3D assets to O-Voxel format using provided tools
Minimum 8× NVIDIA A100 GPUs for fine-tuning
32× A100 GPUs for full training of 4B model
Training time: 1-4 weeks depending on model size

Q: Does TRELLIS.2 work on Windows?

A: TRELLIS.2 is primarily developed and tested on Linux (Ubuntu 20.04+). Windows support is:

Not officially supported by the development team
Possible with community workarounds (see GitHub issues)
Recommended approach: Use WSL2 (Windows Subsystem for Linux) with GPU passthrough

For production use, Linux is strongly recommended.

Q: How does TRELLIS.2 handle transparent or translucent objects?

A: TRELLIS.2 has native support for transparency through the Alpha channel in its PBR material system:

Opacity/Alpha attribute is part of the O-Voxel representation
Supports both fully transparent regions and partial translucency (e.g., glass, water)
Exports correctly to GLB format with alpha channel preserved
Compatible with standard rendering engines that support PBR

This is a significant advantage over methods that only support opaque surfaces.

Q: What is the TRELLIS-500K dataset?

A: TRELLIS-500K is the training dataset for TRELLIS.2, containing:

500,000 curated 3D assets from multiple sources
Filtered based on aesthetic scores and quality metrics
Includes diverse categories: objects, furniture, toys, architectural elements
Publicly available for research purposes
Comes with data preparation toolkits for processing custom assets

Sources: Objaverse(XL), ABO, 3D-FUTURE, HSSD (training); Toys4k (evaluation)

Conclusion and Next Steps

Summary

TRELLIS.2-4B represents a significant breakthrough in 3D generative AI, offering:

✅ Unmatched Versatility: Handles arbitrary topologies including open surfaces, non-manifold geometry, and internal structures
✅ Exceptional Efficiency: 3-60 second generation time with compact 9.6K token representation
✅ Production-Ready Quality: Full PBR materials with photorealistic rendering capabilities
✅ Open Research: MIT License with complete training code and 500K dataset
✅ Minimalist Pipeline: Instant, optimization-free mesh conversion

Getting Started Checklist

Verify hardware requirements (24GB+ NVIDIA GPU)
Install CUDA Toolkit 12.4+
Clone repository and install dependencies
Download or prepare test images
Run basic image-to-3D generation example
Experiment with different resolutions and settings
Export to GLB format for use in your pipeline

Recommended Next Steps

For Researchers:

Explore the technical paper: arXiv:2512.14692
Download TRELLIS-500K dataset for analysis
Experiment with architecture modifications
Benchmark against your own methods

For Developers:

Integrate TRELLIS.2 into your 3D content pipeline
Build applications using the API
Optimize for your specific hardware configuration
Contribute to the open-source project

For Artists and Designers:

Test with various input images to understand capabilities
Develop workflows combining text-to-image and TRELLIS.2
Experiment with post-processing in 3D software
Share results and feedback with the community

Resources and Links

Resource	Link
Official Repository	github.com/microsoft/TRELLIS.2
Research Paper	arxiv.org/abs/2512.14692
Project Page	microsoft.github.io/TRELLIS.2
Model Hub	huggingface.co/microsoft/TRELLIS.2-4B
Dataset	TRELLIS-500K Documentation
Original TRELLIS	github.com/microsoft/TRELLIS

Community and Support

GitHub Issues: Report bugs and request features
Discussions: Share results and ask questions
Research Collaboration: Contact the authors for academic partnerships
Commercial Inquiries: Review MIT License terms and conditions

Final Thoughts

TRELLIS.2-4B pushes the boundaries of what's possible in 3D generative AI, combining cutting-edge research with practical usability. Whether you're building the next generation of 3D content tools, conducting academic research, or creating immersive experiences, TRELLIS.2 provides a powerful foundation for innovation in 3D generation.

The open-source nature of the project, combined with comprehensive documentation and pre-trained models, makes it accessible to a wide range of users—from researchers exploring new architectures to developers building production applications.

Start generating high-quality 3D assets today with TRELLIS.2-4B!

Last Updated: December 2025
Model Version: TRELLIS.2-4B
License: MIT License
