TRELLIS.2-4B: The Complete Guide to Microsoft's Revolutionary 3D Generation Model (2025)

🎯 Core Highlights (TL;DR)

  • TRELLIS.2-4B is Microsoft's state-of-the-art 4 billion parameter model for high-fidelity image-to-3D generation
  • Introduces O-Voxel (Omni-Voxel), a breakthrough "field-free" representation handling arbitrary topologies including open surfaces and non-manifold geometry
  • Achieves ultra-fast generation: 3 seconds for 512³ resolution, 17 seconds for 1024³ on NVIDIA H100
  • Supports full PBR materials (Base Color, Metallic, Roughness, Opacity) for photorealistic rendering
  • Compresses 1024³ assets into only ~9.6K latent tokens with negligible quality loss
  • Open-source under MIT License with complete training code and 500K dataset

Table of Contents

  1. What is TRELLIS.2-4B?
  2. Evolution from TRELLIS to TRELLIS.2
  3. Technical Innovations
  4. Key Features and Capabilities
  5. Performance Benchmarks
  6. Installation and Setup
  7. How to Use TRELLIS.2-4B
  8. Comparison with Other 3D Generation Models
  9. Training Your Own Model
  10. Limitations and Considerations
  11. Frequently Asked Questions
  12. Conclusion and Next Steps

What is TRELLIS.2-4B?

TRELLIS.2-4B is Microsoft Research's latest breakthrough in 3D generative AI, representing a significant leap forward in image-to-3D conversion technology. As a 4 billion parameter model, it transforms single 2D images into fully textured, high-resolution 3D assets with unprecedented quality and speed.

Core Capabilities

  • Input: Single RGB image
  • Output: Fully textured 3D mesh with PBR materials
  • Resolution: Supports 512³ to 1536³ voxel grid resolution
  • Speed: 3-60 seconds depending on resolution (NVIDIA H100)
  • License: MIT License (open-source)

💡 Key Innovation
Unlike traditional methods that rely on implicit fields (SDF, NeRF) or iso-surface representations (Flexicubes), TRELLIS.2 uses a novel "field-free" approach that natively handles complex geometries without lossy conversions.

Research Background

Developed by a collaborative team from Tsinghua University and Microsoft Research, TRELLIS.2 builds upon the original TRELLIS model (CVPR'25 Spotlight) with fundamental architectural improvements. The research paper is available at arXiv:2512.14692.


Evolution from TRELLIS to TRELLIS.2

TRELLIS (First Generation)

The original TRELLIS introduced the concept of Structured LATent (SLAT) representation, enabling:

  • Multiple output formats (Radiance Fields, 3D Gaussians, Meshes)
  • Models up to 2B parameters
  • Training on 500K diverse 3D objects
| Model | Parameters | Key Feature |
|---|---|---|
| TRELLIS-image-large | 1.2B | Image-to-3D generation |
| TRELLIS-text-base | 342M | Text-to-3D (base) |
| TRELLIS-text-large | 1.1B | Text-to-3D (large) |
| TRELLIS-text-xlarge | 2.0B | Text-to-3D (extra-large) |

TRELLIS.2 Breakthrough

TRELLIS.2 represents a paradigm shift with:

✅ Native topology handling - No conversion artifacts
✅ Compact latent space - 16× spatial compression
✅ Instant processing - Rendering-free, optimization-free
✅ Full PBR support - Including transparency/translucency
✅ Higher resolution - Up to 1536³ voxel grids


Technical Innovations

1. O-Voxel: Omni-Voxel Representation

O-Voxel is the cornerstone innovation of TRELLIS.2, representing a "field-free" sparse voxel structure that simultaneously encodes geometry and appearance.

Geometry Component (f_shape)

  • Flexible Dual Grids: Handles arbitrary topologies
  • Sharp Edge Preservation: Maintains geometric details
  • Topology Freedom: Supports open surfaces, non-manifold geometry, internal structures

Appearance Component (f_mat)

  • Base Color: RGB texture information
  • Metallic: Material reflectivity
  • Roughness: Surface smoothness
  • Alpha: Transparency/translucency support

⚠️ Technical Advantage
Traditional implicit-field and iso-surface methods (e.g., SDF, Flexicubes) struggle with:

  • Open surfaces (e.g., cloth, hair)
  • Non-manifold geometry (e.g., intersecting surfaces)
  • Internal structures (e.g., hollow objects)

O-Voxel handles all these cases natively without conversion artifacts.
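
To make the representation concrete, the sketch below shows what a sparse O-Voxel-style record plausibly carries: integer coordinates for the occupied voxels plus per-voxel geometry (f_shape) and appearance (f_mat) features. The field names and shapes are illustrative assumptions based on the components described above, not the actual trellis2/o_voxel data layout.

from dataclasses import dataclass
import torch

@dataclass
class OVoxelAsset:
    """Schematic of a sparse O-Voxel-style asset (illustrative only)."""
    coords: torch.Tensor    # (N, 3) integer indices of occupied voxels in the grid
    f_shape: torch.Tensor   # (N, D_shape) flexible dual-grid geometry features
    f_mat: torch.Tensor     # (N, 6) appearance: base color RGB, metallic, roughness, alpha
    voxel_size: float       # edge length of one voxel in object space

    @property
    def num_voxels(self) -> int:
        # Only occupied voxels are stored, which is what keeps the structure sparse.
        return self.coords.shape[0]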

2. SC-VAE: Sparse Compression VAE

The Sparse Compression 3D VAE employs a Sparse Residual Autoencoding scheme to achieve unprecedented compression ratios.

| Resolution | Latent Tokens | Compression Ratio |
|---|---|---|
| 512³ | ~2.4K | 64× spatial |
| 1024³ | ~9.6K | 16× spatial |
| 1536³ | ~21.6K | 7× spatial |

Key Features:

  • Negligible perceptual degradation
  • Efficient large-scale generative modeling
  • Direct voxel compression without intermediate representations
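
As a sanity check on these figures, the arithmetic below shows why ~9.6K tokens is plausible for a 1024³ asset: a 16× spatial compression yields a 64³ latent grid, and because surface voxels are sparse, only a few percent of those latent positions actually carry tokens.

# Back-of-the-envelope check of the table above (pure arithmetic, not library code).
full_grid = 1024 ** 3              # 1,073,741,824 voxel positions
latent_grid = (1024 // 16) ** 3    # 262,144 latent positions after 16x spatial compression
tokens = 9_600                     # reported latent token count for a 1024^3 asset

print(f"occupied latent fraction: {tokens / latent_grid:.1%}")   # ~3.7%
print(f"voxels represented per token: {full_grid // tokens:,}")  # ~111,848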

3. Flow-Matching Transformer Architecture

TRELLIS.2-4B uses a vanilla DiT (Diffusion Transformer) architecture with:

  • 4 billion parameters
  • Flow-matching training objective (see the sketch after this list)
  • Efficient attention mechanisms for sparse data
  • Multi-resolution training strategy
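
For readers unfamiliar with the objective, here is a generic sketch of one flow-matching training step in PyTorch: the model learns to predict the constant velocity between a noise sample and a clean latent along a straight interpolation path. This illustrates the standard objective only; it is not the repository's actual training loop, and the model signature is assumed.

import torch
import torch.nn.functional as F

def flow_matching_step(model, x1, cond):
    """One generic flow-matching loss evaluation (illustrative sketch).
    x1: clean latent tokens; cond: conditioning (e.g., image features)."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # one timestep per sample
    t_exp = t.view(-1, *([1] * (x1.dim() - 1)))    # broadcast t to x1's shape
    xt = (1 - t_exp) * x0 + t_exp * x1             # point on the straight path
    v_target = x1 - x0                             # constant velocity along the path
    v_pred = model(xt, t, cond)                    # assumed model signature
    return F.mse_loss(v_pred, v_target)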

4. Instant Bidirectional Conversion

One of TRELLIS.2's most practical innovations is the ability to convert between meshes and O-Voxels instantly:

| Direction | Time (Single CPU) | Time (CUDA) |
|---|---|---|
| Mesh → O-Voxel | < 10 seconds | < 100 ms |
| O-Voxel → Mesh | < 10 seconds | < 100 ms |

This enables:

  • Rendering-free processing: No need for multi-view rendering
  • Optimization-free workflow: Direct conversion without iterative refinement
  • Minimalist pipeline: Simplified data preparation and post-processing
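
For intuition about what a mesh-to-voxel conversion involves, here is a generic surface voxelization using the third-party trimesh library. This is only an illustration of the concept; TRELLIS.2's own converter is a separate, much faster code path that also carries material attributes.

import trimesh

# Generic surface voxelization (conceptual stand-in, not TRELLIS.2's converter).
mesh = trimesh.load("input_model.obj", force="mesh")
pitch = mesh.extents.max() / 512      # ~512 voxels across the longest side
vox = mesh.voxelized(pitch=pitch)     # occupancy grid over the surface
coords = vox.sparse_indices           # (N, 3) indices of occupied voxels
print(f"{len(coords):,} occupied voxels at pitch {pitch:.5f}")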

Key Features and Capabilities

High Quality and Resolution

TRELLIS.2-4B generates assets with exceptional fidelity across multiple resolutions:

📊 Generation Quality Metrics

Resolution: 512³
- Generation Time: 3 seconds (2s shape + 1s material)
- Detail Level: High
- Use Case: Rapid prototyping, real-time applications

Resolution: 1024³
- Generation Time: 17 seconds (10s shape + 7s material)
- Detail Level: Very High
- Use Case: Production assets, game development

Resolution: 1536³
- Generation Time: 60 seconds (35s shape + 25s material)
- Detail Level: Ultra High
- Use Case: Film production, high-end visualization

Arbitrary Topology Handling

Unlike traditional methods constrained by iso-surface representations, TRELLIS.2 robustly handles:

✔ Open Surfaces

  • Cloth, curtains, flags
  • Hair and fur
  • Thin structures

✔ Non-manifold Geometry

  • Intersecting surfaces
  • Self-intersections
  • Complex architectural elements

✔ Internal Structures

  • Hollow objects
  • Multi-layer constructions
  • Enclosed cavities

Rich Texture Modeling with PBR

Full Physically Based Rendering (PBR) support enables photorealistic relighting:

| Material Property | Description | Use Case |
|---|---|---|
| Base Color | RGB albedo texture | Surface appearance |
| Metallic | Metal vs. dielectric | Material type classification |
| Roughness | Surface smoothness | Specular reflection control |
| Opacity (Alpha) | Transparency level | Glass, water, translucent materials |

✅ Best Practice
The PBR material system is compatible with standard game engines (Unity, Unreal Engine) and 3D software (Blender, Maya), enabling seamless integration into production pipelines.
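
For orientation, the four material channels correspond one-to-one to glTF 2.0's standard metallic-roughness material model, which is what GLB files carry. The snippet below shows that mapping as a plain dictionary following the glTF specification; the exporter writes texture maps rather than constant factors, so this is illustrative only.

# glTF 2.0 material schema corresponding to the four channels above
# (constant factors shown for illustration; real exports use texture maps).
gltf_material = {
    "pbrMetallicRoughness": {
        "baseColorFactor": [1.0, 1.0, 1.0, 1.0],  # Base Color (RGB) + alpha
        "metallicFactor": 0.0,                    # Metallic: 0 = dielectric, 1 = metal
        "roughnessFactor": 0.5,                   # Roughness: specular spread
    },
    "alphaMode": "BLEND",                         # Opacity: enables translucency
}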

Shape-Conditioned Texture Generation

TRELLIS.2 supports two generation modes:

  1. Image-to-3D: Generate complete 3D asset from single image
  2. Texture Generation: Generate textures for existing 3D meshes with reference image

This flexibility allows:

  • Re-texturing existing assets
  • Style transfer to 3D models
  • Texture variation generation

Performance Benchmarks

Generation Speed (NVIDIA H100)

| Resolution | Shape Generation | Material Generation | Total Time |
|---|---|---|---|
| 512³ | 2 seconds | 1 second | 3 seconds |
| 1024³ | 10 seconds | 7 seconds | 17 seconds |
| 1536³ | 35 seconds | 25 seconds | 60 seconds |

Latent Space Efficiency

TRELLIS.2 achieves state-of-the-art compression while maintaining quality:

Reconstruction Accuracy vs. Latent Compactness

TRELLIS.2: ████████████████████ (Highest)
Method A:  ████████████░░░░░░░░
Method B:  ██████████░░░░░░░░░░
Method C:  ████████░░░░░░░░░░░░

Compactness (tokens per 1024³):
TRELLIS.2: ~9.6K
Method A:  ~25K
Method B:  ~40K
Method C:  ~60K

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU Memory | 24GB | 48GB+ |
| GPU Model | NVIDIA A100 | NVIDIA H100 |
| System RAM | 32GB | 64GB+ |
| CUDA Version | 12.4+ | 12.4+ |
| OS | Linux | Linux (Ubuntu 20.04+) |
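
A quick way to check your machine against this table is to query the GPU from PyTorch. The helper below is hypothetical (not part of the TRELLIS.2 API); its thresholds follow the per-resolution VRAM table in the Limitations section.

import torch

def suggest_resolution() -> int:
    """Pick a generation resolution from available VRAM (hypothetical helper)."""
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb >= 40:
        return 1536
    if vram_gb >= 24:
        return 1024
    return 512

print(f"Suggested resolution: {suggest_resolution()}³")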

Installation and Setup

Prerequisites

Before installing TRELLIS.2, ensure your system meets these requirements:

  • Operating System: Linux (tested on Ubuntu 20.04+)
  • GPU: NVIDIA GPU with 24GB+ VRAM
  • CUDA Toolkit: Version 12.4 or higher
  • Python: Version 3.8 or higher
  • Conda: For dependency management

⚠️ Windows Users
While primarily tested on Linux, Windows setup is possible but not officially supported. Refer to community discussions for Windows-specific configurations.

Step-by-Step Installation

1. Clone the Repository

git clone --recurse-submodules https://github.com/microsoft/TRELLIS.2.git
cd TRELLIS.2

2. Create Conda Environment

# Create new environment
conda create -n trellis2 python=3.10
conda activate trellis2

3. Install Dependencies

The installation script provides modular dependency installation:

# Install all dependencies for inference
. ./setup.sh --new-env --basic --xformers --flash-attn \
    --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast

Installation Flags Explained:

| Flag | Purpose |
|---|---|
| --new-env | Create new conda environment named 'trellis2' |
| --basic | Install core dependencies |
| --xformers | Memory-efficient attention (for GPUs without flash-attn) |
| --flash-attn | Fast attention implementation (recommended) |
| --diffoctreerast | Differentiable octree rasterizer |
| --spconv | Sparse convolution operations |
| --mipgaussian | Mip-splatting for Gaussian rendering |
| --kaolin | NVIDIA's 3D deep learning library |
| --nvdiffrast | Differentiable rasterizer |

4. Environment Configuration

Set environment variables for optimal performance:

export OPENCV_IO_ENABLE_OPENEXR=1
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

# For GPUs without flash-attn support (e.g., V100)
# export ATTN_BACKEND=xformers

# SPCONV algorithm selection
export SPCONV_ALGO=native  # Use 'auto' for benchmarking (slower first run)

5. Download Pre-trained Models

Models are automatically downloaded from Hugging Face on first use, or download manually:

# Models will be cached in ~/.cache/huggingface/
# No manual download required for basic usage

Troubleshooting Installation

Issue: CUDA version mismatch

# Check CUDA version
nvcc --version

# Set correct CUDA path
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

Issue: Out of memory during compilation

# Limit parallel compilation jobs
export MAX_JOBS=4

How to Use TRELLIS.2-4B

Basic Image-to-3D Generation

Here's a minimal example to generate a 3D asset from an image:

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map for PBR rendering
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run Generation
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]

# 4. Simplify mesh (nvdiffrast has a 16M triangle limit)
mesh.simplify(16777216)

# 5. Render Video Preview
video = render_utils.make_pbr_vis_frames(
    render_utils.render_video(mesh, envmap=envmap)
)
imageio.mimsave("output.mp4", video, fps=15)

# 6. Export to GLB format
glb = o_voxel.postprocess.to_glb(
    vertices=mesh.vertices,
    faces=mesh.faces,
    attr_volume=mesh.attrs,
    coords=mesh.coords,
    attr_layout=mesh.layout,
    voxel_size=mesh.voxel_size,
    aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target=1000000,
    texture_size=4096,
    remesh=True,
    remesh_band=1,
    remesh_project=0,
    verbose=True
)
glb.export("output.glb", extension_webp=True)

Advanced Usage: Multi-Resolution Generation

Generate assets at different resolutions based on your needs:

# High-speed generation (512³)
mesh_fast = pipeline.run(
    image,
    resolution=512,
    seed=42
)[0]

# Balanced quality (1024³) - Default
mesh_balanced = pipeline.run(
    image,
    resolution=1024,
    seed=42
)[0]

# Maximum quality (1536³)
mesh_ultra = pipeline.run(
    image,
    resolution=1536,
    seed=42
)[0]

Shape-Conditioned Texture Generation

Generate textures for existing 3D meshes:

from trellis2.pipelines import Trellis2TextureGenerationPipeline

# Load texture generation pipeline
texture_pipeline = Trellis2TextureGenerationPipeline.from_pretrained(
    "microsoft/TRELLIS.2-4B"
)
texture_pipeline.cuda()

# Load existing mesh and reference image
input_mesh = o_voxel.io.load_mesh("input_model.obj")
reference_image = Image.open("texture_reference.png")

# Generate texture
textured_mesh = texture_pipeline.run(
    mesh=input_mesh,
    image=reference_image,
    seed=42
)[0]

Batch Processing

Process multiple images efficiently:

import glob
from pathlib import Path

# Load pipeline once
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# Process all images in directory
image_paths = glob.glob("input_images/*.png")
for img_path in image_paths:
    image = Image.open(img_path)
    mesh = pipeline.run(image)[0]

    # Save with same filename
    output_name = Path(img_path).stem
    glb = o_voxel.postprocess.to_glb(
        vertices=mesh.vertices,
        faces=mesh.faces,
        attr_volume=mesh.attrs,
        coords=mesh.coords,
        attr_layout=mesh.layout,
        voxel_size=mesh.voxel_size,
        aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
        decimation_target=1000000,
        texture_size=4096
    )
    glb.export(f"output/{output_name}.glb", extension_webp=True)

Output Formats

TRELLIS.2 supports multiple output formats:

| Format | Extension | Use Case |
|---|---|---|
| GLB | .glb | Web, game engines, general 3D software |
| PLY | .ply | Point cloud, Gaussian splatting |
| OBJ | .obj | Traditional 3D modeling software |
| GLTF | .gltf | Web applications, AR/VR |
# Export to different formats
mesh.save_ply("output.ply")  # Gaussian representation
mesh.save_obj("output.obj")  # Traditional mesh
glb.export("output.glb")     # Optimized for web/games

Comparison with Other 3D Generation Models

TRELLIS.2 vs. Original TRELLIS

| Feature | TRELLIS (v1) | TRELLIS.2 |
|---|---|---|
| Representation | SLAT (Structured Latent) | O-Voxel (Omni-Voxel) |
| Topology Support | Limited (iso-surface based) | Arbitrary (field-free) |
| Max Resolution | 1024³ | 1536³ |
| Latent Tokens (1024³) | ~25K | ~9.6K |
| PBR Materials | Partial | Full (including alpha) |
| Processing Pipeline | Multi-stage rendering | Instant conversion |
| Open Surfaces | ❌ | ✅ |
| Non-manifold Geometry | ❌ | ✅ |
| Internal Structures | ❌ | ✅ |

TRELLIS.2 vs. Other State-of-the-Art Models

| Model | Parameters | Speed (1024³) | Topology | PBR Support |
|---|---|---|---|---|
| TRELLIS.2-4B | 4B | 17s | Arbitrary | Full |
| Shap-E | 300M | ~30s | Limited | Partial |
| Point-E | 1B | ~45s | Limited | No |
| DreamFusion | - | ~2 hours | Limited | Partial |
| Magic3D | - | ~40 min | Limited | Partial |
| Instant3D | 2B | ~25s | Limited | Partial |

✅ Competitive Advantage
TRELLIS.2's combination of speed, quality, and topology flexibility makes it the most versatile solution for production-ready 3D asset generation.

When to Use TRELLIS.2

Best Use Cases:

  • Production asset creation for games and films
  • Rapid prototyping and concept visualization
  • E-commerce 3D product visualization
  • AR/VR content creation
  • Architectural visualization
  • Digital twin creation

Consider Alternatives When:

  • You need text-only input (use TRELLIS-text models)
  • You require real-time generation on mobile devices
  • You need extremely high polygon counts (>10M triangles)
  • You're working with specific artistic styles (may need fine-tuning)

Training Your Own Model

TRELLIS.2 provides complete training code for researchers and developers who want to:

  • Fine-tune on custom datasets
  • Experiment with architecture modifications
  • Train domain-specific models

Training Dataset: TRELLIS-500K

Microsoft provides TRELLIS-500K, a curated dataset containing 500,000 high-quality 3D assets from:

| Source | Assets | Description |
|---|---|---|
| Objaverse(XL) | ~350K | Diverse everyday objects |
| ABO | ~50K | Amazon product catalog |
| 3D-FUTURE | ~40K | Furniture and interior design |
| HSSD | ~40K | Habitat synthetic scenes |
| Toys4k | ~20K | Toy objects |

All assets are filtered based on aesthetic scores and quality metrics.

Training Pipeline Overview

The training process follows a multi-stage approach:

Stage 1: VAE Training
├── Sparse Structure VAE (ss_vae)
└── SLat VAE with Decoders (slat_vae)
    ├── Gaussian Decoder
    ├── Radiance Field Decoder
    └── Mesh Decoder

Stage 2: Flow Model Training
├── Sparse Structure Flow (ss_flow)
└── SLat Flow (slat_flow)

Training Configuration

Example configurations are provided in the configs/ directory:

VAE Training:

# Train Sparse Structure VAE
python train.py \
    --config configs/vae/ss_vae_conv3d_16l8_fp16.json \
    --output_dir outputs/ss_vae \
    --data_dir /path/to/TRELLIS-500K \
    --num_gpus 8

# Train SLat VAE with Gaussian Decoder
python train.py \
    --config configs/vae/slat_vae_enc_dec_gs_swin8_B_64l8_fp16.json \
    --output_dir outputs/slat_vae_gs \
    --data_dir /path/to/TRELLIS-500K \
    --num_gpus 8

Flow Model Training:

# Train Image-conditioned Flow Model
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/slat_flow_img \
    --data_dir /path/to/TRELLIS-500K \
    --num_nodes 4 \
    --num_gpus 8 \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT

Multi-Node Distributed Training

For large-scale training across multiple machines:

# Node 0 (Master)
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/slat_flow_img_distributed \
    --data_dir /path/to/TRELLIS-500K \
    --num_nodes 4 \
    --node_rank 0 \
    --num_gpus 8 \
    --master_addr 192.168.1.100 \
    --master_port 29500

# Node 1
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/slat_flow_img_distributed \
    --data_dir /path/to/TRELLIS-500K \
    --num_nodes 4 \
    --node_rank 1 \
    --num_gpus 8 \
    --master_addr 192.168.1.100 \
    --master_port 29500

# Repeat for nodes 2 and 3...

Fine-tuning on Custom Data

To fine-tune TRELLIS.2 on your own dataset:

  1. Prepare Data: Convert your 3D assets to O-Voxel format
  2. Configure Training: Modify config files for your dataset
  3. Resume from Checkpoint: Load pre-trained weights
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/custom_finetuned \
    --data_dir /path/to/custom/dataset \
    --load_dir microsoft/TRELLIS.2-4B \
    --num_gpus 8

Training Hardware Requirements

| Model Size | Recommended GPUs | Training Time (500K dataset) |
|---|---|---|
| Base (342M) | 8× A100 (40GB) | ~1 week |
| Large (1.1B) | 16× A100 (40GB) | ~2 weeks |
| XLarge (4B) | 32× A100 (80GB) | ~4 weeks |

Limitations and Considerations

Known Limitations

1. Geometric Artifacts

Issue: Generated meshes may occasionally contain small holes or minor topological discontinuities.

Impact:

  • Affects applications requiring watertight geometry (3D printing, simulation)
  • More common in high-complexity models with intricate details

Mitigation:

# Use provided post-processing scripts
from trellis2.utils import mesh_repair

cleaned_mesh = mesh_repair.fill_holes(mesh, max_hole_size=100)
cleaned_mesh = mesh_repair.remove_degenerate_faces(cleaned_mesh)

2. Base Model Without Alignment

Issue: TRELLIS.2-4B is a pre-trained foundation model without human preference alignment (RLHF).

Impact:

  • Output style reflects training data distribution
  • May require multiple generations to achieve desired aesthetic
  • Not optimized for specific artistic styles

Recommendations:

  • Generate multiple variants with different seeds (see the sketch after this list)
  • Use post-processing for style refinement
  • Consider fine-tuning for specific aesthetic requirements
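
A simple way to follow the first recommendation is a seed sweep with the same pipeline API shown in the usage section; the loop below is a minimal sketch, assuming pipeline and image are set up as in the basic example.

# Minimal seed sweep: generate several variants and pick the best by eye.
variants = []
for seed in (0, 1, 2, 3, 4):
    mesh = pipeline.run(image, seed=seed)[0]
    variants.append((seed, mesh))
# ...render previews or export each variant for side-by-side comparison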

3. Input Image Quality Dependency

Issue: Output quality heavily depends on input image characteristics.

Best Practices (see the preprocessing sketch after this list):

  • Use high-resolution images (512×512 minimum, 1024×1024 recommended)
  • Ensure clear object visibility with minimal occlusion
  • Prefer images with good lighting and contrast
  • Avoid heavily compressed or noisy images
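
The sketch below applies these practices programmatically: it upscales small inputs and strips a cluttered background before generation. It uses the third-party rembg package for background removal, which is an assumption here, not a TRELLIS.2 dependency.

from PIL import Image
from rembg import remove  # third-party background-removal tool (assumed, optional)

image = Image.open("raw_input.jpg").convert("RGB")

# Upscale so the shorter side is at least 1024 px.
if min(image.size) < 1024:
    scale = 1024 / min(image.size)
    new_size = (round(image.width * scale), round(image.height * scale))
    image = image.resize(new_size, Image.LANCZOS)

# Remove the background; returns RGBA with transparent surroundings.
image = remove(image)
image.save("clean_input.png")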

4. Memory Requirements

Issue: High-resolution generation requires substantial GPU memory.

| Resolution | Minimum VRAM | Recommended VRAM |
|---|---|---|
| 512³ | 16GB | 24GB |
| 1024³ | 24GB | 40GB |
| 1536³ | 40GB | 80GB |

Memory Optimization:

# Enable memory-efficient settings
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Enable memory-efficient attention during inference
pipeline.enable_memory_efficient_attention()

Responsible AI Considerations

⚠️ Important Notice
TRELLIS.2 is a research project. Responsible AI considerations were factored into all stages:

  • Dataset Curation: Public datasets reviewed for harmful content and PII
  • Potential Bias: Internet-sourced data may contain inherent biases
  • Intended Use: Academic and research purposes only
  • Commercial Use: Requires careful evaluation of generated content

Ethical Guidelines:

  • Do not generate content that infringes intellectual property rights
  • Avoid creating misleading or deceptive 3D representations
  • Respect privacy and consent when generating assets based on real objects
  • Consider cultural sensitivity in generated content

Performance Considerations

Factors Affecting Generation Quality:

  1. Input Image Characteristics

    • Resolution and clarity
    • Lighting conditions
    • Object visibility and occlusion
    • Background complexity
  2. Generation Parameters

    • Resolution setting (512³ vs 1024³ vs 1536³)
    • Random seed selection
    • Sampling steps (if configurable)
  3. Hardware Configuration

    • GPU model and memory
    • CUDA version compatibility
    • Driver version

Frequently Asked Questions

Q: What is the difference between TRELLIS and TRELLIS.2?

A: TRELLIS.2 represents a fundamental architectural upgrade from the original TRELLIS. The key differences are:

  • Representation: TRELLIS uses SLAT (Structured Latent), while TRELLIS.2 uses O-Voxel (Omni-Voxel), a "field-free" approach
  • Topology: TRELLIS.2 natively handles arbitrary topologies including open surfaces and non-manifold geometry, which TRELLIS cannot
  • Efficiency: TRELLIS.2 compresses 1024³ assets into ~9.6K tokens vs. ~25K in TRELLIS
  • Materials: TRELLIS.2 supports full PBR including transparency, while TRELLIS has partial support
  • Processing: TRELLIS.2 offers instant mesh conversion (<100ms with CUDA) vs. multi-stage rendering in TRELLIS

Q: Can I use TRELLIS.2 for commercial projects?

A: Yes, TRELLIS.2 is released under the MIT License, which permits commercial use. However:

  • Verify that generated assets don't infringe on existing intellectual property
  • The model is a base model without alignment, so output quality may vary
  • Some submodules may have different licenses (check the LICENSE file)
  • Consider the ethical implications of AI-generated content in your use case

Q: What GPU do I need to run TRELLIS.2?

A: Minimum requirements:

  • GPU: NVIDIA GPU with 24GB VRAM (e.g., RTX 3090, A5000, A100)
  • Resolution: 512³ requires 16GB, 1024³ requires 24GB, 1536³ requires 40GB+
  • Tested on: NVIDIA A100 and H100 GPUs
  • Not supported: AMD GPUs, Apple Silicon (MPS), CPU-only inference

For optimal performance, use NVIDIA H100 or A100 (80GB) GPUs.

Q: How do I improve generation quality?

A: Follow these best practices:

  1. Input Image Quality:

    • Use high-resolution images (1024×1024 recommended)
    • Ensure good lighting and clear object visibility
    • Remove complex backgrounds if possible
    • Avoid heavily compressed or noisy images
  2. Generation Settings:

    • Use higher resolution (1024³ or 1536³) for detailed assets
    • Try different random seeds (generate 3-5 variants)
    • Experiment with different input angles if available
  3. Post-Processing:

    • Use mesh repair tools for geometric artifacts
    • Apply texture enhancement in 3D software
    • Optimize topology for your specific use case

Q: Can TRELLIS.2 generate 3D assets from text prompts?

A: TRELLIS.2-4B is specifically designed for image-to-3D generation. For text-to-3D, you have two options:

  1. Two-stage approach (Recommended):

    • Use a text-to-image model (DALL-E, Midjourney, Stable Diffusion)
    • Feed generated image to TRELLIS.2-4B
    • This typically produces better results (see the sketch after this list)
  2. Use TRELLIS text models:

    • TRELLIS-text-base (342M)
    • TRELLIS-text-large (1.1B)
    • TRELLIS-text-xlarge (2.0B)
    • Note: These are from TRELLIS v1, not TRELLIS.2
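
Here is a hedged sketch of the two-stage route (option 1), pairing a text-to-image model from the diffusers library with the TRELLIS.2 pipeline from the usage section. The Stable Diffusion checkpoint is just an example; any text-to-image model that yields a clean, well-lit subject will do.

import torch
from diffusers import StableDiffusionPipeline
from trellis2.pipelines import Trellis2ImageTo3DPipeline

# Stage 1: text -> image (example model; swap in any text-to-image system)
t2i = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = t2i("a weathered bronze fox statue, studio lighting, plain background").images[0]

# Stage 2: image -> 3D with TRELLIS.2
i23d = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
i23d.cuda()
mesh = i23d.run(image)[0]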

Q: How long does generation take?

A: Generation time depends on resolution and hardware:

On NVIDIA H100:

  • 512³: ~3 seconds (2s shape + 1s material)
  • 1024³: ~17 seconds (10s shape + 7s material)
  • 1536³: ~60 seconds (35s shape + 25s material)

On NVIDIA A100 (40GB):

  • 512³: ~5 seconds
  • 1024³: ~30 seconds
  • 1536³: ~120 seconds

Older GPUs (RTX 3090, A6000) will be proportionally slower.

Q: What output formats are supported?

A: TRELLIS.2 supports multiple industry-standard formats:

  • GLB/GLTF: Optimized for web, game engines (Unity, Unreal), and AR/VR
  • PLY: Point cloud format, useful for Gaussian splatting
  • OBJ: Traditional mesh format for 3D modeling software
  • Mesh with PBR: Full material properties (Base Color, Metallic, Roughness, Alpha)

All formats include full PBR material information where applicable.

Q: Can I train TRELLIS.2 on my own dataset?

A: Yes, the complete training code is provided. You can:

  1. Fine-tune the pre-trained model on your custom dataset
  2. Train from scratch if you have sufficient data (100K+ assets recommended)
  3. Modify architecture for research purposes

Requirements:

  • Convert your 3D assets to O-Voxel format using provided tools
  • Minimum 8× NVIDIA A100 GPUs for fine-tuning
  • 32× A100 GPUs for full training of the 4B model
  • Training time: 1-4 weeks depending on model size

Q: Does TRELLIS.2 work on Windows?

A: TRELLIS.2 is primarily developed and tested on Linux (Ubuntu 20.04+). Windows support is:

  • Not officially supported by the development team
  • Possible with community workarounds (see GitHub issues)
  • Recommended approach: Use WSL2 (Windows Subsystem for Linux) with GPU passthrough

For production use, Linux is strongly recommended.

Q: How does TRELLIS.2 handle transparent or translucent objects?

A: TRELLIS.2 has native support for transparency through the Alpha channel in its PBR material system:

  • Opacity/Alpha attribute is part of the O-Voxel representation
  • Supports both binary transparency (glass) and gradient translucency (smoke, water)
  • Exports correctly to GLB format with alpha channel preserved
  • Compatible with standard rendering engines that support PBR

This is a significant advantage over methods that only support opaque surfaces.

Q: What is the TRELLIS-500K dataset?

A: TRELLIS-500K is the training dataset for TRELLIS.2, containing:

  • 500,000 curated 3D assets from multiple sources
  • Filtered based on aesthetic scores and quality metrics
  • Includes diverse categories: objects, furniture, toys, architectural elements
  • Publicly available for research purposes
  • Comes with data preparation toolkits for processing custom assets

Sources: Objaverse(XL), ABO, 3D-FUTURE, HSSD, Toys4k


Conclusion and Next Steps

Summary

TRELLIS.2-4B represents a significant breakthrough in 3D generative AI, offering:

✅ Unmatched Versatility: Handles arbitrary topologies including open surfaces, non-manifold geometry, and internal structures
✅ Exceptional Efficiency: 3-60 second generation time with compact 9.6K token representation
✅ Production-Ready Quality: Full PBR materials with photorealistic rendering capabilities
✅ Open Research: MIT License with complete training code and 500K dataset
✅ Minimalist Pipeline: Instant, optimization-free mesh conversion

Getting Started Checklist

  • Verify hardware requirements (24GB+ NVIDIA GPU)
  • Install CUDA Toolkit 12.4+
  • Clone repository and install dependencies
  • Download or prepare test images
  • Run basic image-to-3D generation example
  • Experiment with different resolutions and settings
  • Export to GLB format for use in your pipeline

Recommended Next Steps

For Researchers:

  1. Explore the technical paper: arXiv:2512.14692
  2. Download TRELLIS-500K dataset for analysis
  3. Experiment with architecture modifications
  4. Benchmark against your own methods

For Developers:

  1. Integrate TRELLIS.2 into your 3D content pipeline
  2. Build applications using the API
  3. Optimize for your specific hardware configuration
  4. Contribute to the open-source project

For Artists and Designers:

  1. Test with various input images to understand capabilities
  2. Develop workflows combining text-to-image and TRELLIS.2
  3. Experiment with post-processing in 3D software
  4. Share results and feedback with the community

Resources and Links

Community and Support

  • GitHub Issues: Report bugs and request features
  • Discussions: Share results and ask questions
  • Research Collaboration: Contact the authors for academic partnerships
  • Commercial Inquiries: Review MIT License terms and conditions

Final Thoughts

TRELLIS.2-4B pushes the boundaries of what's possible in 3D generative AI, combining cutting-edge research with practical usability. Whether you're building the next generation of 3D content tools, conducting academic research, or creating immersive experiences, TRELLIS.2 provides a powerful foundation for innovation in 3D generation.

The open-source nature of the project, combined with comprehensive documentation and pre-trained models, makes it accessible to a wide range of usersβ€”from researchers exploring new architectures to developers building production applications.

Start generating high-quality 3D assets today with TRELLIS.2-4B!


Last Updated: December 2025
Model Version: TRELLIS.2-4B
License: MIT License


Tags:
TRELLIS.2-4B
Microsoft Research
3D Generation
Image-to-3D
O-Voxel
3D AI
PBR Materials
3D Modeling
Computer Vision
3D Assets
Open Source
MIT License