TRELLIS.2-4B: The Complete Guide to Microsoft's Revolutionary 3D Generation Model (2025)
🎯 Core Highlights (TL;DR)
- TRELLIS.2-4B is Microsoft's state-of-the-art 4 billion parameter model for high-fidelity image-to-3D generation
- Introduces O-Voxel (Omni-Voxel), a breakthrough "field-free" representation handling arbitrary topologies including open surfaces and non-manifold geometry
- Achieves ultra-fast generation: 3 seconds for 512³ resolution, 17 seconds for 1024³ on NVIDIA H100
- Supports full PBR materials (Base Color, Metallic, Roughness, Opacity) for photorealistic rendering
- Compresses 1024³ assets into only ~9.6K latent tokens with negligible quality loss
- Open-source under MIT License with complete training code and 500K dataset
Table of Contents
- What is TRELLIS.2-4B?
- Evolution from TRELLIS to TRELLIS.2
- Technical Innovations
- Key Features and Capabilities
- Performance Benchmarks
- Installation and Setup
- How to Use TRELLIS.2-4B
- Comparison with Other 3D Generation Models
- Training Your Own Model
- Limitations and Considerations
- Frequently Asked Questions
- Conclusion and Next Steps
What is TRELLIS.2-4B?
TRELLIS.2-4B is Microsoft Research's latest image-to-3D generative model. With 4 billion parameters, it turns a single 2D image into a fully textured, high-resolution 3D asset in roughly 3 to 60 seconds on an NVIDIA H100, depending on the output resolution.
Core Capabilities
- Input: Single RGB image
- Output: Fully textured 3D mesh with PBR materials
- Resolution: Supports 512³ to 1536³ voxel grid resolution
- Speed: 3-60 seconds depending on resolution (NVIDIA H100)
- License: MIT License (open-source)
💡 Key Innovation
Unlike traditional methods that rely on implicit fields (SDF, NeRF) or iso-surface representations (Flexicubes), TRELLIS.2 uses a novel "field-free" approach that natively handles complex geometries without lossy conversions.
Research Background
Developed by a collaborative team from Tsinghua University and Microsoft Research, TRELLIS.2 builds upon the original TRELLIS model (CVPR'25 Spotlight) with fundamental architectural improvements. The research paper is available at arXiv:2512.14692.
Evolution from TRELLIS to TRELLIS.2
TRELLIS (First Generation)
The original TRELLIS introduced the concept of Structured LATent (SLAT) representation, enabling:
- Multiple output formats (Radiance Fields, 3D Gaussians, Meshes)
- Models up to 2B parameters
- Training on 500K diverse 3D objects
| Model | Parameters | Key Feature |
|---|---|---|
| TRELLIS-image-large | 1.1B | Image-to-3D generation |
| TRELLIS-text-base | 342M | Text-to-3D (base) |
| TRELLIS-text-large | 1.1B | Text-to-3D (large) |
| TRELLIS-text-xlarge | 2.0B | Text-to-3D (extra-large) |
TRELLIS.2 Breakthrough
TRELLIS.2 represents a paradigm shift with:
✅ Native topology handling: no conversion artifacts
✅ Compact latent space: 16× spatial compression
✅ Instant processing: rendering-free, optimization-free
✅ Full PBR support: including transparency/translucency
✅ Higher resolution: up to 1536³ voxel grids
Technical Innovations
1. O-Voxel: Omni-Voxel Representation
O-Voxel is the cornerstone innovation of TRELLIS.2, representing a "field-free" sparse voxel structure that simultaneously encodes geometry and appearance.
Geometry Component (f_shape)
- Flexible Dual Grids: Handles arbitrary topologies
- Sharp Edge Preservation: Maintains geometric details
- Topology Freedom: Supports open surfaces, non-manifold geometry, internal structures
Appearance Component (f_mat)
- Base Color: RGB texture information
- Metallic: Material reflectivity
- Roughness: Surface smoothness
- Alpha: Transparency/translucency support
⚠️ Technical Advantage
Traditional iso-surface methods (SDF, Flexicubes) struggle with:
- Open surfaces (e.g., cloth, hair)
- Non-manifold geometry (e.g., intersecting surfaces)
- Internal structures (e.g., hollow objects)
O-Voxel handles all these cases natively without conversion artifacts.
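To make the two components concrete, here is a minimal sketch of a sparse voxel container that keeps a geometry payload and PBR attributes for each occupied cell. The names `SparseVoxelGrid`, `VoxelAttrs`, and `shape_feature` are hypothetical and chosen only for illustration. This is not the `o_voxel` package's API, and the real f_shape encoding (Flexible Dual Grids) is richer than the placeholder tuple used here.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


@dataclass(frozen=True)
class VoxelAttrs:
    """Per-voxel payload: a geometry placeholder plus PBR material channels."""
    shape_feature: Tuple[float, float, float]  # hypothetical stand-in for f_shape
    base_color: Tuple[float, float, float]     # RGB in [0, 1]
    metallic: float
    roughness: float
    alpha: float                               # opacity in [0, 1]


@dataclass
class SparseVoxelGrid:
    """Stores only occupied cells, keyed by integer (i, j, k) coordinates."""
    resolution: int
    cells: Dict[Coord, VoxelAttrs] = field(default_factory=dict)

    def set(self, coord: Coord, attrs: VoxelAttrs) -> None:
        if not all(0 <= c < self.resolution for c in coord):
            raise ValueError(f"{coord} lies outside a {self.resolution}^3 grid")
        self.cells[coord] = attrs

    def occupancy(self) -> float:
        """Fraction of the dense grid that is actually stored."""
        return len(self.cells) / self.resolution ** 3


grid = SparseVoxelGrid(resolution=512)
glass = VoxelAttrs((0.5, 0.5, 0.5), (0.9, 0.95, 1.0),
                   metallic=0.0, roughness=0.05, alpha=0.3)
grid.set((10, 20, 30), glass)
grid.set((10, 20, 31), glass)
print(len(grid.cells), grid.occupancy())
```

Because nothing in this layout assumes an inside or an outside, an open sheet, two intersecting sheets, or a shell inside a shell are all just sets of occupied cells.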
2. SC-VAE: Sparse Compression VAE
The Sparse Compression 3D VAE employs a Sparse Residual Autoencoding scheme to achieve unprecedented compression ratios.
| Resolution | Latent Tokens | Spatial Compression |
|---|---|---|
| 512³ | ~2.4K | 16× |
| 1024³ | ~9.6K | 16× |
| 1536³ | ~21.6K | 16× |
Key Features:
- Negligible perceptual degradation
- Efficient large-scale generative modeling
- Direct voxel compression without intermediate representations
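As a rough sanity check on the token counts above, the sketch below assumes the 16× per-axis spatial compression quoted earlier in this guide, so a 1024³ voxel grid maps to a 64³ latent grid. It then computes what share of that latent grid the quoted token count would occupy. The function names are illustrative, and treating each token as one occupied latent cell is an assumption.

```python
SPATIAL_FACTOR = 16  # assumption: per-axis downsampling factor quoted in this guide

# Approximate token counts quoted in the table above.
QUOTED_TOKENS = {512: 2_400, 1024: 9_600, 1536: 21_600}


def latent_side(resolution: int, factor: int = SPATIAL_FACTOR) -> int:
    """Edge length of the latent grid for a given voxel resolution."""
    if resolution % factor:
        raise ValueError("resolution must be divisible by factor")
    return resolution // factor


def occupied_fraction(resolution: int, tokens: int) -> float:
    """Share of the dense latent grid taken up by the quoted tokens."""
    return tokens / latent_side(resolution) ** 3


for res, tokens in QUOTED_TOKENS.items():
    side = latent_side(res)
    share = occupied_fraction(res, tokens)
    print(f"{res}^3 -> latent {side}^3, ~{tokens} tokens, {share:.1%} of latent cells")
```

The quoted counts grow with the square of the resolution (2.4K, 9.6K, 21.6K), which is what one would expect if tokens track the surface of an object rather than its volume.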
3. Flow-Matching Transformer Architecture
TRELLIS.2-4B uses a vanilla DiT (Diffusion Transformer) architecture with:
- 4 billion parameters
- Flow-matching training objective
- Efficient attention mechanisms for sparse data
- Multi-resolution training strategy
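For readers new to flow matching, the sketch below shows the idea in its simplest form: points on a straight path between a noise sample and a data sample, a regression target equal to the constant velocity along that path, and Euler integration at sampling time. It uses one common convention (noise at t=0, data at t=1). The parametrization and sampler TRELLIS.2 actually uses may differ, and the oracle velocity function here only stands in for a trained transformer.

```python
import random
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from x0 (noise) to x1 (data) at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the network: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted: Vector, x0: Vector, x1: Vector) -> float:
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)


def euler_sample(x0: Vector,
                 velocity_fn: Callable[[Vector, float], Vector],
                 steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [1.0, -2.0, 0.5]
noise = [random.gauss(0.0, 1.0) for _ in data]


def oracle(x: Vector, t: float) -> Vector:
    # Stand-in for a trained model: returns the exact target velocity.
    return target_velocity(noise, data)


sample = euler_sample(noise, oracle, steps=10)
error = max(abs(s - d) for s, d in zip(sample, data))
print(error)
```

With an exact velocity field the path is straight, so a handful of Euler steps already lands on the data sample. A learned field is only approximately straight, which is why the number of sampling steps affects both speed and quality.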
4. Instant Bidirectional Conversion
One of TRELLIS.2's most practical innovations is the ability to convert between meshes and O-Voxels instantly:
| Direction | Time (Single CPU) | Time (CUDA) |
|---|---|---|
| Mesh → O-Voxel | < 10 seconds | < 100ms |
| O-Voxel → Mesh | < 10 seconds | < 100ms |
This enables:
- Rendering-free processing: No need for multi-view rendering
- Optimization-free workflow: Direct conversion without iterative refinement
- Minimalist pipeline: Simplified data preparation and post-processing
Key Features and Capabilities
High Quality and Resolution
TRELLIS.2-4B generates assets with exceptional fidelity across multiple resolutions:
📊 Generation Quality Metrics
Resolution: 512³
- Generation Time: 3 seconds (2s shape + 1s material)
- Detail Level: High
- Use Case: Rapid prototyping, real-time applications
Resolution: 1024³
- Generation Time: 17 seconds (10s shape + 7s material)
- Detail Level: Very High
- Use Case: Production assets, game development
Resolution: 1536³
- Generation Time: 60 seconds (35s shape + 25s material)
- Detail Level: Ultra High
- Use Case: Film production, high-end visualization
Arbitrary Topology Handling
Unlike traditional methods constrained by iso-surface representations, TRELLIS.2 robustly handles:
✅ Open Surfaces
- Cloth, curtains, flags
- Hair and fur
- Thin structures
✅ Non-manifold Geometry
- Intersecting surfaces
- Self-intersections
- Complex architectural elements
✅ Internal Structures
- Hollow objects
- Multi-layer constructions
- Enclosed cavities
Rich Texture Modeling with PBR
Full Physically Based Rendering (PBR) support enables photorealistic relighting:
| Material Property | Description | Use Case |
|---|---|---|
| Base Color | RGB albedo texture | Surface appearance |
| Metallic | Metal vs. dielectric | Material type classification |
| Roughness | Surface smoothness | Specular reflection control |
| Opacity (Alpha) | Transparency level | Glass, water, translucent materials |
✅ Best Practice
The PBR material system is compatible with standard game engines (Unity, Unreal Engine) and 3D software (Blender, Maya), enabling seamless integration into production pipelines.
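To show how a renderer typically consumes these channels, the sketch below applies the common metallic-roughness convention (as used by glTF 2.0): metals have no diffuse term and take their specular color from the base color, while dielectrics use a fixed reflectance of about 4%. The function name and returned keys are illustrative, and this is not code from TRELLIS.2 itself.

```python
from typing import Dict, Tuple

RGB = Tuple[float, float, float]
DIELECTRIC_F0 = 0.04  # typical normal-incidence reflectance of non-metals


def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def shading_inputs(base_color: RGB, metallic: float, roughness: float) -> Dict[str, object]:
    """Derive the quantities a metallic-roughness BRDF works with."""
    diffuse = tuple(lerp(c, 0.0, metallic) for c in base_color)
    f0 = tuple(lerp(DIELECTRIC_F0, c, metallic) for c in base_color)
    return {
        "diffuse": diffuse,                   # fades to black as metallic -> 1
        "f0": f0,                             # specular color at normal incidence
        "alpha_roughness": roughness ** 2,    # perceptual roughness squared (not opacity)
    }


gold = shading_inputs((1.0, 0.77, 0.34), metallic=1.0, roughness=0.3)
plastic = shading_inputs((0.8, 0.1, 0.1), metallic=0.0, roughness=0.5)
print(gold)
print(plastic)
```

Opacity is handled separately from these terms: it controls how the shaded result is blended with whatever lies behind the surface.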
Shape-Conditioned Texture Generation
TRELLIS.2 supports two generation modes:
- Image-to-3D: Generate complete 3D asset from single image
- Texture Generation: Generate textures for existing 3D meshes with reference image
This flexibility allows:
- Re-texturing existing assets
- Style transfer to 3D models
- Texture variation generation
Performance Benchmarks
Generation Speed (NVIDIA H100)
| Resolution | Shape Generation | Material Generation | Total Time |
|---|---|---|---|
| 512³ | 2 seconds | 1 second | 3 seconds |
| 1024³ | 10 seconds | 7 seconds | 17 seconds |
| 1536³ | 35 seconds | 25 seconds | 60 seconds |
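For capacity planning, the totals in this table translate directly into an upper bound on throughput. The sketch below does that arithmetic. It deliberately ignores model loading, image preprocessing, mesh post-processing, and export, so real pipelines will land below these figures. The function names are illustrative.

```python
import math

# Total generation time per asset on an H100, in seconds, from the table above.
TOTAL_SECONDS = {512: 3, 1024: 17, 1536: 60}


def assets_per_hour(resolution: int, gpus: int = 1) -> int:
    """Upper bound on assets generated per hour across `gpus` devices."""
    return gpus * 3600 // TOTAL_SECONDS[resolution]


def hours_for_batch(num_assets: int, resolution: int, gpus: int = 1) -> float:
    """Lower bound on wall-clock hours to generate `num_assets` assets."""
    per_gpu = math.ceil(num_assets / gpus)
    return per_gpu * TOTAL_SECONDS[resolution] / 3600


for res in TOTAL_SECONDS:
    print(res, assets_per_hour(res), round(hours_for_batch(10_000, res, gpus=8), 2))
```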
Latent Space Efficiency
TRELLIS.2 achieves state-of-the-art compression while maintaining quality:
[Figure: Reconstruction Accuracy vs. Latent Compactness, comparing TRELLIS.2 (highest) with Methods A, B, and C]
Compactness (tokens per 1024³):
TRELLIS.2: ~9.6K
Method A: ~25K
Method B: ~40K
Method C: ~60K
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU Memory | 24GB | 48GB+ |
| GPU Model | NVIDIA A100 | NVIDIA H100 |
| System RAM | 32GB | 64GB+ |
| CUDA Version | 12.4+ | 12.4+ |
| OS | Linux | Linux (Ubuntu 20.04+) |
Installation and Setup
Prerequisites
Before installing TRELLIS.2, ensure your system meets these requirements:
- Operating System: Linux (tested on Ubuntu 20.04+)
- GPU: NVIDIA GPU with 24GB+ VRAM
- CUDA Toolkit: Version 12.4 or higher
- Python: Version 3.8 or higher
- Conda: For dependency management
⚠️ Windows Users
While primarily tested on Linux, Windows setup is possible but not officially supported. Refer to community discussions for Windows-specific configurations.
Step-by-Step Installation
1. Clone the Repository
    git clone --recurse-submodules https://github.com/microsoft/TRELLIS.2.git
    cd TRELLIS.2
2. Create Conda Environment
    # Create new environment
    conda create -n trellis2 python=3.10
    conda activate trellis2
3. Install Dependencies
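The installation script installs dependencies modularly, one flag per component. Because `--new-env` creates the `trellis2` environment itself, omit that flag if you already created and activated the environment in step 2.
    # Install all dependencies for inference
    . ./setup.sh --new-env --basic --xformers --flash-attn \
        --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast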
Installation Flags Explained:
| Flag | Purpose |
|---|---|
| --new-env | Create new conda environment named 'trellis2' |
| --basic | Install core dependencies |
| --xformers | Memory-efficient attention (for GPUs without flash-attn) |
| --flash-attn | Fast attention implementation (recommended) |
| --diffoctreerast | Differentiable octree rasterizer |
| --spconv | Sparse convolution operations |
| --mipgaussian | Mip-splatting for Gaussian rendering |
| --kaolin | NVIDIA's 3D deep learning library |
| --nvdiffrast | Differentiable rasterizer |
4. Environment Configuration
Set environment variables for optimal performance:
    export OPENCV_IO_ENABLE_OPENEXR=1
    export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

    # For GPUs without flash-attn support (e.g., V100)
    # export ATTN_BACKEND=xformers

    # SPCONV algorithm selection
    export SPCONV_ALGO=native  # Use 'auto' for benchmarking (slower first run)
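If you launch generation from Python rather than a shell, the same settings can be applied in-process, provided this happens before `cv2` and `torch` are imported. The sketch below is one way to do that. The helper names `trellis_env` and `apply_env` are hypothetical, and the variable names and values are simply those listed above.

```python
import os
from typing import Dict, MutableMapping


def trellis_env(flash_attn_available: bool = True,
                benchmark_spconv: bool = False) -> Dict[str, str]:
    """Build the environment settings described in this section."""
    env = {
        "OPENCV_IO_ENABLE_OPENEXR": "1",
        "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
        "SPCONV_ALGO": "auto" if benchmark_spconv else "native",
    }
    if not flash_attn_available:
        # Fall back to xformers on GPUs without flash-attn support (e.g., V100).
        env["ATTN_BACKEND"] = "xformers"
    return env


def apply_env(env: Dict[str, str],
              target: MutableMapping[str, str] = os.environ) -> None:
    """Apply settings without overriding values the user already exported."""
    for key, value in env.items():
        target.setdefault(key, value)


# Demonstrated on a scratch mapping; pass nothing to write to os.environ.
scratch = {"SPCONV_ALGO": "auto"}
apply_env(trellis_env(flash_attn_available=False), scratch)
print(scratch)
```

Using `setdefault` keeps anything already exported in the shell, so a one-off override such as `SPCONV_ALGO=auto` still takes effect.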
5. Download Pre-trained Models
Models are downloaded automatically from Hugging Face on first use and cached in ~/.cache/huggingface/, so no manual download is required for basic usage.
Troubleshooting Installation
Issue: CUDA version mismatch
    # Check CUDA version
    nvcc --version

    # Set correct CUDA path
    export PATH=/usr/local/cuda-12.4/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH
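The version check can also be automated. The sketch below parses the `release X.Y` field that `nvcc --version` prints and compares it with the 12.4 minimum stated in the prerequisites. The sample text is abbreviated and the helper names are illustrative.

```python
import re
from typing import Optional, Tuple

# Abbreviated example of what `nvcc --version` prints.
SAMPLE_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.4, V12.4.131
"""


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from nvcc's 'release X.Y' line, if present."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(nvcc_output: str, minimum: Tuple[int, int] = (12, 4)) -> bool:
    """True when the reported toolkit release is at least `minimum`."""
    release = parse_cuda_release(nvcc_output)
    return release is not None and release >= minimum


print(parse_cuda_release(SAMPLE_OUTPUT), meets_minimum(SAMPLE_OUTPUT))
```

On a real machine, the input would come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.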
Issue: Out of memory during compilation
    # Limit parallel compilation jobs
    export MAX_JOBS=4
How to Use TRELLIS.2-4B
Basic Image-to-3D Generation
Here's a minimal example to generate a 3D asset from an image:
    import os
    os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    import cv2
    import imageio
    from PIL import Image
    import torch
    from trellis2.pipelines import Trellis2ImageTo3DPipeline
    from trellis2.utils import render_utils
    from trellis2.renderers import EnvMap
    import o_voxel

    # 1. Setup Environment Map for PBR rendering
    envmap = EnvMap(torch.tensor(
        cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
        dtype=torch.float32, device='cuda'
    ))

    # 2. Load Pipeline
    pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
    pipeline.cuda()

    # 3. Load Image & Run Generation
    image = Image.open("assets/example_image/T.png")
    mesh = pipeline.run(image)[0]

    # 4. Simplify mesh (nvdiffrast has 16M triangle limit)
    mesh.simplify(16777216)

    # 5. Render Video Preview
    video = render_utils.make_pbr_vis_frames(
        render_utils.render_video(mesh, envmap=envmap)
    )
    imageio.mimsave("output.mp4", video, fps=15)

    # 6. Export to GLB format
    glb = o_voxel.postprocess.to_glb(
        vertices = mesh.vertices,
        faces = mesh.faces,
        attr_volume = mesh.attrs,
        coords = mesh.coords,
        attr_layout = mesh.layout,
        voxel_size = mesh.voxel_size,
        aabb = [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
        decimation_target = 1000000,
        texture_size = 4096,
        remesh = True,
        remesh_band = 1,
        remesh_project = 0,
        verbose = True
    )
    glb.export("output.glb", extension_webp=True)
Advanced Usage: Multi-Resolution Generation
Generate assets at different resolutions based on your needs:
    # High-speed generation (512³)
    mesh_fast = pipeline.run(
        image,
        resolution=512,
        seed=42
    )[0]

    # Balanced quality (1024³) - Default
    mesh_balanced = pipeline.run(
        image,
        resolution=1024,
        seed=42
    )[0]

    # Maximum quality (1536³)
    mesh_ultra = pipeline.run(
        image,
        resolution=1536,
        seed=42
    )[0]
Shape-Conditioned Texture Generation
Generate textures for existing 3D meshes:
    from PIL import Image
    import o_voxel
    from trellis2.pipelines import Trellis2TextureGenerationPipeline

    # Load texture generation pipeline
    texture_pipeline = Trellis2TextureGenerationPipeline.from_pretrained(
        "microsoft/TRELLIS.2-4B"
    )
    texture_pipeline.cuda()

    # Load existing mesh and reference image
    input_mesh = o_voxel.io.load_mesh("input_model.obj")
    reference_image = Image.open("texture_reference.png")

    # Generate texture
    textured_mesh = texture_pipeline.run(
        mesh=input_mesh,
        image=reference_image,
        seed=42
    )[0]
Batch Processing
Process multiple images efficiently:
    import glob
    import os
    from pathlib import Path

    from PIL import Image
    import o_voxel
    from trellis2.pipelines import Trellis2ImageTo3DPipeline

    # Load pipeline once
    pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
    pipeline.cuda()

    # Make sure the output directory exists before exporting
    os.makedirs("output", exist_ok=True)

    # Process all images in directory
    image_paths = glob.glob("input_images/*.png")
    for img_path in image_paths:
        image = Image.open(img_path)
        mesh = pipeline.run(image)[0]

        # Save with same filename
        output_name = Path(img_path).stem
        glb = o_voxel.postprocess.to_glb(
            vertices=mesh.vertices,
            faces=mesh.faces,
            attr_volume=mesh.attrs,
            coords=mesh.coords,
            attr_layout=mesh.layout,
            voxel_size=mesh.voxel_size,
            aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
            decimation_target=1000000,
            texture_size=4096
        )
        glb.export(f"output/{output_name}.glb", extension_webp=True)
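Long batches are easier to manage when they can be resumed. The sketch below plans the work up front: it pairs each input image with its output path and skips any image whose `.glb` already exists. `plan_batch` is a hypothetical helper, not part of TRELLIS.2, and the demo uses a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def plan_batch(input_dir: Path, output_dir: Path,
               pattern: str = "*.png") -> List[Tuple[Path, Path]]:
    """Return (source image, target .glb) pairs that still need processing."""
    output_dir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for src in sorted(input_dir.glob(pattern)):
        dst = output_dir / f"{src.stem}.glb"
        if not dst.exists():  # skip assets exported by an earlier run
            jobs.append((src, dst))
    return jobs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "in").mkdir()
    for name in ("chair.png", "lamp.png"):
        (root / "in" / name).touch()
    (root / "out").mkdir()
    (root / "out" / "chair.glb").touch()  # pretend this one is already done

    jobs = plan_batch(root / "in", root / "out")
    planned = [(src.name, dst.name) for src, dst in jobs]
    print(planned)
```

In the loop above, each `(src, dst)` pair would replace the `glob` iteration, with `dst` passed to the export call.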
Output Formats
TRELLIS.2 supports multiple output formats:
| Format | Extension | Use Case |
|---|---|---|
| GLB | .glb | Web, game engines, general 3D software |
| PLY | .ply | Point cloud, Gaussian splatting |
| OBJ | .obj | Traditional 3D modeling software |
| GLTF | .gltf | Web applications, AR/VR |
    # Export to different formats
    mesh.save_ply("output.ply")  # Gaussian representation
    mesh.save_obj("output.obj")  # Traditional mesh
    glb.export("output.glb")     # Optimized for web/games
Comparison with Other 3D Generation Models
TRELLIS.2 vs. Original TRELLIS
| Feature | TRELLIS (v1) | TRELLIS.2 |
|---|---|---|
| Representation | SLAT (Structured Latent) | O-Voxel (Omni-Voxel) |
| Topology Support | Limited (iso-surface based) | Arbitrary (field-free) |
| Max Resolution | 1024³ | 1536³ |
| Latent Tokens (1024³) | ~25K | ~9.6K |
| PBR Materials | Partial | Full (including alpha) |
| Processing Pipeline | Multi-stage rendering | Instant conversion |
| Open Surfaces | ❌ | ✅ |
| Non-manifold Geometry | ❌ | ✅ |
| Internal Structures | ❌ | ✅ |
TRELLIS.2 vs. Other State-of-the-Art Models
| Model | Parameters | Speed (1024³) | Topology | PBR Support |
|---|---|---|---|---|
| TRELLIS.2-4B | 4B | 17s | Arbitrary | Full |
| Shap-E | 300M | ~30s | Limited | Partial |
| Point-E | 1B | ~45s | Limited | No |
| DreamFusion | - | ~2 hours | Limited | Partial |
| Magic3D | - | ~40 min | Limited | Partial |
| Instant3D | 2B | ~25s | Limited | Partial |
✅ Competitive Advantage
Among the models compared above, TRELLIS.2 is the only one that combines generation times in seconds, arbitrary topology, and full PBR output, which makes it a versatile choice for production-ready 3D asset generation.
When to Use TRELLIS.2
Best Use Cases:
- Production asset creation for games and films
- Rapid prototyping and concept visualization
- E-commerce 3D product visualization
- AR/VR content creation
- Architectural visualization
- Digital twin creation
Consider Alternatives When:
- You need text-only input (use TRELLIS-text models)
- You require real-time generation on mobile devices
- You need extremely high polygon counts (>10M triangles)
- You're working with specific artistic styles (may need fine-tuning)
Training Your Own Model
TRELLIS.2 provides complete training code for researchers and developers who want to:
- Fine-tune on custom datasets
- Experiment with architecture modifications
- Train domain-specific models
Training Dataset: TRELLIS-500K
Microsoft provides TRELLIS-500K, a curated dataset containing 500,000 high-quality 3D assets from:
| Source | Assets | Description |
|---|---|---|
| Objaverse(XL) | ~350K | Diverse everyday objects |
| ABO | ~50K | Amazon product catalog |
| 3D-FUTURE | ~40K | Furniture and interior design |
| HSSD | ~40K | Habitat synthetic scenes |
| Toys4k | ~20K | Toy objects |
All assets are filtered based on aesthetic scores and quality metrics.
Training Pipeline Overview
The training process follows a multi-stage approach:
Stage 1: VAE Training
├── Sparse Structure VAE (ss_vae)
└── SLat VAE with Decoders (slat_vae)
    ├── Gaussian Decoder
    ├── Radiance Field Decoder
    └── Mesh Decoder
Stage 2: Flow Model Training
├── Sparse Structure Flow (ss_flow)
└── SLat Flow (slat_flow)
Training Configuration
Example configurations are provided in the configs/ directory:
VAE Training:
    # Train Sparse Structure VAE
    python train.py \
        --config configs/vae/ss_vae_conv3d_16l8_fp16.json \
        --output_dir outputs/ss_vae \
        --data_dir /path/to/TRELLIS-500K \
        --num_gpus 8

    # Train SLat VAE with Gaussian Decoder
    python train.py \
        --config configs/vae/slat_vae_enc_dec_gs_swin8_B_64l8_fp16.json \
        --output_dir outputs/slat_vae_gs \
        --data_dir /path/to/TRELLIS-500K \
        --num_gpus 8
Flow Model Training:
    # Train Image-conditioned Flow Model
    python train.py \
        --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
        --output_dir outputs/slat_flow_img \
        --data_dir /path/to/TRELLIS-500K \
        --num_nodes 4 \
        --num_gpus 8 \
        --master_addr $MASTER_ADDR \
        --master_port $MASTER_PORT
Multi-Node Distributed Training
For large-scale training across multiple machines:
    # Node 0 (Master)
    python train.py \
        --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
        --output_dir outputs/slat_flow_img_distributed \
        --data_dir /path/to/TRELLIS-500K \
        --num_nodes 4 \
        --node_rank 0 \
        --num_gpus 8 \
        --master_addr 192.168.1.100 \
        --master_port 29500

    # Node 1
    python train.py \
        --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
        --output_dir outputs/slat_flow_img_distributed \
        --data_dir /path/to/TRELLIS-500K \
        --num_nodes 4 \
        --node_rank 1 \
        --num_gpus 8 \
        --master_addr 192.168.1.100 \
        --master_port 29500

    # Repeat for nodes 2 and 3...
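Since the per-node commands differ only in `--node_rank`, they can be generated rather than copied by hand. The sketch below builds one command string per node from the flags shown above. `node_commands` is a hypothetical helper, and it assumes `train.py` accepts exactly the flags listed in this guide.

```python
import shlex
from typing import List


def node_commands(config: str, output_dir: str, data_dir: str,
                  num_nodes: int, num_gpus: int,
                  master_addr: str, master_port: int) -> List[str]:
    """Build one launch command per node; only --node_rank differs."""
    commands = []
    for rank in range(num_nodes):
        argv = [
            "python", "train.py",
            "--config", config,
            "--output_dir", output_dir,
            "--data_dir", data_dir,
            "--num_nodes", str(num_nodes),
            "--node_rank", str(rank),
            "--num_gpus", str(num_gpus),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
        ]
        commands.append(shlex.join(argv))  # quotes arguments safely for a shell
    return commands


cmds = node_commands(
    config="configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json",
    output_dir="outputs/slat_flow_img_distributed",
    data_dir="/path/to/TRELLIS-500K",
    num_nodes=4, num_gpus=8,
    master_addr="192.168.1.100", master_port=29500,
)
for rank, cmd in enumerate(cmds):
    print(f"# Node {rank}\n{cmd}\n")
```

Every node must use the same master address and port; generating the commands from one call makes that hard to get wrong.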
Fine-tuning on Custom Data
To fine-tune TRELLIS.2 on your own dataset:
- Prepare Data: Convert your 3D assets to O-Voxel format
- Configure Training: Modify config files for your dataset
- Resume from Checkpoint: Load pre-trained weights
    python train.py \
        --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
        --output_dir outputs/custom_finetuned \
        --data_dir /path/to/custom/dataset \
        --load_dir microsoft/TRELLIS.2-4B \
        --num_gpus 8
Training Hardware Requirements
| Model Size | Recommended GPUs | Training Time (500K dataset) |
|---|---|---|
| Base (342M) | 8× A100 (40GB) | ~1 week |
| Large (1.1B) | 16× A100 (40GB) | ~2 weeks |
| XLarge (4B) | 32× A100 (80GB) | ~4 weeks |
Limitations and Considerations
Known Limitations
1. Geometric Artifacts
Issue: Generated meshes may occasionally contain small holes or minor topological discontinuities.
Impact:
- Affects applications requiring watertight geometry (3D printing, simulation)
- More common in high-complexity models with intricate details
Mitigation:
    # Use provided post-processing scripts
    from trellis2.utils import mesh_repair

    cleaned_mesh = mesh_repair.fill_holes(mesh, max_hole_size=100)
    cleaned_mesh = mesh_repair.remove_degenerate_faces(cleaned_mesh)
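Before repairing anything, it helps to know whether a mesh has holes at all. A standard test, independent of any particular library, is to count how many triangles share each edge: an edge used by only one triangle lies on a boundary. The sketch below implements that check in plain Python on a list of index triples. It does not check face orientation or self-intersections.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Face = Tuple[int, int, int]
Edge = Tuple[int, int]


def edge_counts(faces: Iterable[Face]) -> Counter:
    """Count how many faces use each undirected edge."""
    counts: Counter = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def boundary_edges(faces: Iterable[Face]) -> List[Edge]:
    """Edges used by exactly one face, i.e. the rims of holes or open sheets."""
    return sorted(edge for edge, n in edge_counts(faces).items() if n == 1)


def is_watertight(faces: Iterable[Face]) -> bool:
    """True when every edge is shared by exactly two faces."""
    counts = edge_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())


tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
opened = tetrahedron[:-1]  # drop one face to create a hole

print(is_watertight(tetrahedron), boundary_edges(tetrahedron))
print(is_watertight(opened), boundary_edges(opened))
```

For assets that are open by design, such as cloth or flags, boundary edges are expected, so this check is mainly useful for 3D printing and simulation targets.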
2. Base Model Without Alignment
Issue: TRELLIS.2-4B is a pre-trained foundation model without human preference alignment (RLHF).
Impact:
- Output style reflects training data distribution
- May require multiple generations to achieve desired aesthetic
- Not optimized for specific artistic styles
Recommendations:
- Generate multiple variants with different seeds
- Use post-processing for style refinement
- Consider fine-tuning for specific aesthetic requirements
3. Input Image Quality Dependency
Issue: Output quality heavily depends on input image characteristics.
Best Practices:
- Use high-resolution images (512×512 minimum, 1024×1024 recommended)
- Ensure clear object visibility with minimal occlusion
- Prefer images with good lighting and contrast
- Avoid heavily compressed or noisy images
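The resolution guideline is easy to check before spending GPU time. The sketch below reads the width and height from a PNG file's IHDR header using only the standard library and grades them against the 512 and 1024 pixel thresholds above. The helper names and verdict labels are illustrative, and the demo builds a synthetic 24-byte header instead of reading a real file.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def resolution_verdict(width: int, height: int) -> str:
    """Grade an image against the 512 px minimum and 1024 px recommendation."""
    shortest = min(width, height)
    if shortest < 512:
        return "too small"
    if shortest < 1024:
        return "acceptable"
    return "recommended"


# Synthetic header: signature, IHDR chunk length (13), chunk type, width, height.
fake_header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 768, 1024)
size = png_size(fake_header)
print(size, resolution_verdict(*size))
```

For a real file, pass the result of `open(path, "rb").read(24)` to `png_size`.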
4. Memory Requirements
Issue: High-resolution generation requires substantial GPU memory.
| Resolution | Minimum VRAM | Recommended VRAM |
|---|---|---|
| 512³ | 16GB | 24GB |
| 1024³ | 24GB | 40GB |
| 1536³ | 40GB | 80GB |
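The table can be turned into a small lookup that picks the highest resolution a given GPU can handle. The sketch below encodes the figures exactly as listed above. `best_resolution` is a hypothetical helper, and the thresholds are this guide's numbers rather than measured limits.

```python
from typing import Optional

# resolution -> (minimum VRAM in GB, recommended VRAM in GB), from the table above
VRAM_TABLE_GB = {512: (16, 24), 1024: (24, 40), 1536: (40, 80)}


def best_resolution(vram_gb: float, comfortable: bool = False) -> Optional[int]:
    """Highest resolution whose minimum (or recommended) VRAM fits `vram_gb`."""
    column = 1 if comfortable else 0
    fitting = [res for res, need in VRAM_TABLE_GB.items() if vram_gb >= need[column]]
    return max(fitting) if fitting else None


for vram in (12, 24, 48, 80):
    print(vram, best_resolution(vram), best_resolution(vram, comfortable=True))
```

With PyTorch installed, the available memory in GB can be read from `torch.cuda.get_device_properties(0).total_memory / 2**30`.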
Memory Optimization:
    import os

    # Enable memory-efficient allocator settings (set before CUDA is initialized)
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    # Enable memory-efficient attention on the loaded pipeline
    pipeline.enable_memory_efficient_attention()
Responsible AI Considerations
⚠️ Important Notice
TRELLIS.2 is a research project. Responsible AI considerations were factored into all stages:
- Dataset Curation: Public datasets reviewed for harmful content and PII
- Potential Bias: Internet-sourced data may contain inherent biases
- Intended Use: Academic and research purposes only
- Commercial Use: Requires careful evaluation of generated content
Ethical Guidelines:
- Do not generate content that infringes intellectual property rights
- Avoid creating misleading or deceptive 3D representations
- Respect privacy and consent when generating assets based on real objects
- Consider cultural sensitivity in generated content
Performance Considerations
Factors Affecting Generation Quality:
- Input Image Characteristics
  - Resolution and clarity
  - Lighting conditions
  - Object visibility and occlusion
  - Background complexity
- Generation Parameters
  - Resolution setting (512³ vs 1024³ vs 1536³)
  - Random seed selection
  - Sampling steps (if configurable)
- Hardware Configuration
  - GPU model and memory
  - CUDA version compatibility
  - Driver version
Frequently Asked Questions
Q: What is the difference between TRELLIS and TRELLIS.2?
A: TRELLIS.2 represents a fundamental architectural upgrade from the original TRELLIS. The key differences are:
- Representation: TRELLIS uses SLAT (Structured Latent), while TRELLIS.2 uses O-Voxel (Omni-Voxel), a "field-free" approach
- Topology: TRELLIS.2 natively handles arbitrary topologies including open surfaces and non-manifold geometry, which TRELLIS cannot
- Efficiency: TRELLIS.2 compresses 1024³ assets into ~9.6K tokens vs. ~25K in TRELLIS
- Materials: TRELLIS.2 supports full PBR including transparency, while TRELLIS has partial support
- Processing: TRELLIS.2 offers instant mesh conversion (<100ms with CUDA) vs. multi-stage rendering in TRELLIS
Q: Can I use TRELLIS.2 for commercial projects?
A: Yes, TRELLIS.2 is released under the MIT License, which permits commercial use. However:
- Verify that generated assets don't infringe on existing intellectual property
- The model is a base model without alignment, so output quality may vary
- Some submodules may have different licenses (check the LICENSE file)
- Consider the ethical implications of AI-generated content in your use case
Q: What GPU do I need to run TRELLIS.2?
A: Minimum requirements:
- GPU: NVIDIA GPU with 24GB VRAM (e.g., RTX 3090, A5000, A100)
- Resolution: 512³ requires 16GB, 1024³ requires 24GB, 1536³ requires 40GB+
- Tested on: NVIDIA A100 and H100 GPUs
- Not supported: AMD GPUs, Apple Silicon (MPS), CPU-only inference
For optimal performance, use NVIDIA H100 or A100 (80GB) GPUs.
Q: How do I improve generation quality?
A: Follow these best practices:
- Input Image Quality:
  - Use high-resolution images (1024×1024 recommended)
  - Ensure good lighting and clear object visibility
  - Remove complex backgrounds if possible
  - Avoid heavily compressed or noisy images
- Generation Settings:
  - Use higher resolution (1024³ or 1536³) for detailed assets
  - Try different random seeds (generate 3-5 variants)
  - Experiment with different input angles if available
- Post-Processing:
  - Use mesh repair tools for geometric artifacts
  - Apply texture enhancement in 3D software
  - Optimize topology for your specific use case
Q: Can TRELLIS.2 generate 3D assets from text prompts?
A: TRELLIS.2-4B is specifically designed for image-to-3D generation. For text-to-3D, you have two options:
- Two-stage approach (Recommended):
  - Use a text-to-image model (DALL-E, Midjourney, Stable Diffusion)
  - Feed generated image to TRELLIS.2-4B
  - This typically produces better results
- Use TRELLIS text models:
  - TRELLIS-text-base (342M)
  - TRELLIS-text-large (1.1B)
  - TRELLIS-text-xlarge (2.0B)
  - Note: These are from TRELLIS v1, not TRELLIS.2
Q: How long does generation take?
A: Generation time depends on resolution and hardware:
On NVIDIA H100:
- 512³: ~3 seconds (2s shape + 1s material)
- 1024³: ~17 seconds (10s shape + 7s material)
- 1536³: ~60 seconds (35s shape + 25s material)
On NVIDIA A100 (40GB):
- 512³: ~5 seconds
- 1024³: ~30 seconds
- 1536³: ~120 seconds
Older GPUs (RTX 3090, A6000) will be proportionally slower.
Q: What output formats are supported?
A: TRELLIS.2 supports multiple industry-standard formats:
- GLB/GLTF: Optimized for web, game engines (Unity, Unreal), and AR/VR
- PLY: Point cloud format, useful for Gaussian splatting
- OBJ: Traditional mesh format for 3D modeling software
- Mesh with PBR: Full material properties (Base Color, Metallic, Roughness, Alpha)
All formats include full PBR material information where applicable.
Q: Can I train TRELLIS.2 on my own dataset?
A: Yes, the complete training code is provided. You can:
- Fine-tune the pre-trained model on your custom dataset
- Train from scratch if you have sufficient data (100K+ assets recommended)
- Modify architecture for research purposes
Requirements:
- Convert your 3D assets to O-Voxel format using provided tools
- Minimum 8× NVIDIA A100 GPUs for fine-tuning
- 32× A100 GPUs for full training of 4B model
- Training time: 1-4 weeks depending on model size
Q: Does TRELLIS.2 work on Windows?
A: TRELLIS.2 is primarily developed and tested on Linux (Ubuntu 20.04+). Windows support is:
- Not officially supported by the development team
- Possible with community workarounds (see GitHub issues)
- Recommended approach: Use WSL2 (Windows Subsystem for Linux) with GPU passthrough
For production use, Linux is strongly recommended.
Q: How does TRELLIS.2 handle transparent or translucent objects?
A: TRELLIS.2 has native support for transparency through the Alpha channel in its PBR material system:
- Opacity/Alpha attribute is part of the O-Voxel representation
- Supports both binary transparency (glass) and gradient translucency (smoke, water)
- Exports correctly to GLB format with alpha channel preserved
- Compatible with standard rendering engines that support PBR
This is a significant advantage over methods that only support opaque surfaces.
Q: What is the TRELLIS-500K dataset?
A: TRELLIS-500K is the training dataset for TRELLIS.2, containing:
- 500,000 curated 3D assets from multiple sources
- Filtered based on aesthetic scores and quality metrics
- Includes diverse categories: objects, furniture, toys, architectural elements
- Publicly available for research purposes
- Comes with data preparation toolkits for processing custom assets
Sources: Objaverse(XL), ABO, 3D-FUTURE, HSSD, Toys4k
Conclusion and Next Steps
Summary
TRELLIS.2-4B represents a significant breakthrough in 3D generative AI, offering:
✅ Unmatched Versatility: Handles arbitrary topologies including open surfaces, non-manifold geometry, and internal structures
✅ Exceptional Efficiency: 3-60 second generation time with compact 9.6K token representation
✅ Production-Ready Quality: Full PBR materials with photorealistic rendering capabilities
✅ Open Research: MIT License with complete training code and 500K dataset
✅ Minimalist Pipeline: Instant, optimization-free mesh conversion
Getting Started Checklist
- Verify hardware requirements (24GB+ NVIDIA GPU)
- Install CUDA Toolkit 12.4+
- Clone repository and install dependencies
- Download or prepare test images
- Run basic image-to-3D generation example
- Experiment with different resolutions and settings
- Export to GLB format for use in your pipeline
Recommended Next Steps
For Researchers:
- Explore the technical paper: arXiv:2512.14692
- Download TRELLIS-500K dataset for analysis
- Experiment with architecture modifications
- Benchmark against your own methods
For Developers:
- Integrate TRELLIS.2 into your 3D content pipeline
- Build applications using the API
- Optimize for your specific hardware configuration
- Contribute to the open-source project
For Artists and Designers:
- Test with various input images to understand capabilities
- Develop workflows combining text-to-image and TRELLIS.2
- Experiment with post-processing in 3D software
- Share results and feedback with the community
Resources and Links
| Resource | Link |
|---|---|
| Official Repository | github.com/microsoft/TRELLIS.2 |
| Research Paper | arxiv.org/abs/2512.14692 |
| Project Page | microsoft.github.io/TRELLIS.2 |
| Model Hub | huggingface.co/microsoft/TRELLIS.2-4B |
| Dataset | TRELLIS-500K Documentation |
| Original TRELLIS | github.com/microsoft/TRELLIS |
Community and Support
- GitHub Issues: Report bugs and request features
- Discussions: Share results and ask questions
- Research Collaboration: Contact the authors for academic partnerships
- Commercial Inquiries: Review MIT License terms and conditions
Final Thoughts
TRELLIS.2-4B pushes the boundaries of what's possible in 3D generative AI, combining cutting-edge research with practical usability. Whether you're building the next generation of 3D content tools, conducting academic research, or creating immersive experiences, TRELLIS.2 provides a powerful foundation for innovation in 3D generation.
The open-source nature of the project, combined with comprehensive documentation and pre-trained models, makes it accessible to a wide range of users, from researchers exploring new architectures to developers building production applications.
Start generating high-quality 3D assets today with TRELLIS.2-4B!
Last Updated: December 2025
Model Version: TRELLIS.2-4B
License: MIT License