
CurateClick


LLM RLHF (PPO Reward Model) Training: Preference Data Ranking Assistant

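A tool for ranking the answers a large language model generates, with support for local models through Ollama. The rankings become preference data for training a reward model, which is then used to further optimize the SFT model.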
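The ranking step above can be illustrated with a minimal sketch of how one ranked list of answers might be expanded into pairwise preference records, the shape most reward-model trainers consume. This is an assumption-based example, not the tool's own export code: the helper name `ranking_to_pairs` is hypothetical, and the `prompt` / `chosen` / `rejected` field names follow a common dataset convention rather than this tool's documented format.

```python
import json
from itertools import combinations


def ranking_to_pairs(prompt: str, ranked_responses: list[str]) -> list[dict]:
    """Expand one ranked list (best answer first) into pairwise records.

    Hypothetical helper: the "prompt"/"chosen"/"rejected" schema is an
    assumed convention, not this tool's documented export format.
    """
    pairs = []
    # combinations() preserves input order, so in every (better, worse)
    # tuple the first element was ranked above the second.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


if __name__ == "__main__":
    ranked = ["Answer A (best)", "Answer B", "Answer C (worst)"]
    records = ranking_to_pairs("Explain RLHF in one sentence.", ranked)
    # Emit one JSON object per line (JSONL), a common dataset layout.
    for record in records:
        print(json.dumps(record, ensure_ascii=False))
```

Ranking K answers yields K(K-1)/2 pairs, so a single ranking of 4 answers already produces 6 training comparisons. This is one reason full rankings are often collected instead of single pairwise choices.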


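Description: In short, after SFT fine-tuning, decide what kind of model you want RLHF (PPO with a reward model) to push toward, then rank your model's generated answers according to that preference. Finally, export the preference data to train a reward model, and use that reward model to further train the SFT model.

Open-source repository: https://github.com/zjrwtx/preference_databuilder

Demo video: https://www.bilibili.com/video/BV1P1421675z/?spm_id_from=333.999.0.0&vd_source=5531fb0981ef79f87198a3c2651dff93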
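The reward-model step above is commonly trained with a pairwise objective: for each exported pair, the loss is -log(sigmoid(r_chosen - r_rejected)), which pushes the reward of the preferred answer above that of the rejected one. The page does not say which loss the tool's downstream training uses, so this is a sketch of the standard pairwise formulation only, with plain floats standing in for reward-model scores (an assumption). The function names are hypothetical.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The scores are plain floats standing in for reward-model outputs
    (an assumption for illustration; a real model would produce them).
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) equals softplus(-m) = log(1 + exp(-m)).
    # Branch on the sign of m so exp() never receives a large positive value.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(score_pairs: list[tuple[float, float]]) -> float:
    """Average the loss over (r_chosen, r_rejected) pairs."""
    return sum(pairwise_reward_loss(c, r) for c, r in score_pairs) / len(score_pairs)
```

The loss is log 2 when the two scores are equal, approaches 0 as the chosen answer's score pulls ahead, and grows roughly linearly when the rejected answer is scored higher.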
