
CurateClick


LLM RLHF (PPO Reward Model) Preference Data Ranking Assistant

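A tool with support for local Ollama models that lets you rank LLM-generated responses, producing preference data for training a reward model, which is then used to optimize the SFT model.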


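Description: In short, after SFT fine-tuning, you decide what kind of model you want RLHF (PPO with a reward model) to produce and rank your model's generated responses accordingly. You then export the resulting preference data to train a reward model, and use that reward model to further train the SFT model.

Source code: https://github.com/zjrwtx/preference_databuilder

Demo video: https://www.bilibili.com/video/BV1P1421675z/?spm_id_from=333.999.0.0&vd_source=5531fb0981ef79f87198a3c2651dff93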
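To make the ranking-to-preference-data step concrete, here is a minimal Python sketch of how one ranked list of responses could be expanded into pairwise records for reward model training. It is an illustration under stated assumptions, not the tool's actual export code: the function name `ranking_to_pairs` and the `prompt` / `chosen` / `rejected` field names are hypothetical and may not match the format that preference_databuilder exports.

```python
import json
from itertools import combinations


def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-first ranking into pairwise preference records.

    Hypothetical helper: the record layout (prompt/chosen/rejected) is an
    assumption for illustration, not the tool's confirmed export format.
    """
    pairs = []
    # combinations() keeps input order, so the first element of each pair
    # is always ranked higher than the second.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


if __name__ == "__main__":
    # Example ranking, best response first (made-up data).
    ranked = ["detailed answer", "short answer", "off-topic answer"]
    for record in ranking_to_pairs("What is RLHF?", ranked):
        # One JSON object per line (JSONL), a common layout for training data.
        print(json.dumps(record, ensure_ascii=False))
```

A ranking of n responses yields n(n-1)/2 pairs, so ranking a handful of responses per prompt produces several training comparisons from a single labeling pass.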
