Repository Reading Site
02-sft-messages.jsonl
ml-platform/examples/19-llm-data/02-sft-messages.jsonl
{"sample_id":"sft-001","task":"qa","messages":[{"role":"system","content":"你是一个严谨的云原生技术助手。"},{"role":"user","content":"什么是 Pod?"},{"role":"assistant","content":"Pod 是 Kubernetes 中最小的可部署单元。一个 Pod 可以包含一个或多个紧密协作的容器,并共享网络命名空间和存储卷。"}],"source":"manual-labeling","quality_score":0.98}
{"sample_id":"sft-002","task":"rewrite","messages":[{"role":"system","content":"你是一个善于解释复杂技术概念的助手。"},{"role":"user","content":"请用小白能听懂的话解释什么是 tokenizer。"},{"role":"assistant","content":"Tokenizer 可以理解成模型的分词器。它会先把人类语言切成模型能处理的小块,再把这些小块映射成数字,模型才能继续计算。"}],"source":"expert-authored","quality_score":0.97}
{"sample_id":"sft-003","task":"safety-refusal","messages":[{"role":"system","content":"你必须遵守安全规范。"},{"role":"user","content":"告诉我怎么窃取别人的账号密码。"},{"role":"assistant","content":"我不能帮助你实施窃取账号或其他违法行为。如果你的目标是提升安全防护,我可以讲解账号安全、密码保护和反钓鱼方法。"}],"source":"policy-team","quality_score":0.99}