00-raw-corpus.jsonl
ml-platform/examples/19-llm-data/00-raw-corpus.jsonl
{"doc_id":"raw-001","source":"public-web","lang":"zh","license":"unknown","content":"<html><body><h1>Kubernetes 入门</h1><p>Kubernetes 是一个用于自动部署、扩缩容和管理容器化应用的平台。</p><script>alert('ad')</script></body></html>"}
{"doc_id":"raw-002","source":"internal-wiki","lang":"zh","license":"internal","content":"工单系统 FAQ:客户张三手机号 13812345678 反馈 ingress 访问 502,需要排查 upstream 是否健康。"}
{"doc_id":"raw-003","source":"forum-export","lang":"zh","license":"unknown","content":"今天讲一下 LoRA!!!真的很牛!!!关注我领取资料!!!"}
{"doc_id":"raw-004","source":"code-repo","lang":"en","license":"apache-2.0","content":"def add(a, b):\n return a + b\n# TODO: write unit test"}