OpenAI Fine Tune Korean

This notebook shows how to fine‑tune OpenAI's gpt-oss (open‑weight) models for Korean news style and modern conversational tone.

MXFP4 workflow clarifications

Training or fine-tuning directly in MXFP4 is not supported by public frameworks today.
Recommended path: train in BF16 (or QLoRA 4‑bit nf4) → merge LoRA → post‑training quantize to MXFP4 → save_pretrained() for deployment.
If you need an MXFP4 artifact, you must re‑quantize from BF16 after merging adapters. (Export utilities are still evolving; a toolchain that already supports MXFP4 serialization is the ideal case.)

LoRA targets (MoE)

Minimal config (fast, low VRAM): target attention projections only, e.g. ["q_proj","v_proj"].
MoE‑aware config (better domain adaptation, more VRAM/time): include the expert projection layers in addition to attention.

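from peft import LoraConfig

# Baseline: attention projections only (fast, low VRAM).
TARGET_MODULES = ["q_proj", "v_proj"]
baseline_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=TARGET_MODULES,
    bias="none", task_type="CAUSAL_LM",
)

# MoE-aware: additionally adapt the expert projections. This needs a PEFT
# release that supports the target_parameters argument.
MOE_TARGET_PARAMETERS = [
    # No layer index is given, so these names should match every layer's experts
    # by suffix; prefix an index (e.g. "7.mlp.experts.down_proj") to narrow the scope.
    "mlp.experts.gate_up_proj",
    "mlp.experts.down_proj",
]
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules="all-linear",              # all linear layers except the output head
    target_parameters=MOE_TARGET_PARAMETERS,  # add expert projections
    bias="none", task_type="CAUSAL_LM",
)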

Start with the attention‑only config; if the fit to the Korean domain is insufficient, enable the MoE targets and re‑evaluate.
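To make the VRAM/time trade-off between the two configs more concrete, the sketch below counts the trainable parameters LoRA adds per linear layer, which is r × (d_in + d_out). The layer shapes and layer count are hypothetical placeholders, not values read from a gpt-oss checkpoint; take the real ones from your model config.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_in -> d_out) linear layer."""
    # LoRA learns A with shape (r, d_in) and B with shape (d_out, r).
    return r * (d_in + d_out)


# Hypothetical shapes for illustration only (not taken from gpt-oss).
HIDDEN = 4096
KV_DIM = 1024
NUM_LAYERS = 24
RANK = 16

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_param_count(HIDDEN, KV_DIM, RANK)  # v_proj
)
attention_only_total = per_layer * NUM_LAYERS
print(f"attention-only LoRA params: {attention_only_total:,}")
```

Adding expert projections multiplies this count by the number of adapted expert matrices, which is why the MoE-aware config costs noticeably more memory and time.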

Contents

0) Goals & Scope
1) Environment check
2) Config
3) Install Deps
4) KR‑Context Data Sourcing
5) Create Sample Data
6) PII Scrubbing (PIPA) & Style Tags
7) Load & Format
8) Load Model & Tokenizer
9) Fine‑Tuning (LoRA/QLoRA)
9a) Data curation & splits
9b) Hyperparameters (r/alpha/dropout)
9c) Merge adapters (BF16)
9d) Save merged BF16 (save_pretrained)
9e) Export & Quantize (BF16 → MXFP4)
10) Evaluation (News/Chat)
11) Inference Prompt Templates
12) Freshness Strategy
13) Safety & Compliance
14) Troubleshooting & Next Steps

⚙️ Training vs Quantization — What’s supported

Do: Train with BF16/FP16 or QLoRA; export merged weights.
Then: Quantize to MXFP4 for inference using provided conversion scripts/utilities.
Don’t: Attempt to run an end‑to‑end “train in MXFP4” pipeline — not supported today.

PII & Compliance Reminder: For KR data, follow your enterprise policy (mask RRN/phone/account IDs, remove emails) before training & logging. Keep train/val/test splits stratified by source and style tags.

🧪 MoE adapters (optional)

You can target MoE layers with adapters, but treat this as advanced/experimental. Start with attention projections first and validate KR benchmarks before expanding scope.

Note: Keep transformers, peft, accelerate, and trl at versions known to support BF16/4‑bit LoRA.
If you pin safetensors, remember that native MXFP4 serialization is not yet standardized; loaders may upcast internally.

🔎 Support Matrix — At a glance

Fine‑tuning precision: BF16/FP16 ✅ · QLoRA 4‑bit ✅ · MXFP4 FT ❌
Quantization target: MXFP4 ✅ (post‑training)
API FT (hosted) for OSS models: ❌
Open‑source FT (Transformers/TRL/PEFT): ✅
LoRA targets: q_proj, k_proj, v_proj, o_proj ✅; MoE expert adapters experimental ⚠️

0) Goals & Scope

Goal: optimize for Korean general news writing and everyday/customer‑support chat tone, with output controlled by a style tag: style=news_headline|news_lead|news_body|kakao_casual|kakao_formal.
Stack: transformers, trl(SFTTrainer), peft(LoRA/QLoRA), datasets.
Hardware: Single/few GPUs (BF16 preferred). CPU/Mac for lightweight tests.

1) Environment check

[9]

Python: 3.10.12 (main, May 27 2025, 17:12:29) [GCC 11.4.0]
OS/Platform: Linux-6.8.0-60-generic-x86_64-with-glibc2.35
CUDA_VISIBLE_DEVICES: 
Torch: 2.7.1+cu126 CUDA: True
GPU: NVIDIA H100 80GB HBM3

2) Config

[10]

Config ready.

3) Install Deps

[11]

transformers: 4.55.3
accelerate: 1.10.0
datasets: 4.0.0
peft: not installed
trl: 0.21.0
bitsandbytes: not installed
sentencepiece: 0.2.1
vllm: 0.10.1
llama_cpp: 0.3.16
pip: 25.2
Install cells are commented. Un-comment in your environment.

4) KR‑Context Data Sourcing

Prefer public KR benchmarks (topic classification / summarization / QA) and metadata (title/summary/section) from permitted news APIs for style calibration.
Avoid mass training on full news article text because of copyright and ToS constraints; rely on metadata and open corpora instead.
For chat, prioritize lawfully available open corpora annotated for casual/polite speech, emoticons, and abbreviations.
PIPA: scrub PII (RRNs, phone numbers, emails, account numbers) before training and before logging.

5) Create Sample Data

[12]

Created: data/news.jsonl, data/chat.jsonl

6) PII Scrubbing (PIPA) & Style Tags

[13]

data/news.jsonl -> data/news_clean.jsonl | rows: 4, redacted_rows: 2, hits: {'[EMAIL]': 2, '[ACCOUNT]': 1, '[RRN]': 1, '[CITY]': 1}
data/chat.jsonl -> data/chat_clean.jsonl | rows: 3, redacted_rows: 1, hits: {'[PHONE]': 1}
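The cell that produced this output is not shown, so the following is only a minimal sketch of a regex-based scrubber that emits the same kind of placeholder tags and hit counts. The patterns are rough heuristics chosen for illustration (assumptions, not the notebook's actual rules) and will need tuning against real data. Digit lookarounds are used instead of \b because Korean particles often attach directly to a number.

```python
import re

# Order matters: specific patterns run before broader ones.
# All patterns are illustrative heuristics, not a complete PIPA rule set.
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # Resident registration number: 6 digits, optional hyphen, 7 digits starting with 1-4.
    ("[RRN]", re.compile(r"(?<!\d)\d{6}-?[1-4]\d{6}(?!\d)")),
    # Mobile numbers such as 010-1234-5678 (hyphens optional).
    ("[PHONE]", re.compile(r"(?<!\d)01[016789]-?\d{3,4}-?\d{4}(?!\d)")),
    # Hyphenated account-like digit groups; the last group needs 4+ digits so dates are skipped.
    ("[ACCOUNT]", re.compile(r"(?<!\d)\d{2,6}-\d{2,6}-\d{4,8}(?!\d)")),
]


def scrub(text):
    """Replace PII matches with placeholder tags and count hits per tag."""
    hits = {}
    for tag, pattern in PII_PATTERNS:
        text, count = pattern.subn(tag, text)
        if count:
            hits[tag] = hits.get(tag, 0) + count
    return text, hits


sample = "문의: kim@example.com, 010-1234-5678로 연락 주세요. 주민번호 900101-1234567"
clean, hits = scrub(sample)
print(clean)
print(hits)
```

Tags such as [CITY] in the output above would need a dictionary or NER-based step, which a regex pass like this does not cover.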

7) Load & Format

[15]

Created: data/news_harmony.jsonl data/chat_harmony.jsonl

{'train': 3, 'validation': 4}

8) Load Model & Tokenizer

[16]

Tokenization done. train: 3 val: 4 example lens: [200006, 17360, 200008, 3575, 553, 17554, 162016, 11, 261, 4410, 6439, 2359] ...

9) Fine‑Tuning (LoRA/QLoRA)

9a) Data curation & splits

(See Section 7/8 for dataset prep; move relevant snippets here if needed.)
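Since splits should stay stratified by source and style tag, here is a minimal sketch of one way to do that with the standard library only. The field names "source" and "style" and the split fractions are assumptions for illustration; adapt them to the actual record schema.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, val_frac=0.1, test_frac=0.1, seed=42):
    """Split rows so each stratum (as defined by key) is divided with the same fractions."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    rng = random.Random(seed)
    splits = {"train": [], "validation": [], "test": []}
    for stratum in sorted(groups):  # sorted for reproducibility
        items = groups[stratum]
        rng.shuffle(items)
        n_val = int(len(items) * val_frac)
        n_test = int(len(items) * test_frac)
        splits["validation"] += items[:n_val]
        splits["test"] += items[n_val:n_val + n_test]
        splits["train"] += items[n_val + n_test:]
    return splits


# Hypothetical records: two sources, one style tag each.
rows = [
    {"source": source, "style": "news_body", "id": i}
    for source in ("api", "bench")
    for i in range(20)
]
splits = stratified_split(rows, key=lambda r: (r["source"], r["style"]))
print({name: len(part) for name, part in splits.items()})
```

With very small strata the integer truncation leaves validation and test empty, so tiny sample sets like the one in this notebook need a minimum-count rule on top.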

9b) Hyperparameters (r/alpha/dropout)

# Example LoRA hyperparameters
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05

9c) Merge adapters (BF16)

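# Example merge step (after training); un-comment on your machine.
# from peft import PeftModel
# model = PeftModel.from_pretrained(base_model, adapter_path)
# merged_model = model.merge_and_unload()  # folds the LoRA deltas into the base weights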
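Conceptually, merging folds each adapter into its base weight as W' = W + (alpha / r) · B · A. The toy example below works through that arithmetic with plain Python lists; it only illustrates the formula under standard LoRA scaling and says nothing about how PEFT implements the merge internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A for one layer."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shape (d_out, d_in), same as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with rank r=1 and alpha=2, so the scale factor is 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, d_in)
B = [[0.5], [0.0]]  # shape (d_out, r)
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After the merge the adapter matrices are no longer needed at inference time, which is what makes the merged checkpoint a suitable input for post-training quantization.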

9d) Save merged BF16 (save_pretrained)

# merged_model.save_pretrained(OUTPUT_DIR)

9e) Export & Quantize (BF16 → MXFP4)

Framework‑agnostic summary: public libraries currently do not support training or fine‑tuning directly in MXFP4. The common pipeline is:

Train/SFT in BF16 (or QLoRA 4‑bit nf4).
Merge LoRA adapters into the base model (BF16).
Save the merged BF16 checkpoint with save_pretrained().
Post‑training quantize the merged BF16 tensors to MXFP4 using a vendor/toolchain‑provided packer.
Save/export the MXFP4 artifact (same directory layout as Hugging Face save_pretrained() output) for deployment/serving.

Notes:

If your serving stack supports LoRA at inference, you may skip merging and re‑quantization and ship the base model (MXFP4 or BF16) plus the LoRA adapters.

If your runtime requires merged MXFP4 weights, you must run a BF16 → MXFP4 quantization step after merging adapters.

Keep tokenizer/config files aligned across BF16 and MXFP4 exports.
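To give a feel for what the BF16 → MXFP4 step does numerically, the sketch below simulates block quantization as described in the OCP Microscaling format: blocks of 32 values share one power-of-two scale, and each value is stored as a 4-bit E2M1 number. It is a pure-Python illustration under simplifying assumptions (ties resolve to the smaller magnitude, the scale exponent range is not clamped, nothing is bit-packed) and is not a substitute for a vendor packer.

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32
E2M1_EMAX = 2  # exponent of the largest element value (6.0 = 1.5 * 2**2)


def quantize_block(block):
    """Return (shared_scale, element_values) for one block of floats."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared power-of-two scale chosen so the block maximum lands near the top of the FP4 range.
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - E2M1_EMAX)
    elements = []
    for x in block:
        magnitude = min(abs(x) / scale, FP4_E2M1_VALUES[-1])  # saturate at 6.0
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - magnitude))
        elements.append(math.copysign(nearest, x))
    return scale, elements


def fake_quantize(values):
    """Quantize then dequantize, block by block, to expose the rounding error."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, elements = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * e for e in elements)
    return out


weights = [4.0, 0.7, -1.2, 0.05]
print(fake_quantize(weights))
```

Comparing the input against the round-tripped values shows why small weights that share a block with a large outlier lose the most precision.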

[19]

Fine‑tuning skeleton ready. Un‑comment on your machine.

10) Evaluation (News/Chat)

Metrics

News: topic classification (F1), summarization quality (ROUGE‑1/2/L), reading‑comprehension QA (EM/F1).
Chat: naturalness and context retention, accuracy of switching between honorific and casual speech, appropriateness of emoticons and abbreviations.

Notes

Use public KR benchmarks (e.g., topic classification, KorQuAD‑like QA) where licenses permit.
Mix automatic metrics (F1/ROUGE) with human eval for tone and politeness.
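As one concrete instance of the QA metrics, here is a minimal sketch of exact match and character-level F1. Character-level overlap is an assumption made here because Korean word boundaries are unreliable for token matching; the normalization (whitespace removal only) is deliberately simple, and an official benchmark script should take precedence when one exists.

```python
from collections import Counter


def normalize(text):
    """Drop all whitespace; extend with punctuation handling as needed."""
    return "".join(text.split())


def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))


def char_f1(prediction, reference):
    """Character-overlap F1 between prediction and reference."""
    pred_chars = Counter(normalize(prediction))
    ref_chars = Counter(normalize(reference))
    overlap = sum((pred_chars & ref_chars).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_chars.values())
    recall = overlap / sum(ref_chars.values())
    return 2 * precision * recall / (precision + recall)


print(exact_match("서울 특별시", "서울특별시"))
print(round(char_f1("서울특별시", "서울"), 4))
```

Tone and politeness still need human review, since overlap metrics do not capture register.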

[20]

Eval stubs ready.

11) Inference Prompt Templates

[25]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-21

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

너는 한국 고객을 돕는 유능한 AI 어시스턴트다.

<|end|><|start|>user<|message|>국내 PIPA 규정을 준수하면서 사내 문서 요약기를 구성하려면 어떤 아키텍처가 좋을까?<|end|><|start|>assistant
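The rendered prompt above follows a fixed layout of system, developer, and user messages, ending with an open assistant turn. The function below is a simplified sketch that reproduces only that visible layout with string formatting. It is an illustration inferred from the output shown here, not the official template; in practice the tokenizer's apply_chat_template() should build the prompt.

```python
def render_prompt(system_text, developer_instructions, user_text):
    """Assemble a prompt that mirrors the layout printed above (simplified sketch)."""
    return (
        f"<|start|>system<|message|>{system_text}<|end|>"
        f"<|start|>developer<|message|># Instructions\n\n{developer_instructions}\n\n<|end|>"
        f"<|start|>user<|message|>{user_text}<|end|>"
        "<|start|>assistant"  # left open so the model writes the reply
    )


prompt = render_prompt(
    system_text="You are ChatGPT, a large language model trained by OpenAI.",
    developer_instructions="너는 한국 고객을 돕는 유능한 AI 어시스턴트다.",
    user_text="사내 문서 요약기 아키텍처를 제안해줘.",
)
print(prompt)
```

Keeping the prompt construction in one function makes it easier to use the identical format for training data and inference.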

12) Freshness Strategy

Weekly calibration SFT: sample permitted news API metadata (title/summary/section) to recalibrate style.
Chat‑tone updates: refresh the lexicon of current abbreviations, neologisms, and emoticons (e.g., ㄱㄱ, ㅇㅋ, ㅋㅋ, ㄹㅇ).
Regression evals: compare before/after on the same metrics to track drift, then tune the data mix and decoding settings (temperature, penalties).
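A before/after comparison can be as small as the sketch below, which flags metrics that dropped by more than a tolerance. The metric names and the 0.01 tolerance are hypothetical, and the check assumes every metric is higher-is-better.

```python
def find_regressions(before, after, tolerance=0.01):
    """Return metrics whose score fell by more than tolerance (higher is better)."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and after[name] < before[name] - tolerance
    }


# Hypothetical scores from two weekly checkpoints.
before = {"rouge_l": 0.40, "qa_f1": 0.80}
after = {"rouge_l": 0.42, "qa_f1": 0.75}
print(find_regressions(before, after))
```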

13) Safety & Compliance

Verify data sources and licenses (benchmarks, APIs, internal data).
Scrub PII before training, logging, and evaluation.
Validate outputs (schema, forbidden terms, sensitivity rules).
Version datasets and models, and keep eval reports.
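Output validation can start from something as simple as the sketch below, which checks a JSON response for required keys and a forbidden-term list. The schema keys and the terms are hypothetical examples, not a recommended policy.

```python
import json

# Hypothetical policy values for illustration.
REQUIRED_KEYS = ("title", "summary")
FORBIDDEN_TERMS = ("주민등록번호", "계좌번호")


def validate_output(raw):
    """Return a list of problems found in a model response; empty means it passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(payload, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    text = json.dumps(payload, ensure_ascii=False)
    problems += [f"forbidden term: {term}" for term in FORBIDDEN_TERMS if term in text]
    return problems


print(validate_output('{"title": "속보", "summary": "요약문"}'))
print(validate_output('{"title": "속보"}'))
```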

14) Troubleshooting & Next Steps

Mix ratio: adjust news:chat from 6:4 to 7:3 or 5:5.
LoRA hyperparameters: r=8~16, α=16~32, dropout=0.05~0.1.
Serving: vLLM or llama.cpp, with topic/style routing.
RAG: pair with a news/document index to improve up‑to‑date factuality.
A/B tests: measure user satisfaction across tone, length, emoticon usage, and similar factors.
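For experimenting with the news:chat ratio, a minimal sketch like the one below resamples two example lists to a target mix. It samples with replacement and uses only the standard library; the function name and defaults are assumptions, and both input lists must be non-empty.

```python
import random


def mix_datasets(news, chat, news_ratio=0.6, total=None, seed=0):
    """Sample a shuffled mixture with the requested share of news examples."""
    rng = random.Random(seed)
    if total is None:
        total = len(news) + len(chat)
    n_news = round(total * news_ratio)
    n_chat = total - n_news
    # Sampling with replacement lets a small source be up-weighted.
    mixed = rng.choices(news, k=n_news) + rng.choices(chat, k=n_chat)
    rng.shuffle(mixed)
    return mixed


mixed = mix_datasets(["n1", "n2"], ["c1"], news_ratio=0.6, total=10)
print(mixed)
```

Re-running the same evaluation after each ratio change keeps the comparison between 6:4, 7:3, and 5:5 fair.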