HuggingFace (embedding, reranker) & Ollama (LLM) integration troubleshooting #3

kyy · 2025-03-25T10:46:22+09:00

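📌 Integrating an Ollama LLM with HuggingFace Embedding & Re-ranker models

1. Overview

I used AutoRAG to generate evaluation data from internal company documents and to validate a RAG (Retrieval-Augmented Generation) pipeline.

However, adding Ollama and HuggingFace-based models (HF Embedding, Re-ranker) caused unexpected errors, and resolving them required analyzing AutoRAG's structure and modifying its code.

This issue shares the analysis and the fixes.

2. Issues

By default, AutoRAG supports only the OpenAI API and a few predefined models. The Ollama-based LLM and the HuggingFace embedding and re-ranker models therefore had to be registered explicitly by the user.

📌 Main errors encountered

(1) Ollama model not recognized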

Error: ConnectionError: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible.

Cause: AutoRAG supports LLMs such as OpenAI by default and does not recognize Ollama unless it is registered explicitly.
Resolution: Register Ollama in AutoRAG's generator_models.

(2) Ollama server connection error (`ReadTimeout`)

httpx.ReadTimeout: ReadTimeout

Cause: This can occur when the Ollama server address (base_url) is wrong or the network between Docker containers is misconfigured.
Resolution: Set base_url explicitly, based on the container name given in Docker Compose (autorag-ollama).
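
A quick reachability check can help separate a wrong `base_url` from a model that is simply slow to answer. The following is a minimal sketch, assuming the Ollama server exposes its `/api/tags` endpoint (which lists locally available models) and that the Compose service is named `autorag-ollama` as above. The helper name `ollama_reachable` is hypothetical and is not part of AutoRAG or Ollama.

```python
import urllib.error
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to <base_url>/api/tags answers with status 200."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # or a malformed URL such as one without the "http://" scheme.
        return False


# Example call (run it inside the AutoRAG container so the service name resolves):
#   ollama_reachable("http://autorag-ollama:11434")
```

If this returns `False` from inside the AutoRAG container, the address or the Compose network is the more likely culprit; if it returns `True` and `ReadTimeout` still occurs, a larger `request_timeout` may be what is needed.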

(3) Embedding model problem

TypeError: 'NoneType' object is not callable

Cause: The HuggingFace embedding model was not registered, so it could not be found in the embedding_models dictionary.
Resolution: Register the HuggingFace model directly in the embedding_models dictionary.
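
The `TypeError` above is consistent with a registry lookup that returns `None` for an unknown key and is then called. Below is a minimal sketch of that failure mode and of a guard that fails with a clearer message. The plain dict is only a stand-in for `embedding_models`, and `resolve_embedding` is a hypothetical helper, not AutoRAG's actual lookup code.

```python
from typing import Callable, Dict

# Stand-in registry: maps a model key to a zero-argument factory.
registry: Dict[str, Callable[[], str]] = {
    "openai": lambda: "openai-embedding-instance",
}


def resolve_embedding(name: str, models: Dict[str, Callable[[], str]]) -> str:
    """Look up a factory by key and call it, failing loudly on unknown keys."""
    factory = models.get(name)
    if factory is None:
        known = ", ".join(sorted(models))
        raise KeyError(f"embedding model '{name}' is not registered (known: {known})")
    return factory()


# Unguarded lookup: .get() returns None, and calling None raises a TypeError.
try:
    registry.get("huggingface_KoE5")()
except TypeError as err:
    print(err)

# After the key is registered, the lookup succeeds.
registry["huggingface_KoE5"] = lambda: "koe5-embedding-instance"
print(resolve_embedding("huggingface_KoE5", registry))
```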

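(4) Re-ranker model problem

Cause: No re-ranker was explicitly defined, so the evaluation run failed.
Resolution: Write a custom re-ranker and register it.

3. Resolution

(1) Registering the Ollama model in `main.py`

To make the Ollama model usable from AutoRAG, I added it to generator_models using LazyInit. The base_url is set to the address of the Ollama Docker container.

📌 Modified main.py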

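import autorag
from llama_index.llms.ollama import Ollama

# Include the scheme; "autorag-ollama" is the service name defined in Docker Compose.
OLLAMA_BASE_URL = "http://autorag-ollama:11434"

autorag.generator_models["ollama"] = autorag.LazyInit(
    Ollama,
    base_url=OLLAMA_BASE_URL,   # address of the Ollama container on the Compose network
    model="llama3",             # name of the model loaded in Ollama (e.g. llama3)
    request_timeout=120,        # allow for long-running requests
    num_gpus=1,                 # number of GPUs to use
)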

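📝 Notes:

LazyInit defers initialization, so the model is not loaded, and consumes no resources, until it is actually used.
autorag-ollama is the container name defined in docker-compose.yml; base_url must match it exactly or the containers cannot communicate.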
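
The deferred-initialization idea can be illustrated with a small wrapper. This is a simplified sketch of the pattern, not AutoRAG's own `LazyInit` source; `LazyInitSketch` and `FakeModel` are hypothetical names used only for illustration.

```python
from typing import Any, Callable


class LazyInitSketch:
    """Store a factory and its arguments; build the object only on first call."""

    def __init__(self, factory: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self._factory = factory
        self._args = args
        self._kwargs = kwargs
        self._instance: Any = None

    def __call__(self) -> Any:
        if self._instance is None:
            # First use: this is where an expensive model load would happen.
            self._instance = self._factory(*self._args, **self._kwargs)
        return self._instance


class FakeModel:
    """Hypothetical stand-in for an expensive model such as Ollama."""

    created = 0  # counts how many times the constructor has run

    def __init__(self, model: str) -> None:
        FakeModel.created += 1
        self.model = model


lazy = LazyInitSketch(FakeModel, model="llama3")
print(FakeModel.created)  # registering the wrapper constructs nothing
instance = lazy()         # the model is built here, on first use
print(instance.model, FakeModel.created)
```

Registering the wrapper is cheap, so many models can sit in a registry such as `generator_models` while only the ones a run actually uses are ever constructed.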

(2) Registering the HuggingFace embedding model in `embedding/base.py`

I added the model to embedding_models so that AutoRAG can recognize the HuggingFace embedding model.

📌 Modified embedding/base.py

embedding_models = {
    # llama index
    "openai": LazyInit(
        OpenAIEmbedding
    ),  # default model is OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002
    "openai_embed_3_large": LazyInit(
        OpenAIEmbedding, model_name=OpenAIEmbeddingModelType.TEXT_EMBED_3_LARGE
    ),
    # ... omitted ...
}

try:
    # you can use your own model in this way.
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    embedding_models["huggingface_KoE5"] = LazyInit(
        HuggingFaceEmbedding, model_name="nlpai-lab/KoE5"
    )  # added 230313 - 김용연
except ImportError:
    # The HuggingFace embedding package is not installed, so the entry is skipped.
    pass

📝 Notes:

The HuggingFaceEmbedding class makes pretrained embedding models from HuggingFace available.
KoE5 is an encoder-only model optimized for Korean and can improve document-embedding quality in RAG.
LazyInit ensures the model is loaded only when it is needed.

(3) Defining the HuggingFace re-ranker model in `nodes/passagereranker/`

Adding a new re-ranker model required writing a separate .py file.

📌 Modified __init__.py

from .dragonkue2 import DragonKue2  # added 250313 - 김용연

📌 Newly written dragonkue2.py

class DragonKue2(BasePassageReranker):
    def __init__(self, project_dir: str, *args, **kwargs):
        super().__init__(project_dir)
        try:
            import torch
            from transformers import AutoModelForSequenceClassification, AutoTokenizer
        except ImportError:
            raise ImportError(
                "For using dragonkue2, please install torch and transformers first."
            )

        model_path = "dragonkue/bge-reranker-v2-m3-ko"
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.model.eval()  # inference mode: disables dropout
        # Determine the device to run the model on (GPU if available, otherwise CPU)
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
    # ... omitted ...
    def dragonku2_run_model(input_texts, model, tokenizer, device, batch_size: int):
        ...
        with torch.no_grad():  # no gradients are needed for scoring
            # One logit per (query, passage) pair, flattened to a 1-D float tensor
            scores = model(**inputs).logits.view(-1).float()

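📝 Notes:

The HuggingFace model dragonkue/bge-reranker-v2-m3-ko re-ranks passages by the semantic similarity of Korean sentence pairs.
The class inherits from BasePassageReranker, which integrates it into AutoRAG's evaluation pipeline.
The pure() → _pure() structure scores the similarity between the query and each candidate and selects the top k.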
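
The final selection step reduces to sorting candidates by score and keeping the best k. The sketch below illustrates only that step under simple assumptions; `select_top_k` is a hypothetical helper, the scores are made up, and this is not the body of `_pure()`.

```python
from typing import List, Sequence, Tuple


def select_top_k(
    passages: Sequence[str], scores: Sequence[float], top_k: int
) -> Tuple[List[str], List[float]]:
    """Sort passages by descending score and keep the first top_k."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    kept = ranked[:top_k]
    return [p for p, _ in kept], [s for _, s in kept]


passages = ["passage A", "passage B", "passage C"]
scores = [0.12, 0.87, 0.45]  # made-up reranker scores for illustration
top_passages, top_scores = select_top_k(passages, scores, top_k=2)
print(top_passages, top_scores)
```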

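📝 Notes on dragonku2_run_model():

Takes a list of input sentence pairs and runs inference in batches.
Judges similarity from the logit scores and returns them after a softmax-like normalization.
The exp_normalize() function keeps the softmax numerically stable (prevents overflow).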
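
Batching and the overflow-safe normalization can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code in `dragonkue2.py`: `make_batches` is a hypothetical helper, and this `exp_normalize` shows the usual max-subtraction trick on a plain list instead of a tensor.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def exp_normalize(scores: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted first, so exp() never overflows."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pairs = [("query", f"passage {i}") for i in range(5)]
print([len(batch) for batch in make_batches(pairs, batch_size=2)])

# math.exp(1000.0) on its own raises OverflowError; the shifted version does not.
print(exp_normalize([1000.0, 1001.0, 999.0]))
```

Subtracting the maximum leaves the normalized values unchanged, because the common factor cancels between numerator and denominator, while keeping every exponent at or below zero.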

kyy self-assigned this 2025-03-25 10:46:22 +09:00

Reference: kyy/autorag_evaluation#3
