LawMate Romania

Overview

LawMate Romania is a project to build a Large Language Model (LLM) specialized in the Romanian legal domain. The model is designed to assist with legal tasks by understanding and generating text based on Romanian legal documents. It starts from the pre-trained Equall/Saul-7B-Instruct-v1 model, available on the Hugging Face Hub, and fine-tunes it on Romanian legal texts such as the Constitution and the Education Law.

Repo Structure

documents/: Contains text documents used for training the model, including the Romanian Constitution and the Education Law.

training_ds/: Contains the dataset files generated from the text documents for training purposes.

env_llm.txt: Lists the dependencies and environment settings required to run the project.

main.py: The main script for training and evaluating the Large Language Model (LLM).

.gitignore: Specifies files and directories to be ignored by Git to keep the repository clean.

LawMate Romania/: Includes the chatbot script and screenshots demonstrating example interactions.
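
To make the relationship between documents/ and training_ds/ concrete, here is a minimal sketch of how extracted text could be turned into dataset files. It is not the code in main.py. It assumes the text has already been extracted into UTF-8 .txt files, and the helper names (clean_text, chunk_words, build_dataset), the JSONL layout, and the chunk sizes are all hypothetical choices made for illustration.

```python
import json
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Collapse whitespace runs (line breaks, double spaces) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def build_dataset(src_dir: Path, out_file: Path,
                  max_words: int = 200, overlap: int = 20) -> int:
    """Write one JSON object per chunk to out_file and return the chunk count.

    Assumption: src_dir holds already-extracted .txt files; the output format
    (JSON Lines with "source" and "text" keys) is illustrative only.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_file.open("w", encoding="utf-8") as out:
        for path in sorted(src_dir.glob("*.txt")):
            text = clean_text(path.read_text(encoding="utf-8"))
            for chunk in chunk_words(text, max_words, overlap):
                record = {"source": path.name, "text": chunk}
                # ensure_ascii=False keeps Romanian diacritics readable in the file.
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

With this sketch, a call such as `build_dataset(Path("documents"), Path("training_ds/train.jsonl"))` would write one record per chunk; the file name train.jsonl is a placeholder, not a file known to exist in the repository. Overlapping windows are used so that a legal provision cut at a chunk boundary still appears whole in at least part of a neighbouring chunk.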

Steps to Replicate Results

Set Up the Environment:
- Ensure you have Python installed.
- Install the necessary dependencies by running:
```
pip install -r env_llm.txt
```
Prepare the PDF Files:
- Place your PDF files in the documents/ directory.
- The script will automatically extract and preprocess the text from these files.
Fine-Tune the Model:
- Run the main training script main.py to fine-tune the pre-trained LLM on the provided dataset.
Evaluate and Save the Model:
- After training, the script will evaluate the model's performance on the validation dataset.
- The fine-tuned model will be saved for future use.
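
This README does not say which metric main.py reports during evaluation. As an illustration only, perplexity is a common choice for language models: it is the exponential of the mean per-token cross-entropy loss on the validation set. The sketch below assumes you already have a list of per-token losses (in nats); the function name is hypothetical.

```python
import math


def perplexity(token_losses: list[float]) -> float:
    """Return exp(mean loss) for per-token cross-entropy losses given in nats."""
    if not token_losses:
        raise ValueError("need at least one loss value")
    return math.exp(sum(token_losses) / len(token_losses))
```

Lower values are better. A perplexity of 1 would mean the model assigns probability 1 to every validation token.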
