# indic-transformers-hi-bert

**Repository Path**: modelee/indic-transformers-hi-bert

## Basic Information

- **Project Name**: indic-transformers-hi-bert
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2023-05-24
- **Last Updated**: 2025-05-26

## Categories & Tags

**Categories**: llm

**Tags**: None

## README

---
language:
- hi
tags:
- MaskedLM
- Hindi
- BERT
- Question-Answering
- Token Classification
- Text Classification
---

# Indic-Transformers Hindi BERT

## Model description

This is a BERT language model pre-trained on ~3 GB of monolingual Hindi text. The pre-training data was largely taken from [OSCAR](https://oscar-corpus.com/). The model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering (a minimal fine-tuning sketch is included at the end of this card). Embeddings from this model can also be used for feature-based training.

## Intended uses & limitations

#### How to use

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# torch.Size([1, 5, 768])
```

#### Limitations and bias

The original language model was trained using `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
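If the TensorFlow weights are needed anyway, they can be loaded with `TFAutoModel`. The sketch below assumes TensorFlow is installed; the `from_pt=True` variant additionally requires PyTorch, since it converts the recommended PyTorch checkpoint on the fly instead of relying on the manually generated `.h5` file.

```python
# Minimal sketch: loading the TensorFlow version of the model (assumes TensorFlow is installed).
from transformers import TFAutoModel

# Load the manually converted .h5 weights shipped with the repository.
model_tf = TFAutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')

# Alternatively, convert the PyTorch checkpoint on the fly (also requires PyTorch).
model_tf = TFAutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert', from_pt=True)
```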
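As noted in the model description, the checkpoint can be fine-tuned on downstream tasks. The sketch below shows binary text classification with the `Trainer` API; the toy Hindi sentences, labels, `num_labels=2`, and the output directory are placeholders for illustration and are not part of this repository. Substitute a real labelled dataset before training.

```python
# Minimal fine-tuning sketch for Hindi text classification (placeholder data).
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = 'neuralspace-reverie/indic-transformers-hi-bert'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy examples only; replace with a real labelled Hindi dataset.
texts = ["यह फिल्म बहुत अच्छी थी", "मुझे यह उत्पाद पसंद नहीं आया"]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors='pt')


class SimpleDataset(torch.utils.data.Dataset):
    """Wraps tokenized inputs and labels so Trainer can iterate over them."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./hi-bert-clf',
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=SimpleDataset(encodings, labels),
)
trainer.train()
```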