# indic-transformers-hi-bert

**Repository Path**: modelee/indic-transformers-hi-bert

## Basic Information

- **Project Name**: indic-transformers-hi-bert
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2023-05-24
- **Last Updated**: 2025-05-26

## Categories & Tags

**Categories**: llm

**Tags**: None

## README

---
language:
- hi
tags:
- MaskedLM
- Hindi
- BERT
- Question-Answering
- Token Classification
- Text Classification
---

# Indic-Transformers Hindi BERT

## Model description

This is a BERT language model pre-trained on ~3 GB of monolingual Hindi text. The pre-training data was largely taken from [OSCAR](https://oscar-corpus.com/). The model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering (a minimal fine-tuning sketch is included at the end of this card). Embeddings from this model can also be used for feature-based training.

## Intended uses & limitations

#### How to use

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# torch.Size([1, 5, 768])
```

#### Limitations and bias

The original language model was trained using `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
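If the TensorFlow weights are needed anyway, they can be loaded with `TFAutoModel`. The sketch below assumes TensorFlow is installed; the `from_pt=True` variant additionally requires PyTorch, since it converts the recommended PyTorch checkpoint on the fly instead of relying on the manually generated `.h5` file.

```python
# Minimal sketch: loading the TensorFlow version of the model (assumes TensorFlow is installed).
from transformers import TFAutoModel

# Load the manually converted .h5 weights shipped with the repository.
model_tf = TFAutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')

# Alternatively, convert the PyTorch checkpoint on the fly (also requires PyTorch).
model_tf = TFAutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert', from_pt=True)
```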
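As noted in the model description, the checkpoint can be fine-tuned on downstream tasks. The sketch below shows binary text classification with the `Trainer` API; the toy Hindi sentences, labels, `num_labels=2`, and the output directory are placeholders for illustration and are not part of this repository. Substitute a real labelled dataset before training.

```python
# Minimal fine-tuning sketch for Hindi text classification (placeholder data).
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = 'neuralspace-reverie/indic-transformers-hi-bert'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy examples only; replace with a real labelled Hindi dataset.
texts = ["यह फिल्म बहुत अच्छी थी", "मुझे यह उत्पाद पसंद नहीं आया"]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors='pt')


class SimpleDataset(torch.utils.data.Dataset):
    """Wraps tokenized inputs and labels so Trainer can iterate over them."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./hi-bert-clf',
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=SimpleDataset(encodings, labels),
)
trainer.train()
```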