Hi, I am Prithiv! I am a graduate engineer (UG 2024) in Information Technology, GCEE, focused on LLM training and enhancements and on improving multimodal AI capabilities.
Pinned
-
OCR-ReportLab-Notebooks (Public)
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B & more) on the free-tier T4 GPU.
-
Multimodal-Outpost-Notebooks (Public)
This repository contains a curated collection of notebooks for implementing state-of-the-art multimodal Vision-Language Models (VLMs).
Jupyter Notebook 4
-
Flux-LoRA-DLC (Public)
Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diver…
-
Qwen2.5-VL-Video-Understanding (Public)
The Qwen2.5-VL-7B-Instruct model is a multimodal AI model developed by Alibaba Cloud that excels at understanding both text and images. It's a Vision-Language Model (VLM) designed to handle various…
Python 3
-
FineTuning-SigLIP-2 (Public)
Fine-tuning SigLIP 2 for single-label and multi-label image classification: a vision-language encoder model fine-tuned for image classification tasks.