I have modified the model_inference script to use vLLM for faster inference, and I would like to contribute it here. I have also updated the naming convention to align it more closely with HF repo IDs.