Large Language Models (LLMs) are increasingly popular because they can handle a wide range of tasks. However, assessing the quality of their output remains a challenge, especially for complex tasks where no standard metric exists. Fine-tuning LLMs on large, task-specific datasets is one potential way to improve their efficacy and accuracy, but it does not by itself tell us how good the generated output is. In this article, we explore potential ways to assess LLM output quality: