Congratulations to the group of undergraduate and graduate students of the Faculty of Information Systems on their paper at the International Conference on Intelligent Systems and Data Science (ISDS 2025)
Following the success of the first ISDS 2023, organized at Can Tho University, and the second ISDS 2024, organized at Nha Trang University, this year's ISDS 2025 will be held at CICT, Can Tho University. The objective of this international conference is to attract domestic and foreign researchers to participate and present outstanding, recent research in the field of ICT. It is an opportunity for scientists to meet, exchange ideas, and cooperate. ISDS 2025 is also a place for students to report and learn about new results in the field of ICT. The conference looks at state-of-the-art and original research issues in the topics of intelligent systems and data science.
- Topics of the conference include (but are not limited to):
- Track 1: Intelligent Systems & Recommender Systems
- Track 2: Data Science & Machine Learning
- Track 3: Image Processing & Pattern Recognition
- Track 4: Natural Language Processing
- ISDS 2025 will be held at the College of ICT, Can Tho University, on 18-19 October 2025
Conference link: https://isds.ctu.edu.vn/2025/
Paper title: “Enhancing Text-to-SQL Capabilities of Small Language Models via Schema Context Enrichment and Self-Correction”
Undergraduate and graduate students on the team:
- 21522255, Lê Gia Kiệt, HTCL2021
- 21520283, Lê Quốc Khánh, HTCL2021
- 220104018, Nguyễn Minh Nhựt, graduate student, Information Systems
Supervisor: Assoc. Prof. Dr. Nguyễn Đình Thuân
Abstract: Translating natural language into SQL is essential for intuitive database access, yet open-source small language models (SLMs) still lag behind larger systems when faced with complex schemas and tight context windows. This paper introduces a two-phase workflow designed to enhance the Text-to-SQL capabilities of SLMs. Phase 1 (offline) transforms the database schema into a graph, partitions it with Louvain community detection, and enriches each component in a cluster with metadata, relationships, and sample rows. Phase 2 (at runtime) selects the relevant tables, generates SQL queries, and iteratively refines the SQL through an execution-driven feedback loop until the query executes successfully. Evaluated on the Spider test set, our pipeline raises Qwen-2.5-Coder-14B to 86.2% Execution Accuracy (EX), surpassing its zero-shot baseline, outperforming all contemporary SLM + ICL approaches, and narrowing the gap to GPT-4-based systems, all while running on consumer-grade hardware. Ablation studies confirm that both schema enrichment and self-correction contribute significantly to the improvement. The study concludes that this workflow provides a practical methodology for deploying resource-efficient open-source SLMs in Text-to-SQL applications, effectively mitigating common challenges. An open-source implementation is released to support further research.
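For readers unfamiliar with the idea, the execution-driven feedback loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `refine_until_executable`, the `generate_sql` callback, the `max_rounds` limit, and the toy `singer` table are all hypothetical names chosen for illustration, and a hard-coded stand-in replaces the language model. SQLite is assumed as the database engine only because it ships with Python.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple


def refine_until_executable(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str], Optional[str]], str],
    max_rounds: int = 3,  # hypothetical cap; the abstract does not state one
) -> Tuple[str, List[tuple]]:
    """Generate SQL, execute it, and feed any database error back to the generator."""
    previous_sql: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate_sql(question, previous_sql, error)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows  # stop as soon as the query executes
        except sqlite3.Error as exc:
            # Keep the failed query and its error message for the next attempt.
            previous_sql, error = sql, str(exc)
    raise RuntimeError(f"no executable query after {max_rounds} rounds: {error}")


def toy_generator(question: str, previous_sql: Optional[str], error: Optional[str]) -> str:
    """Stand-in for the language model: the first attempt misspells a column."""
    if error is None:
        return "SELECT nme FROM singer"
    return "SELECT name FROM singer"


# Toy database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO singer (name) VALUES ('Adele')")

final_sql, rows = refine_until_executable(conn, "List all singer names", toy_generator)
print(final_sql, rows)
```

In this sketch the first query fails because the column `nme` does not exist, the error text is passed back to the generator, and the second attempt runs. Note that a query which executes is not necessarily semantically correct; the loop only removes queries that the database rejects.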











