Yongye Su
Pronunciation: (Yeong-Yeh Suh)
📧 {LAST_NAME}311@purdue.edu
Google Scholar
My ACM Page
About me.
I'm a third-year Ph.D. student at Purdue CS, where I'm very fortunate to work with Professor Elisa Bertino. I interned at Tursio.ai last summer, where we turned databases into generative AI machines together.
My interests span data management for LLMs and their applications in different domains (such as agentic RAG and recommendation systems) while ensuring security. I have published both on data management for data-centered LLM application workloads (Vexless @ SIGMOD 2024) and on integrating AI into data management systems (LLM-based Semantic File System for AIOS @ ICLR 2025). Currently, I focus on enabling large language models (LLMs) to retrieve from and reason over varied data sources effectively, in order to enhance the trustworthiness and efficiency of ML models and LLMs that use real-world knowledge. These data sources include:
Unstructured data (vector data): e.g., semantic embeddings of multi-modal data.
Structured data: e.g., tabular data from RDBMS.
Semi-structured data: e.g., graph data.
Before my Ph.D. journey, I received my Bachelor's degree from Sun Yat-sen University in 2022. During my undergrad, I also worked as a research intern at Microsoft Research Asia and SAP Labs.
Please don't hesitate to reach out to me if you are interested in a discussion or collaboration.
Thanks to the easygoing upbringing my family and alma mater gave me, I have a bunch of hobbies, like hitting the trails and snapping some pics. I used to practice nunchucks and Taekwondo, but those moves, including my favorite, Taegeuk Pal Jang, have slipped my mind over the years. To learn more about me, please take a look at my Miscs and my Adventures.
By the way, I love chocolate, especially dark chocolate from South America and Africa.
News:
[Paper accepted to ICLR 2025] Our work "From Commands to Prompts: LLM-based Semantic File System" has been accepted to ICLR 2025. Many thanks to my collaborators!
Will serve as a shadow PC member of VLDB 2026.
Will serve as a reviewer of COLM 2025.
Will serve as a reviewer of NeurIPS 2025.
Served as a reviewer of ACL 2025.
Served as a reviewer of ICML 2025.
Served as a reviewer of AISTATS 2025.
Served as a reviewer of SIGKDD 2025.
Served as a reviewer of ICLR 2025.
Thanks to the invitation from ACM, I'm now a professional ACM member. Check out my page here.
Served as a reviewer of NeurIPS 2024.
Served on the AE committee of EuroSys '25 (Spring Round)!
Started my internship at Tursio.ai, working with a group of sharp minds in Azure databases and making great innovations… of course, I'm on vector databases.
I'm very honored to receive the invitation from ACM to become an ACM member.
I'm grateful to receive the ACM SIGMOD 2024 Student Scholarship!
I'm grateful to receive the NSF ICDE Travel Award!
I will present my work to some DB/ML/AI/LLM companies; feel free to reach out if you are also interested.
[Paper accepted to SIGMOD 2024] Our work "Vexless: A Serverless Vector Data Management System Using Cloud Functions" has been accepted to SIGMOD 2024. Can't wait to see you all in Santiago, Chile! 🇨🇱 My deepest gratitude to my collaborators!
Served on the AE committee of EuroSys '24 (Autumn Round)!
[Google Cloud Next '24] Grateful to receive Datawhale & Google's generous support to attend Google Cloud Next '24 (Apr 9-11). Looking forward to seeing how databases can better serve Generative AI. See you in Vegas 🎰!
Served on the AE committee of EuroSys '24 (Spring Round)!
My first first-author paper was accepted! Many thanks to my collaborators!
Talks & Interviews:
[Upcoming] Omni-structured data management for LLMs, indexing and reasoning, invited talk at Snowflake.
The first serverless vector search, oral presentation at SIGMOD 2024.
"Serverless Vector Database on Cloud Functions", invited talk at FedML.
"Cloud Techniques & Challenges", interview by Google Cloud & SegmentFault. (Video available@Google CN)