2023
- AidData’s 5-digit Coding Project
- Coauthored a paper with my colleague David Zhu on applying the FLAN-UL2 model to OECD’s health sector data
- Built a vector database in Pinecone to store SBERT embeddings of health sector data from 1980 to 2020; implemented the full framework for few-shot learning with dynamic retrieval of in-context examples
- Improved model accuracy from 88% to 97% through dynamic retrieval of in-context examples; reduced annotation cost for sectors without training data through fast vote-k selection of examples to annotate
- TRIP (Teaching, Research & International Policy)
- Performed prompt engineering with the GPT-4 API to autocode Professor Mike Tierney’s TRIP project
- Drew on the role-playing framework proposed by KAUST
- DisinfoLab
- Built a robust synthetic media detector as an AI platform to analyze and flag artificially and digitally generated media
- Fully trained two models with high accuracy on the VideoSham dataset; used Meta’s Segment Anything to improve model performance
- Integrated Weights & Biases MLOps into BA-TFD+ to track and manage training and testing
2022
- DeepLearning Certificate
- Neural Networks and Deep Learning
- Natural Language Processing Specialization