2023

  • AidData’s 5-digit Coding Project
    • Coauthored a paper with my colleague David Zhu on applying the FLAN-UL2 model to OECD health sector data
    • Built a vector database in Pinecone to store SBERT embeddings of health sector data from 1980 to 2020; implemented the full framework for few-shot learning with dynamic retrieval of in-context examples
    • Improved model accuracy from 88% to 97% by implementing dynamic retrieval of in-context examples; reduced annotation cost for sectors without training data by implementing fast vote-k selection for annotation
  • TRIP (Teaching, Research & International Policy)
    • Carried out prompt engineering with the GPT-4 API for autocoding in Professor Mike Tierney’s TRIP project
    • Drew on the role-playing framework proposed by KAUST
  • DisinfoLab
    • Build a robust synthetic media detector as an AI platform to analyze and flag artificially and digitally generated media
    • Fully trained two models to high accuracy on the VideoSham dataset; use Meta’s Segment Anything to improve model performance
    • Integrate Weights & Biases MLOps into BA-TFD+ to track and manage training and testing

2022

  • DeepLearning Certificate
    • Neural Networks and Deep Learning
    • Natural Language Processing Specialization