
Recent Applied Research and Projects


Neural Combinatorial Optimization for Scheduling

Combinatorial optimization (CO) is a long-studied field of mathematics concerned with finding an optimal object within a finite set. As problem size grows, however, the number of candidates explodes (i.e., grows combinatorially). The field contains many famous problems, perhaps most notably the Travelling Salesman Problem, and correspondingly many methods for solving them. Traditional solvers, however, seek exact solutions and can be extremely slow. To address this, the emerging field of Neural Combinatorial Optimization (NCO) uses machine learning models to find approximate or sufficiently good solutions to these challenging problems much more quickly than exact solvers. In this project, we used NCO to build schedules whose constraints must be frequently added or removed, a setting that makes traditional CO solvers impractical. Moreover, many of the required scheduling constraints are extremely difficult to express as mathematical equations, which makes NCO even more attractive: constraints in NCO do not need to be written in closed mathematical form to be incorporated.
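To make the "constraints as code, not equations" point concrete, here is a minimal hypothetical sketch (not the project's actual system): constraints are arbitrary Python predicates, and a scoring policy picks the next job to schedule. In real NCO the policy is a trained neural network; here it is stubbed with random scores to keep the sketch self-contained.

```python
import random

def violates(schedule, job, constraints):
    """A constraint is any callable: no algebraic form required."""
    return any(not ok(schedule, job) for ok in constraints)

def greedy_schedule(jobs, constraints, policy_score, seed=0):
    """Build a schedule one job at a time, guided by a scoring policy."""
    random.seed(seed)
    schedule = []
    remaining = list(jobs)
    while remaining:
        # Keep only jobs whose placement satisfies every code-level constraint.
        feasible = [j for j in remaining if not violates(schedule, j, constraints)]
        if not feasible:
            break  # no feasible extension; a real system would backtrack
        best = max(feasible, key=lambda j: policy_score(schedule, j))
        schedule.append(best)
        remaining.remove(best)
    return schedule

# Example constraint: never place two "heavy" jobs back to back -- trivial to
# state in code, awkward to encode as a closed-form mathematical constraint.
no_adjacent_heavy = lambda sched, job: not (sched and sched[-1].endswith("H")
                                            and job.endswith("H"))
jobs = ["a-H", "b", "c-H", "d"]
result = greedy_schedule(jobs, [no_adjacent_heavy], lambda s, j: random.random())
```

Because constraints are plain callables, adding or removing one is a one-line change, which is exactly the flexibility that frequent constraint churn demands.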

Statistical Significance Testing for Model Comparison

The practice of comparing machine learning models on a shared test set, then reporting performance as a single scalar measure of goodness, is so ubiquitous that it appears in nearly every AI/ML paper from the last 30 years. Many authors attempt to convey variability in their measurements with methods such as cross-validation, or by reporting the mean or median and standard deviation over randomized trials. However, these and other common ways of expressing uncertainty usually provide no evidence that the difference between a proposed model and an existing one is statistically significant. To address this, we applied statistical hypothesis tests to determine whether performance differences between models are significant. These tests were performed on overall results, on per-category results, and even across models evaluated on differing sets of categories.
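One standard test of this kind, shown here as a hedged sketch rather than the project's actual test suite, is a paired permutation test on per-example correctness: under the null hypothesis that the two models are interchangeable, each example's pair of outcomes can be swapped at random, and the observed accuracy gap is compared against that null distribution.

```python
import random

def paired_permutation_test(correct_a, correct_b, n_perm=10_000, seed=0):
    """Two-sided p-value for the accuracy gap between two models scored
    on the same test set (1 = correct, 0 = incorrect, per example)."""
    rng = random.Random(seed)
    n = len(correct_a)
    observed = abs(sum(correct_a) - sum(correct_b)) / n
    count = 0
    for _ in range(n_perm):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            if rng.random() < 0.5:
                a, b = b, a  # swap the pair: legal under the null hypothesis
            diff += a - b
        if abs(diff) / n >= observed:
            count += 1
    return count / n_perm

# Hypothetical per-example results for two models on 40 test examples;
# model B is correct on far more examples than model A.
a = [1] * 10 + [0] * 30
b = [1] * 30 + [0] * 10
p = paired_permutation_test(a, b)
```

A small p-value here is evidence the gap is not an artifact of which examples happened to land in the test set, which is precisely what a bare "accuracy A vs. accuracy B" comparison cannot show.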

Detecting Out-of-Distribution Instances

Machine learning models usually assume that any future test data will be distributed similarly to the training data. This makes distinguishing in-distribution (ID) from out-of-distribution (OOD) data important, since a model's predictions on OOD data can be unreliable. OOD data is often split into two categories, covariate shift and semantic shift (see Wang et al. for details). In this research, we found an appreciable performance drop in computer vision models when the test set underwent covariate shift (e.g., day vs. night images), and in NLP models when the test data underwent semantic shift (e.g., Wikipedia data vs. Twitter data). To detect both kinds of shift, we developed a simple sliding-window technique that monitors various summary statistics of incoming data and raises an "alarm bell" when the new data a model is encountering may be OOD, warning users that predictions may be unreliable and that model retraining may be warranted.


Academic Research

Fair Kernel Methods

[ICTAI'19] Austin Okray, Hui Hu and Chao Lan. Fair Kernel Regression via Fair Feature Embedding in Kernel Space

Abstract:
In recent years, there have been significant efforts on mitigating unethical demographic biases in machine learning methods. However, very little work has been done for kernel methods. In this paper, we propose a novel fair kernel regression method via fair feature embedding (FKR-F2E) in kernel space. Motivated by prior work on feature processing for fair learning and feature selection for kernel methods, we propose to learn fair feature embeddings in kernel space, where the demographic discrepancy of feature distributions is minimized. Through experiments on three public real-world data sets, we show the proposed FKR-F2E achieves significantly lower prediction disparity compared with the state-of-the-art fair kernel regression method and several other baseline methods.
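A hedged illustration of the quantity the abstract refers to, the discrepancy between two demographic groups' feature distributions in kernel space, measured here as a (biased) squared Maximum Mean Discrepancy with an RBF kernel. This is a sketch of the kind of objective involved, not the paper's actual embedding-learning algorithm.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF (Gaussian) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(group_a, group_b, gamma=1.0):
    """Biased estimate of squared MMD between two samples: small when the
    groups' distributions match in kernel space, large when they differ."""
    def avg_kernel(xs, ys):
        return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))
    return (avg_kernel(group_a, group_a) + avg_kernel(group_b, group_b)
            - 2 * avg_kernel(group_a, group_b))

# Identical groups -> discrepancy of zero; shifted groups -> positive value.
a = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1)]
b_same = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1)]
b_shift = [(2.0, 2.0), (2.1, 2.2), (1.9, 2.1)]
```

Driving a discrepancy of this kind toward zero while preserving predictive features is the general shape of the trade-off that fair embedding methods navigate.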

Music Classification

[IJCNN'19] Zhen Wang, Suresh Muknahallipatna, Maohong Fan, Austin Okray and Chao Lan. Music Classification using an Improved CRNN with Multi-Directional Spatial Dependencies in Both Time and Frequency Dimensions

Abstract:
In music classification tasks, the Convolutional Recurrent Neural Network (CRNN) has achieved state-of-the-art performance on several data sets. However, the current CRNN technique only uses an RNN to extract the spatial dependency of the music signal in its time dimension, but not its frequency dimension. We hypothesize the latter can be additionally exploited to improve classification performance. In this paper, we propose an improved technique called CRNN in Time and Frequency dimensions (CRNN-TF), which captures spatial dependencies of the music signal in both time and frequency dimensions in multiple directions. Experimental studies on three real-world music data sets show that CRNN-TF consistently outperforms CRNN and several other state-of-the-art deep learning-based music classifiers. Our results also suggest CRNN-TF is transferable to small music data sets via fine-tuning.


All content Copyright Austin Okray © 2026