Recreating the GCN Model on Citibike Dataset
A picture of a stone wall somewhere in Taichung. Shot with Pentax KX and Agfa APX 400.
I had just gotten back from a half-year exchange program in Germany and had a few months before my compulsory military service. To keep myself busy, I joined the AI Research Center at Feng Chia University as a research assistant. My first task was to implement the model from Chai et al. [1], which predicts rental bike flow on the NYC Citibike dataset. The model uses Graph Convolutional Networks to embed the spatial relations between rental stations and processes the time sequences with an LSTM.
Up until this point, I had only really implemented simple models in TensorFlow. This was a perfect opportunity to test my understanding of machine learning by implementing a working model entirely from scratch. It also came with the added benefit of learning a new library: PyTorch.
Should be fun!
1. Multi-Graph Convolution
The NYC Citibike dataset is a growing collection of ride records from the bike rental service. From the records we can obtain a list of unique stations along with their latitude and longitude. By analyzing the data, the authors came up with three graphs that capture different aspects of how the stations interact: Inverse of Distance, Average Rides per Day, and Inbound-Outbound Correlation. In each graph, the vertices are the stations and the edge weights are the values of the corresponding metric.
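Here is a minimal NumPy sketch of how the three graphs could be built from the ride records. The precise definitions are my own assumptions and not necessarily the paper's:

- the distance graph uses haversine distance;
- the ride graph counts trips in both directions and averages them over the number of days;
- the correlation graph uses the Pearson correlation of each station's hourly inflow series followed by its outflow series.

All function names are hypothetical, and self-loops are zeroed out.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat, lon):
    """Pairwise great-circle distances (km) between stations given in degrees."""
    lat, lon = np.radians(np.asarray(lat, float)), np.radians(np.asarray(lon, float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def distance_graph(lat, lon):
    """Inverse-distance graph: edge weight 1 / d(i, j); self-loops set to 0 (assumption)."""
    d = haversine_km(lat, lon)
    with np.errstate(divide="ignore"):
        a = np.where(d > 0, 1.0 / d, 0.0)
    np.fill_diagonal(a, 0.0)
    return a


def ride_graph(origins, destinations, n_stations, n_days):
    """Average rides per day between each pair, counting both directions (assumption)."""
    counts = np.zeros((n_stations, n_stations))
    np.add.at(counts, (origins, destinations), 1.0)  # one +1 per trip record
    counts = counts + counts.T
    np.fill_diagonal(counts, 0.0)
    return counts / n_days


def correlation_graph(inflow, outflow):
    """Pearson correlation between stations' hourly [inflow..., outflow...] series.

    inflow, outflow: arrays of shape (T, N), one column per station.
    Concatenating in- and outflow along time is an assumption of this sketch.
    """
    series = np.concatenate([inflow, outflow], axis=0)  # (2T, N)
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.corrcoef(series, rowvar=False)  # (N, N)
    corr = np.nan_to_num(corr)  # constant series (e.g. unused stations) -> 0
    np.fill_diagonal(corr, 0.0)
    return corr
```

With the real data, origins and destinations would be integer station indices taken from each trip record. The inflow and outflow matrices would be hourly counts per station.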
An important contribution of this paper is the multi-graph convolutional layer. It works by first combining the multiple graphs through graph fusion and then performing graph convolution on the result. The graphs are combined by an element-wise weighted sum of their adjacency matrices.
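Here is a minimal PyTorch sketch of the fusion step. It assumes each graph gets its own learnable N×N weight matrix, which is multiplied element-wise with that graph's adjacency matrix before summing. Several details are my assumptions and may differ from the paper: the class name GraphFusion, initializing the weights to 1/K, and skipping any normalization of the input graphs.

```python
import torch
import torch.nn as nn


class GraphFusion(nn.Module):
    """Fuse K fixed graphs as A = sum_k W_k * A_k (Hadamard product).

    Hypothetical module; per-graph normalization is intentionally omitted.
    """

    def __init__(self, adjs: torch.Tensor):
        super().__init__()
        # adjs: (K, N, N). Stored as a buffer: the graphs themselves are not trained.
        self.register_buffer("adjs", adjs)
        k, n, _ = adjs.shape
        # Assumption: start from a plain average of the K graphs.
        self.weights = nn.Parameter(torch.full((k, n, n), 1.0 / k))

    def forward(self) -> torch.Tensor:
        # Element-wise weighting, then sum over the graph dimension -> (N, N).
        return (self.weights * self.adjs).sum(dim=0)
```

Because the weights are learned per edge rather than per graph, the model can trust, say, the distance graph for some station pairs and the correlation graph for others.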
The weighted sum can be expressed as:
Where D is:
With the fused graph, we can perform spectral graph convolution on the input data. The input is the concatenation of each station's inflow and outflow over a fixed time interval, which is one hour in the paper. More formally:
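As a rough illustration of the convolution step, the sketch below swaps in a common stand-in for spectral graph convolution: the first-order approximation from Kipf and Welling, H = ReLU(D^{-1/2}(A + I)D^{-1/2} X W). Here X holds each station's [inflow, outflow] for one hour, and D is the degree matrix of A + I. The paper's exact filter may differ. The names normalize_adjacency and GraphConv are hypothetical. Clamping the degree is my own guard against the negative weights that fusion or correlation edges can introduce.

```python
import torch
import torch.nn as nn


def normalize_adjacency(a: torch.Tensor) -> torch.Tensor:
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    a_hat = a + torch.eye(a.shape[0], dtype=a.dtype, device=a.device)
    # Clamp guards against zero/negative degrees (e.g. negative correlations).
    deg_inv_sqrt = a_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
    return deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]


class GraphConv(nn.Module):
    """One first-order graph convolution: ReLU(A_norm @ X @ W)."""

    def __init__(self, in_features: int = 2, out_features: int = 16):
        super().__init__()
        self.weight = nn.Linear(in_features, out_features, bias=False)

    def forward(self, x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, in_features), one [inflow, outflow] row per station
        # a: (N, N) fused adjacency; matmul broadcasts it over the batch
        return torch.relu(normalize_adjacency(a) @ self.weight(x))
```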
2. The Model
The task is to predict the inflow and outflow of each station, given a window of historical flow data and the weather. Sequential data like this naturally calls for recurrent neural networks such as Long Short-Term Memory (LSTM). Each element in the sequence is the inflow/outflow data embedded by the graph convolution described in the last section.
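Here is a sketch of how the per-hour embeddings could become an LSTM input. It graph-convolves every hourly snapshot in a window and flattens each hour's station embeddings into one vector. The resulting sequence goes to nn.LSTM. Several choices are assumptions for illustration: flattening all stations into a single vector per time step, the layer sizes, reusing a first-order normalized adjacency, and the name FlowSequenceEncoder.

```python
import torch
import torch.nn as nn


class FlowSequenceEncoder(nn.Module):
    """Graph-convolve each hourly snapshot, then run an LSTM over the window."""

    def __init__(self, n_stations: int, emb_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.proj = nn.Linear(2, emb_dim, bias=False)  # per-station [in, out] -> emb
        self.lstm = nn.LSTM(n_stations * emb_dim, hidden, batch_first=True)

    @staticmethod
    def _normalize(a: torch.Tensor) -> torch.Tensor:
        # D^{-1/2} (A + I) D^{-1/2}, with a clamp against non-positive degrees
        a_hat = a + torch.eye(a.shape[0], dtype=a.dtype, device=a.device)
        d = a_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        return d[:, None] * a_hat * d[None, :]

    def forward(self, flows: torch.Tensor, a: torch.Tensor):
        # flows: (batch, T, N, 2); a: (N, N) fused adjacency
        b, t = flows.shape[:2]
        emb = torch.relu(self._normalize(a) @ self.proj(flows))  # (b, t, N, emb)
        seq = emb.reshape(b, t, -1)  # one flattened vector per time step
        return self.lstm(seq)  # (outputs, (h_n, c_n))
```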
In this paper, the authors essentially use an encoder-decoder structure. The encoder-decoder is first pretrained to effectively encode the historical ride flow data in a window. Then, a fully connected layer is trained to predict the bike flow at a given timestamp from the encoded history and the weather data.
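The two-stage training could look roughly like this:

1. An LSTM encoder-decoder is pretrained to reconstruct a window of (already embedded) flow vectors.
2. Its encoder is frozen, and a fully connected head maps the final hidden state plus a weather vector to every station's inflow and outflow.

Several details are my assumptions and not confirmed by the paper: feeding the decoder zeros, starting it from the encoder's final state, freezing the encoder, and all class and function names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqAutoencoder(nn.Module):
    """LSTM encoder-decoder that reconstructs a window of flow embeddings."""

    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, in_dim)

    def encode(self, seq):
        # seq: (batch, T, in_dim) -> final (h, c), each (1, batch, hidden)
        _, (h, c) = self.encoder(seq)
        return h, c

    def forward(self, seq):
        h, c = self.encode(seq)
        # Assumption: the decoder is fed zeros and starts from the encoder state.
        dec_out, _ = self.decoder(torch.zeros_like(seq), (h, c))
        return self.out(dec_out)  # reconstruction, same shape as seq


def pretrain_step(ae, seq, opt):
    """One reconstruction (MSE) step of the pretraining stage."""
    opt.zero_grad()
    loss = F.mse_loss(ae(seq), seq)
    loss.backward()
    opt.step()
    return loss.item()


class FlowPredictor(nn.Module):
    """Frozen pretrained encoder + trainable FC layer that also sees the weather."""

    def __init__(self, ae, hidden, weather_dim, n_stations):
        super().__init__()
        self.ae = ae
        for p in self.ae.parameters():  # assumption: only the FC layer is trained
            p.requires_grad_(False)
        self.head = nn.Linear(hidden + weather_dim, n_stations * 2)
        self.n_stations = n_stations

    def forward(self, seq, weather):
        h, _ = self.ae.encode(seq)
        z = torch.cat([h[-1], weather], dim=-1)  # (batch, hidden + weather_dim)
        return self.head(z).view(-1, self.n_stations, 2)  # [inflow, outflow]
```

In the second stage, only the head's parameters would be passed to the optimizer, for example torch.optim.Adam(predictor.head.parameters()).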
The model structure during pretraining and training.
3. Conclusion & Takeaway
Revisiting the code I wrote, there are definitely some things I would do differently now, for example the crude approach of loading all ~60 GB of data directly into memory. Still, it's not a terrible project for a first-timer. The code and some additional information can be found in my GitHub repository.
Obviously, this is a gross oversimplification of the original work. If you are still interested, please give the original paper a read; it's very well written and clear.
Well that's all from me today. Till next time!
4. References
All figures in this post are from the original paper.
[1] Di Chai, Leye Wang, Qiang Yang. Bike Flow Prediction with Multi-Graph Convolutional Networks. 2018.