Redemption Prediction Using Deep Learning

Nailong Zhang, Peng Wang, Wenjing Lu and Ashish Agrawal
Friday 07 September 2018

Introduction

Fund managers lose a significant amount of assets under management every year to shareholder (for example, investment advisor) redemption activity. If we could predict the redemption risk (for example, the probability of redeeming more than a certain amount of securities), proactive intervention through different types of touchpoints (for example, email contact) could be conducted to reduce redemption rates. Asset redemptions are observed in different channels (for example, the Retail, DCIO, and RIA channels). In this project, we focus on the RIA channel. The objective of this study is to build a deep learning model that predicts the near-future redemption risk for each advisor, taking the advisor's entire portfolio into account. If we can identify the advisors with high redemption risk, appropriate intervention activities may reduce redemptions.

Data

There are two main data sources from OFI for this project. One is the monthly updated investment advisor indicative table (advisor table), and the other is the monthly updated assets and flows table (flow table). The advisor table contains information about each advisor, and the flow table contains the assets/sales/redemptions for each advisor in each month. By joining these two tables we can track the assets/sales/redemptions history of each advisor. An advisor's portfolio may include one or more funds (or composite funds). We use the past 12 months' transaction records of each financial advisor as the independent variables in our model. There are 309 different funds in the dataset. For each fund in each month, we have the sales amount, the redemption amount, and the assets at the beginning of the month. Thus, the total number of variables for each advisor is 12 (months) * 309 (funds) * 3 (assets, sales, and redemptions) = 11,124. Deep neural networks are a good fit for such high-dimensional, sparse, and structured data. The binary response variable is created by comparing the next 3-month total redemption with $30,000: if the next 3-month redemption for a sample is larger than $30,000, the response is 1; otherwise it is 0. In total, we created 11,584 samples from the data. These samples are further split into a training dataset and a testing dataset with a 4:1 ratio. It is worth noting that multiple samples may belong to the same advisor with different timestamps. Thus, to avoid potential overfitting, we split the samples so that no advisor appears in both the training data and the testing data.
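To make the label construction and the advisor-level split concrete, here is a minimal sketch; the file name and column names (advisor_samples.parquet, next_3m_redemption, advisor_id) are hypothetical placeholders, not the actual schema.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical frame: one row per (advisor, timestamp) sample, holding the
# 11,124 flattened features plus the next-3-month total redemption amount.
samples = pd.read_parquet("advisor_samples.parquet")  # hypothetical file

# Binary response: 1 if the next 3-month redemption exceeds $30,000, else 0.
samples["label"] = (samples["next_3m_redemption"] > 30_000).astype(int)

# 4:1 split at the advisor level so no advisor appears in both sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(samples, groups=samples["advisor_id"]))
train, test = samples.iloc[train_idx], samples.iloc[test_idx]
```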

Models

Multilayer Perceptron (MLP)

We created the features for the MLP by flattening and concatenating the original transaction records, as illustrated in Fig. 1.

[Fig. 1. Feature construction for the MLP: transaction records flattened and concatenated]

The MLP is built in Keras with TensorFlow as the backend, and the model is trained on an AWS EC2 instance with a GPU.



[MLP model structure]
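As a rough indication of the architecture, here is a minimal Keras sketch of an MLP over the flattened features; the layer widths, dropout rates, and optimizer are illustrative assumptions, not the exact configuration used.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input: 12 months x 309 funds x 3 measures, flattened to 11,124 features.
n_features = 12 * 309 * 3

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),                    # dropout guards against overfitting
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # redemption-risk probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
```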




Convolutional Neural Network (CNN)

We created the features for the CNN as illustrated in Fig. 2.

[Fig. 2. Feature construction for the CNN]



[CNN model structure]
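For illustration, a minimal Keras sketch of a CNN over the transaction tensor. One plausible arrangement (an assumption, since Fig. 2 defines the actual layout) treats months and funds as the two spatial dimensions and the three measures (assets, sales, redemptions) as channels; filter counts and kernel sizes are likewise illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input treated like an image: 12 (months) x 309 (funds) grid, 3 channels.
model = keras.Sequential([
    keras.Input(shape=(12, 309, 3)),
    layers.Conv2D(32, kernel_size=(3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), padding="same", activation="relu"),
    layers.GlobalMaxPooling2D(),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
```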




LSTM

  • We reshaped the data from 2D to 3D with shape (rows, 309, 36): for each advisor, the 12 months of assets, sales, and redemptions for one fund form a 36-value vector, and the 309 funds are stacked as the sequence dimension. This was done for all advisors (see the Keras sketch further below).
  • Setting stateful=True gave better performance than stateful=False. Strangely, with stateful=False the test AUC tanked from 0.78 to 0.19 in certain epochs; this did not happen with stateful=True for the parameters we tested.
  • Increasing the number of internal states did not yield significant improvements in AUC, and training became slower as the number of internal states grew.
  • We scaled the values in the matrix with a min-max scaler, since LSTMs are sensitive to feature scales.



[LSTM model structure]
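A minimal sketch of the scaling, reshaping, and a stateful LSTM in Keras; the unit count, batch size, and the variable X_flat (the flattened 11,124-column feature matrix) are assumptions for illustration.

```python
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import layers

# Min-max scale first (LSTMs are sensitive to feature scale), then reshape to
# (samples, 309 funds, 36 = 12 months x 3 measures); assumes the flat
# features are ordered fund-major.
X = MinMaxScaler().fit_transform(X_flat)
X = X.reshape(-1, 309, 36)

batch_size = 32  # a stateful LSTM needs a fixed batch size
model = keras.Sequential([
    keras.Input(shape=(309, 36), batch_size=batch_size),
    layers.LSTM(64, stateful=True),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
# Note: with stateful=True, train with shuffle=False, keep the sample count
# divisible by batch_size, and reset states between epochs.
```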




Performance

Table 1. Prediction AUC of Different Models on Testing Samples

Model                          AUC
Gradient boosting (xgboost)    0.9294
MLP                            0.9243
CNN                            0.9295
Ensemble (3 models)            0.9384
LSTM                           0.832

From Table 1, gradient boosting, MLP, and CNN give very similar results, with CNN performing slightly better. A simple equal-weight average of these three models boosts the AUC by almost 1%.

Discussion

Training the MLP and CNN models on a P2 AWS instance is fast with the help of the GPU. The best number of epochs was not selected by cross-validation; instead, we manually chose the number of epochs to run. For gradient boosting models, performance is usually sensitive to the number of iterations because of overfitting. In contrast, when we increase the number of training epochs we do not see a significant performance drop for the MLP and CNN, thanks to the dropout layers. We also notice that the activation functions are very important; it is worth trying different activations for the same layer. Regarding the number of hidden layers, adding more layers did not bring any improvement. One issue we have not resolved is the reproducibility of the results, even though we followed the instructions for making results reproducible.
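As a concrete illustration of the equal-weight ensemble reported in Table 1, the sketch below averages the predicted test-set probabilities of the three models; the model and variable names (xgb_model, mlp_model, cnn_model, X_test_flat, X_test_img, y_test) are hypothetical placeholders.

```python
from sklearn.metrics import roc_auc_score

# Predicted probabilities on the test set from each model (names hypothetical).
p_xgb = xgb_model.predict_proba(X_test_flat)[:, 1]  # XGBClassifier
p_mlp = mlp_model.predict(X_test_flat).ravel()      # Keras MLP
p_cnn = cnn_model.predict(X_test_img).ravel()       # Keras CNN

# Equal-weight average of the three probability vectors.
p_ens = (p_xgb + p_mlp + p_cnn) / 3.0
print("Ensemble AUC:", roc_auc_score(y_test, p_ens))
```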