Global-Local Item Embedding for Temporal Set Prediction
DOI: https://doi.org/10.1145/3460231.3478844
RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, Netherlands, September 2021
Temporal set prediction is becoming increasingly important as many companies employ recommender systems in their online businesses, e.g., personalized purchase prediction of shopping baskets. While most previous techniques have focused on leveraging a user's history, the study of combining it with other users' histories remains untapped potential. This paper proposes Global-Local Item Embedding (GLOIE), which learns to utilize the temporal properties of sets across all users as well as within a user; we coin the terms global and local information to distinguish the two temporal patterns. GLOIE uses a Variational Autoencoder (VAE) and a dynamic graph-based model to capture global and local information, respectively, and then applies attention to integrate the resulting item embeddings. Additionally, we propose using a Tweedie output for the VAE decoder, as it readily models zero-inflated and long-tailed distributions and is therefore better suited to many real-world data distributions than its Gaussian or multinomial counterparts. When evaluated on three public benchmarks, our algorithm consistently outperforms previous state-of-the-art methods in most ranking metrics.
ACM Reference Format:
Seungjae Jung, Young-Jin Park, Jisu Jeong, Kyung-Min Kim, Hiun Kim, Minkyu Kim, and Hanock Kwak. 2021. Global-Local Item Embedding for Temporal Set Prediction. In Fifteenth ACM Conference on Recommender Systems (RecSys '21), September 27-October 1, 2021, Amsterdam, Netherlands. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3460231.3478844
1 INTRODUCTION
Many recommendation tasks can be viewed as the problem of predicting the next set given a sequence of sets, e.g., predicting the next basket in online markets or the next playlist in streaming services. Previous works mainly focused on local information (a given user's history), applying RNNs [2, 4, 20] or self-attention [21] to learn the temporal tendency of a sequence of sets within a user. From the recommender system point of view, temporal set prediction can suffer from a sparsity problem, as many users interact with only a small number of items. Throughout the recommender system literature, the sparsity problem has been addressed with low-rank approximation or collaborative filtering [11, 15, 16, 17]. However, such attempts have been less explored in the temporal set prediction literature.
In this paper, we propose Global-Local Item Embedding (GLOIE), which integrates global and local information for temporal set prediction. To capture global information, we utilize Variational Autoencoders (VAEs) [10, 12], which are effective on noisy and sparse data. GLOIE then integrates the local embeddings, which are learned through dynamic graphs [21], with the global embeddings via attention. In addition, we improve GLOIE by using the Tweedie distribution for the VAE likelihood instead of Gaussian or multinomial distributions. We show that purchase logs follow zero-inflated, long-tailed distributions similar to the Tweedie distribution.
We conduct experiments on three public benchmarks, i.e. Dunnhumby Carbo, TaoBao, and Tags-Math-Sx. Empirical results demonstrate that the proposed method outperforms state-of-the-art methods on most metrics. The main contributions of the paper are summarized as follows:
- We propose GLOIE, which differentiates global and local information and integrates them.
- We claim that using Tweedie output for VAE decoder is beneficial as it naturally models two properties of data distributions from temporal set prediction problems: zero-inflated and long-tailed.
- We achieve state-of-the-art performance on three public benchmarks.
2 PRELIMINARIES
2.1 Problem Definition
Let $\mathcal {U} = \lbrace u_1, u_2, \dots, u_N \rbrace$ be the set of users and $\mathcal {E} = \lbrace e_1, e_2, \dots, e_M \rbrace$ be the set of items. Given a user $u_i$'s sequence of sets $\lbrace S_i^1, S_i^2, \dots, S_i^{T_i} \rbrace$, our goal is to predict the next set $S_i^{T_i + 1}$. Each set $S_i^k$ can also be represented in a binary vector form $\mathbf {v}_i^k \in \lbrace 0, 1 \rbrace^M$, where each element $v_{ij}^k = \mathbb{1}[e_j \in S_i^k]$; $\mathbb{1}[x]$ is an indicator function which returns 1 if $x$ is true and 0 otherwise. We will use the set notation $S_i^k$ and the vector notation $\mathbf {v}_i^k$ interchangeably to denote user $u_i$'s $k$-th set.
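To make the notation concrete, the following toy sketch (Python, with hypothetical item ids and sets) builds the binary vectors $\mathbf{v}_i^k$ from a user's sequence of sets:

```python
import numpy as np

# Hypothetical toy catalogue of M = 5 items and one user's sequence of sets.
items = ["e1", "e2", "e3", "e4", "e5"]
item_index = {e: j for j, e in enumerate(items)}
user_sets = [{"e1", "e3"}, {"e3"}, {"e2", "e3", "e5"}]  # S_i^1, S_i^2, S_i^3

def set_to_vector(s, item_index, num_items):
    """Binary vector v_i^k with v_ij^k = 1 iff item e_j is in the set S_i^k."""
    v = np.zeros(num_items, dtype=np.float32)
    for e in s:
        v[item_index[e]] = 1.0
    return v

vectors = [set_to_vector(s, item_index, len(items)) for s in user_sets]
print(np.stack(vectors))  # shape (T_i, M); the goal is to predict v_i^{T_i+1}
```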
2.2 Variational Autoencoders
Variational Autoencoders (VAEs) [10, 13] are a class of deep generative models. VAEs provide latent structures that can nicely explain the observed data (e.g., customer's purchase history). Formally speaking, VAEs find the latent variables (z) that maximize the evidence lower bound (ELBO), a surrogate objective function for the maximum likelihood estimation of the given observation x:
$\log p_\theta(\mathbf{x}) \;\ge\; \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big] - \mathrm{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\Vert\, p(\mathbf{z})\big). \quad (1)$
Encoders are commonly structured as multilayer perceptrons (MLPs) that produce Gaussian distributions over the latent variable z. Decoders, on the other hand, are designed with different output distributions depending on the characteristics of the dataset. For example, Gaussian and multinomial distributions are often used to represent real-valued continuous data and binary implicit-feedback data, respectively.
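As an illustration of this setup (a minimal sketch, not the exact architecture or hyperparameters used later in this paper), a single-layer Gaussian-encoder VAE whose objective follows Equation (1) might look as follows in PyTorch:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal illustrative VAE: Gaussian encoder q(z|x), pluggable decoder output."""
    def __init__(self, num_items, latent_dim=128):
        super().__init__()
        self.enc_mu = nn.Linear(num_items, latent_dim)
        self.enc_logvar = nn.Linear(num_items, latent_dim)
        self.dec = nn.Linear(latent_dim, num_items)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        x_hat = self.dec(z)  # parameters of p(x|z); their meaning depends on the output distribution
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return x_hat, kl

# ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)); the reconstruction term depends on
# the chosen output distribution (Gaussian, multinomial, or Tweedie in Section 3.2).
```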
3 METHOD
3.1 Learning Global-Local Information by VAE
We are interested in modeling other users' histories as well as the given user's history, and we proceed under the VAE framework. However, a difficulty arises: every user has a sequence of sets of different length. To resolve this, we sum the time-decayed sequence of sets:

$x_{ij} = \sum_{k=1}^{T_i} \tau^{\,T_i - k}\, v_{ij}^k, \quad (2)$

$\mathbf{x}_i = \big[x_{i1}, x_{i2}, \dots, x_{iM}\big], \quad (3)$

where $\tau \in (0, 1]$ is a decay factor, so that more recent sets contribute more to $\mathbf{x}_i$ than older ones.
As most users interact with only a few items, $\mathbf{v}_i^k$ contains many zeros and so does $\mathbf{x}_i$. However, the reconstructed vector $\hat{\mathbf{x}}_i$ is generated from the following process and is generally dense:

$\mathbf{z}_i \sim q_\phi(\mathbf{z} \mid \mathbf{x}_i), \qquad \hat{\mathbf{x}}_i = f_\theta(\mathbf{z}_i), \quad (4)$

where $f_\theta$ denotes the decoder. Since users with similar histories are encoded to nearby latent vectors, $\hat{\mathbf{x}}_i$ can assign non-zero scores to items the user never interacted with; this is how the VAE captures global information.
Note that Hu et al. [5]'s Personalized Item Frequency (PIF) is similar to our sum of time-decayed vectors. However, since their work is based on K-Nearest Neighbors, two shortcomings arise: 1) the inference time grows cubically with the number of users, and 2) unlike deep learning approaches, it cannot exploit additional features.
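A minimal sketch of the time-decayed aggregation in Equations (2)–(3); the variable names and the toy data are illustrative:

```python
import numpy as np

def decayed_input(set_vectors, tau=0.6):
    """Sum of time-decayed binary set vectors: x_i = sum_k tau^(T_i - k) * v_i^k."""
    T = len(set_vectors)
    x = np.zeros_like(set_vectors[0])
    for k, v in enumerate(set_vectors, start=1):
        x += (tau ** (T - k)) * v   # the most recent set (k = T) gets weight 1
    return x

# Example: three sets over M = 4 items; older interactions are down-weighted.
v1 = np.array([1., 0., 1., 0.])
v2 = np.array([0., 0., 1., 0.])
v3 = np.array([0., 1., 1., 1.])
print(decayed_input([v1, v2, v3], tau=0.6))  # item 3 gets 0.36 + 0.6 + 1.0 = 1.96
```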
3.2 Tweedie Output on Decoder
When training a VAE, using a Gaussian output for p(x∣z) is the straightforward option. However, the distributions of the data generated from temporal set prediction problems are zero-inflated and long-tailed, as shown in Figure 2.
The Tweedie distribution is a special case of the exponential dispersion model (EDM) with a power parameter p and variance function V(μ) = μp [6]. The Tweedie distribution with 1 < p < 2 corresponds to a class of compound Poisson distributions [7]. Consider the two-step sampling process N ∼ Poisson(λ) and Xi ∼ Gamma(α, β) for λ, α, β > 0 and i = 1, …, N. We define a random variable Z as follows:
$Z = \begin{cases} 0 & \text{if } N = 0, \\ X_1 + X_2 + \dots + X_N & \text{if } N > 0. \end{cases} \quad (5)$
It is straightforward that the distribution defined by Equation (5) is zero-inflated for small enough λ (since P(Z = 0) = e−λ) and long-tailed, as Z is a sum of Gamma(α, β) variables for N > 0. Hence, using a Tweedie output for the VAE decoder in temporal set prediction is beneficial, as the Tweedie distribution easily captures the properties of the distributions shown in Figure 2.
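The following small simulation of the two-step sampling in Equation (5) (with arbitrary parameter values) illustrates the zero-inflated, long-tailed shape:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, alpha, beta = 0.5, 2.0, 1.0   # arbitrary Poisson rate and Gamma shape/rate

def sample_compound_poisson(size):
    """Z = 0 if N = 0, else Z = X_1 + ... + X_N with N ~ Poisson(lam), X_i ~ Gamma(alpha, beta)."""
    n = rng.poisson(lam, size=size)
    # The sum of n iid Gamma(alpha, beta) variables is Gamma(n * alpha, beta); n = 0 gives Z = 0.
    return np.where(n > 0, rng.gamma(np.maximum(n, 1) * alpha, 1.0 / beta), 0.0)

z = sample_compound_poisson(100_000)
print((z == 0).mean())              # large spike at zero (~ exp(-lam) ≈ 0.61): zero-inflated
print(np.quantile(z, [0.5, 0.99]))  # long right tail
```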
Learning the mean parameter μ and the power parameter p of the Tweedie distribution via maximum likelihood is straightforward: we train the decoder by minimizing the negative log-likelihood which, up to terms that do not depend on the mean, is

$\mathcal{L}_{\mathrm{Tweedie}}(\mathbf{x}_i, \hat{\mathbf{x}}_i) = \sum_{j=1}^{M} \left( -\frac{x_{ij}\, \hat{x}_{ij}^{\,1-p}}{1-p} + \frac{\hat{x}_{ij}^{\,2-p}}{2-p} \right), \quad 1 < p < 2. \quad (6)$
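A sketch of this objective in PyTorch, assuming the decoder output $\hat{\mathbf{x}}_i$ is interpreted as the Tweedie mean and dropping all terms independent of the mean; this is an illustrative implementation rather than the authors' training code:

```python
import torch

def tweedie_loss(x, x_hat, p=1.5, eps=1e-8):
    """Tweedie negative log-likelihood (up to mean-independent terms) for 1 < p < 2, per Eq. (6).

    x:     observed time-decayed vectors, x >= 0
    x_hat: decoder output interpreted as the Tweedie mean, x_hat > 0
    """
    mu = x_hat.clamp(min=eps)
    loss = -x * mu.pow(1.0 - p) / (1.0 - p) + mu.pow(2.0 - p) / (2.0 - p)
    return loss.sum(dim=-1).mean()

# Example: the loss is lower when x_hat tracks x (including its many zeros)
# than for a flat prediction over all items.
x = torch.tensor([[0.0, 0.0, 1.96, 0.6]])
print(tweedie_loss(x, x_hat=x + 0.05), tweedie_loss(x, x_hat=torch.full_like(x, 1.0)))
```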
A line of works chose distributions other than Gaussian or Bernoulli for matrix factorization and VAEs [3, 12, 18]; Poisson and multinomial distributions are the usual choices. To the best of our knowledge, this paper is the first attempt to apply the Tweedie distribution to a VAE's decoder output.
3.3 Integrating Global-Local Information
Though the VAE with Tweedie output is already competitive on its own, it tends to underestimate a user's preference for frequently interacted items. This tendency owes to the learning objective of the VAE: the model has to maximize the likelihood of 0 for never-interacted items, which sometimes sacrifices its ability to maximize the likelihood of the values of interacted items. Hence, we integrate into our VAE the item embeddings of frequently interacted items learned by a state-of-the-art model. We use DNNTSP [21], as it is the current state of the art. DNNTSP builds user-dependent embeddings, via dynamic graph neural networks, of items that the user has interacted with at least once. We denote by $\mathbf{z}_{ij}$ the embedding of user $u_i$ for item $e_j$.
When combining the VAE with the embeddings $\mathbf{z}_{ij}$, a problem arises: the reconstructed value $\hat{x}_{ij}$ of Equation (4) is a scalar, while the learned embedding $\mathbf{z}_{ij}$ is a vector. Hence, to match their sizes, we simply multiply $\hat{x}_{ij}$ by a vector $\mathbf{w}^*$ of the same size as $\mathbf{z}_{ij}$. We defer the discussion on the selection of $\mathbf{w}^*$ to the end of this section.
Now we combine $\hat{x}_{ij} \cdot \mathbf{w}^*$ and $\mathbf{z}_{ij}$ into an updated embedding $\tilde{\mathbf{z}}_{ij}$ through attention:

$\tilde{\mathbf{z}}_{ij} = \alpha_{ij} \left(\hat{x}_{ij} \cdot \mathbf{w}^*\right) + \left(1 - \alpha_{ij}\right) \mathbf{z}_{ij}, \quad (7)$

$\alpha_{ij} = \sigma\!\left(\mathbf{a}^\top \left[\hat{x}_{ij} \cdot \mathbf{w}^* \, ; \, \mathbf{z}_{ij}\right]\right), \quad (8)$

where $\mathbf{a}$ is a learnable attention vector, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and σ is the sigmoid function. For a non-interacted item $e_j \notin \bigcup_k S_i^k$, no local embedding $\mathbf{z}_{ij}$ exists, so we simply set $\tilde{\mathbf{z}}_{ij} = \hat{x}_{ij} \cdot \mathbf{w}^*$.
Lastly, we calculate the affinity of user $u_i$ to item $e_j$ by

$\hat{y}_{ij} = \sigma\!\left(\tilde{\mathbf{z}}_{ij}\, \mathbf{w}_0^\top + b_0\right), \quad (9)$

where $\mathbf{w}_0$ and $b_0$ are learnable parameters, and train the prediction by minimizing the binary cross-entropy between $\hat{y}_{ij}$ and the next-set indicator $v_{ij}^{T_i+1}$:

$\mathcal{L}_{\mathrm{pred}} = -\sum_{j=1}^{M} \Big[ v_{ij}^{T_i+1} \log \hat{y}_{ij} + \big(1 - v_{ij}^{T_i+1}\big) \log\big(1 - \hat{y}_{ij}\big) \Big]. \quad (10)$
We now return to the selection of $\mathbf{w}^*$ in Equation (7). Although $\mathbf{w}^*$ could be a separate learnable parameter, we set $\mathbf{w}^* = \mathbf{w}_0$ from Equation (9). With this choice, Equation (9) becomes $\hat{y}_{ij} = \sigma\left(\hat{x}_{ij} \cdot \mathbf{w}_0^\top \mathbf{w}_0 + b_0\right)$ for a non-interacted item $e_j \notin \bigcup_k S_i^k$; since $\mathbf{w}_0^\top \mathbf{w}_0 \ge 0$, this preserves the ordering of affinities learned by the VAE. We also ran experiments with a learnable $\mathbf{w}^*$, but the performance was on par with $\mathbf{w}^* = \mathbf{w}_0$.
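Putting the pieces together, a rough sketch of the integration and scoring steps follows; the exact attention parameterization in Equation (8) and the handling of non-interacted items are assumptions here, so the code should be read as illustrative rather than as the reference implementation:

```python
import torch
import torch.nn as nn

class GlobalLocalIntegration(nn.Module):
    """Illustrative integration of a VAE reconstruction (a scalar per item) with
    local item embeddings z_ij (one vector per interacted item)."""
    def __init__(self, dim):
        super().__init__()
        self.w0 = nn.Linear(dim, 1)       # scoring weights w_0 and bias b_0 (Eq. 9)
        self.att = nn.Linear(2 * dim, 1)  # attention over the two candidate embeddings (Eq. 8)

    def forward(self, x_hat_ij, z_ij=None):
        # Lift the scalar reconstruction to a vector with w* = w_0 (Eq. 7 discussion).
        g_ij = x_hat_ij.unsqueeze(-1) * self.w0.weight.squeeze(0)
        if z_ij is None:                  # non-interacted item: only global information
            z_tilde = g_ij
        else:                             # attention-weighted combination (Eqs. 7-8)
            a = torch.sigmoid(self.att(torch.cat([g_ij, z_ij], dim=-1)))
            z_tilde = a * g_ij + (1.0 - a) * z_ij
        return torch.sigmoid(self.w0(z_tilde)).squeeze(-1)   # affinity y_hat_ij (Eq. 9)

# Usage on dummy values: an interacted item (with a local embedding) and a non-interacted one.
m = GlobalLocalIntegration(dim=8)
print(m(torch.tensor([1.3]), torch.randn(1, 8)), m(torch.tensor([1.3])))
```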
4 EXPERIMENTS
4.1 Benchmarks
Table 1: Statistics of the benchmarks (#E/S: average elements per set, #S/U: average sets per user, #E/U: average elements per user).

Dataset | # of users | # of sets | # of elements | #E/S | #S/U | #E/U
---|---|---|---|---|---|---
DC | 9,010 | 42,905 | 217 | 1.52 | 4.76 | 5.44
TaoBao | 113,347 | 628,618 | 689 | 1.10 | 5.55 | 4.96
TMS | 15,726 | 243,394 | 1,565 | 2.19 | 15.48 | 18.05
We evaluate our method on three public benchmarks: Dunnhumby Carbo (DC), TaoBao, and Tags-Math-Sx (TMS). Following Yu et al. [21], we partition each dataset into training, validation, and test splits of 70%, 10%, and 20%, respectively. See Table 1 for benchmark statistics.
4.2 Compared Methods
We compare four methods: Toppop, PersonalToppop, Sets2Sets and DNNTSP.
Toppop simply serves the items that are interacted with the most across all users. PersonalToppop serves the items that the given user has interacted with the most. Sets2Sets [4] uses an encoder-decoder framework to predict the next set; set embeddings are produced by a pooling operation, a set-based attention models temporal correlation, and repeated elements are modeled explicitly. DNNTSP is composed of three components: Element Relationship Learning (ERL), Temporal Dependency Learning (TDL), and Gated Information Fusing. ERL is a dynamic weighted graph neural network, TDL captures temporal dependency, and Gated Information Fusing lets each user share the embeddings of uninteracted items. Hence, DNNTSP can be seen as an ensemble of a model that learns local information and a Toppop-like model.
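For reference, the two frequency baselines can be sketched in a few lines; how PersonalToppop backfills the list when a user has fewer than K distinct items is an assumption here:

```python
from collections import Counter
from itertools import chain

def toppop(histories, k):
    """Recommend the k items interacted with most often across all users."""
    counts = Counter(chain.from_iterable(chain.from_iterable(histories.values())))
    return [item for item, _ in counts.most_common(k)]

def personal_toppop(history, global_top, k):
    """Recommend the user's own most frequent items, padded with globally popular items."""
    counts = Counter(chain.from_iterable(history))
    ranked = [item for item, _ in counts.most_common(k)]
    return (ranked + [e for e in global_top if e not in ranked])[:k]

histories = {"u1": [{"e1", "e2"}, {"e2"}], "u2": [{"e2", "e3"}]}   # toy data
top = toppop(histories, k=2)
print(top, personal_toppop(histories["u1"], top, k=2))
```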
4.3 Results & Analyses
The results of our evaluation on the three public benchmarks are shown in Table 2. We consider three metrics: Recall, Normalized Discounted Cumulative Gain (NDCG), and Personal Hit Ratio (PHR). PHR@K is calculated as

$\mathrm{PHR@}K = \frac{1}{N'} \sum_{i=1}^{N'} \mathbb{1}\big[\hat{S}_i \cap S_i \neq \emptyset\big],$

where $N'$ is the number of test users, $\hat{S}_i$ is the set of predicted top-$K$ elements, and $S_i$ is the ground-truth set.
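A sketch of the three metrics for a single test user (illustrative; normalization and tie-breaking details may differ from the evaluation code of the compared works):

```python
import numpy as np

def metrics_at_k(pred_scores, true_set, k):
    """Recall@K, NDCG@K, and the hit indicator (its mean over users gives PHR@K)."""
    topk = np.argsort(-pred_scores)[:k]
    hits = np.isin(topk, list(true_set))
    recall = hits.sum() / min(len(true_set), k)    # some works divide by |S_i| instead
    dcg = (hits / np.log2(np.arange(2, k + 2))).sum()
    idcg = (1.0 / np.log2(np.arange(2, min(len(true_set), k) + 2))).sum()
    return recall, dcg / idcg, float(hits.any())

scores = np.array([0.1, 0.9, 0.4, 0.8, 0.05])      # predicted affinities for 5 items
print(metrics_at_k(scores, true_set={1, 2}, k=2))  # ground-truth next set {e2, e3}
```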
We trained the VAE for 30 epochs and then trained DNNTSP for 30 epochs. We used a single layer for the VAE across all benchmarks. For the decay factor in Equation (2), we set τ = 0.6: as illustrated in Figure 3, τ = 0.6 gives the best performance on the TaoBao dataset, and we empirically observed that it also provides decent results across all metrics on the other datasets. The dimension of the latent space is 128 for DC and TaoBao, and 512 for TMS. We used the Adam optimizer [9] with a learning rate of 0.001.
Across all benchmarks and metrics, GLOIE with Tweedie output outperforms or is on par with all compared methods. In particular, it outperforms the other methods on every metric on the DC and TaoBao datasets. GLOIE with Tweedie also outperforms DNNTSP on every metric at K = 10, which indicates that the embeddings learned by the VAE are actively used alongside those learned by DNNTSP.
Another remark is that the VAE with Tweedie output alone shows comparable performance to all compared methods on most metrics. Given that the average number of items a user interacted with is only 5.44, 4.96, and 18.05 in DC, TaoBao, and TMS, respectively, this suggests that the VAE with Tweedie output captures users' preferences for non-interacted items.
To investigate the effectiveness of the attention-based integration in Equation (8), we compared GLOIE with attention to a variant with an item-wise learnable weight, similar to the method proposed by Yu et al. [21]. Across datasets and metrics, we observed performance gains: 3.09%, 0.45%, and 0.92% improvement in NDCG@10 on DC, TaoBao, and TMS, respectively.
Lastly, we note that the choice of output distribution for the VAE decoder strongly affects performance. We empirically show that neither Gaussian nor multinomial outputs fit temporal set prediction problems, even though the Gaussian is a popular choice for VAEs and the multinomial has been a common choice in the recommender system literature since the advent of VAECF [12].
Table 2: Recall@K, NDCG@K, and PHR@K at K = 10, 20, and 40 on the three benchmarks.

| Dataset | Model | Recall@10 | NDCG@10 | PHR@10 | Recall@20 | NDCG@20 | PHR@20 | Recall@40 | NDCG@40 | PHR@40 |
|---|---|---|---|---|---|---|---|---|---|---|
| DC | Toppop | 0.1618 | 0.0880 | 0.2274 | 0.2475 | 0.1116 | 0.3289 | 0.3940 | 0.1448 | 0.4997 |
| | PersonalToppop | 0.4104 | 0.3174 | 0.5031 | 0.4293 | 0.3270 | 0.5258 | 0.4747 | 0.3332 | 0.5785 |
| | Sets2Sets | 0.4488 | 0.3136 | 0.5458 | 0.5143 | 0.3319 | 0.6162 | 0.6017 | 0.3516 | 0.7005 |
| | DNNTSP | 0.4564 | 0.3165 | 0.5557 | 0.5294 | 0.3369 | 0.6272 | 0.6180 | 0.3568 | 0.7165 |
| | VAE - Gaussian | 0.1618 | 0.0882 | 0.2274 | 0.2507 | 0.1128 | 0.3333 | 0.3847 | 0.1430 | 0.4903 |
| | VAE - Multinomial | 0.1602 | 0.0850 | 0.2230 | 0.2492 | 0.1097 | 0.3311 | 0.3767 | 0.1387 | 0.4786 |
| | VAE - Tweedie | 0.4166 | 0.3000 | 0.5108 | 0.5122 | 0.3267 | 0.6062 | 0.6217 | 0.3517 | 0.7088 |
| | GLOIE - Gaussian | 0.3108 | 0.2349 | 0.3971 | 0.3738 | 0.2526 | 0.4664 | 0.4545 | 0.2706 | 0.5563 |
| | GLOIE - Multinomial | 0.3265 | 0.2465 | 0.4143 | 0.3870 | 0.2633 | 0.4798 | 0.4615 | 0.2803 | 0.5602 |
| | GLOIE - Tweedie | 0.4658 | 0.3264 | 0.5613 | 0.5415 | 0.3477 | 0.6351 | 0.6428 | 0.3708 | 0.7288 |
| TaoBao | Toppop | 0.1567 | 0.0784 | 0.1613 | 0.2494 | 0.1019 | 0.2545 | 0.3679 | 0.1264 | 0.3745 |
| | PersonalToppop | 0.2190 | 0.1535 | 0.2230 | 0.2260 | 0.1554 | 0.2306 | 0.2433 | 0.1590 | 0.2484 |
| | Sets2Sets | 0.2811 | 0.1495 | 0.2868 | 0.3649 | 0.1710 | 0.3713 | 0.4672 | 0.1922 | 0.4739 |
| | DNNTSP | 0.3035 | 0.1841 | 0.3095 | 0.3811 | 0.2039 | 0.3873 | 0.4776 | 0.2238 | 0.4843 |
| | VAE - Gaussian | 0.1592 | 0.0750 | 0.1635 | 0.2480 | 0.0974 | 0.2530 | 0.3665 | 0.1219 | 0.3727 |
| | VAE - Multinomial | 0.1588 | 0.0798 | 0.1634 | 0.2494 | 0.1027 | 0.2545 | 0.3660 | 0.1268 | 0.3723 |
| | VAE - Tweedie | 0.2954 | 0.1939 | 0.3006 | 0.3775 | 0.2148 | 0.3827 | 0.4768 | 0.2353 | 0.4822 |
| | GLOIE - Gaussian | 0.2982 | 0.1768 | 0.3044 | 0.3790 | 0.1973 | 0.3851 | 0.4769 | 0.2175 | 0.4835 |
| | GLOIE - Multinomial | 0.2980 | 0.1791 | 0.3040 | 0.3783 | 0.1995 | 0.3846 | 0.4750 | 0.2195 | 0.4819 |
| | GLOIE - Tweedie | 0.3099 | 0.2007 | 0.3152 | 0.3917 | 0.2216 | 0.3972 | 0.4868 | 0.2412 | 0.4924 |
| TMS | Toppop | 0.2627 | 0.1627 | 0.4619 | 0.3902 | 0.2017 | 0.6243 | 0.5605 | 0.2448 | 0.8007 |
| | PersonalToppop | 0.4508 | 0.3464 | 0.6440 | 0.5274 | 0.3721 | 0.7146 | 0.5495 | 0.3771 | 0.7374 |
| | Sets2Sets | 0.4748 | 0.3782 | 0.6933 | 0.5601 | 0.4061 | 0.7594 | 0.6627 | 0.4321 | 0.8570 |
| | DNNTSP | 0.4693 | 0.3473 | 0.6825 | 0.5826 | 0.3839 | 0.7880 | 0.6840 | 0.4097 | 0.8748 |
| | VAE - Gaussian | 0.2731 | 0.1919 | 0.4660 | 0.3913 | 0.2288 | 0.6195 | 0.5496 | 0.2688 | 0.7813 |
| | VAE - Multinomial | 0.2548 | 0.1615 | 0.4431 | 0.3830 | 0.2001 | 0.6020 | 0.5437 | 0.2412 | 0.7740 |
| | VAE - Tweedie | 0.4661 | 0.3744 | 0.6548 | 0.5579 | 0.4040 | 0.7432 | 0.6663 | 0.4316 | 0.8341 |
| | GLOIE - Gaussian | 0.1345 | 0.0833 | 0.2486 | 0.2363 | 0.1155 | 0.4018 | 0.4014 | 0.1570 | 0.6033 |
| | GLOIE - Multinomial | 0.1479 | 0.1029 | 0.2797 | 0.2192 | 0.1252 | 0.3872 | 0.3259 | 0.1524 | 0.5362 |
| | GLOIE - Tweedie | 0.4860 | 0.3823 | 0.6863 | 0.5868 | 0.4144 | 0.7753 | 0.6926 | 0.4418 | 0.8538 |
5 CONCLUSION
This paper proposes Global-Local Item Embedding (GLOIE), which learns to utilize the temporal properties of sets across all users as well as within a user. The proposed model learns global information by maximizing the ELBO of the sum of time-decayed vectors under the VAE framework and integrates it with local embeddings learned by dynamic graph neural networks. As users with similar histories are reconstructed to nearby vectors, we can model a given user's preference for a non-interacted item when other users with similar histories frequently interacted with that item. Data analysis and empirical results show that using a Tweedie output for the VAE decoder is effective for temporal set prediction. The proposed method achieves state-of-the-art results by exploiting global information, which has been less explored in the temporal set prediction literature.
Though the proposed VAE is powerful in itself, there is room for improvement. Instead of using the sum of time-decayed vectors, we could model the change of sets in continuous time with Neural ODEs [1, 14]. Finding an appropriate form of prior that captures the long-tailed distributions of temporal set prediction is another future direction. The graph modality could also be incorporated [8]. Last but not least, searching for better ways to integrate the embeddings learned by the VAE with embeddings learned by other algorithms that focus on local information is also important future work.
REFERENCES
- Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. Neural Ordinary Differential Equations. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf
- Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F Stewart, and Jimeng Sun. 2016. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. In Machine Learning for Healthcare Conference. PMLR, 301–318.
- Prem Gopalan, Jake M. Hofman, and David M. Blei. 2015. Scalable Recommendation with Hierarchical Poisson Factorization. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (Amsterdam, Netherlands) (UAI’15). AUAI Press, Arlington, Virginia, USA, 326–335.
- Haoji Hu and Xiangnan He. 2019. Sets2sets: Learning from sequential sets with neural networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1491–1499.
- Haoji Hu, Xiangnan He, Jinyang Gao, and Zhi-Li Zhang. 2020. Modeling personalized item frequency information for next-basket recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1071–1080.
- Bent Jørgensen. 1987. Exponential Dispersion Models. Journal of the Royal Statistical Society. Series B (Methodological) 49, 2(1987), 127–162. http://www.jstor.org/stable/2345415
- Bent Jørgensen. 1997. The Theory of Dispersion Models. Chapman and Hall/CRC.
- Kyung-Min Kim, Dong-Hyun Kwak, Hanock Kwak, Young-Jin Park, Sangkwon Sim, Jae-Han Cho, Minkyu Kim, Jihun Kwon, Nako Sung, and Jung-Woo Ha. 2019. Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation. In Proceedings of ACM RecSys 2019 Late-Breaking Results, Marko Tkalcic and Sole Pera (Eds.). http://ceur-ws.org/Vol-2431/paper12.pdf
- Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980
- Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In International Conference on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.). https://openreview.net/forum?id=33X9fd2-9FyZd
- Yehuda Koren. 2008. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Las Vegas, Nevada, USA) (KDD ’08). Association for Computing Machinery, New York, NY, USA, 426–434. https://doi.org/10.1145/1401890.1401944
- Dawen Liang, Rahul G Krishnan, Matthew D Hoffman, and Tony Jebara. 2018. Variational autoencoders for collaborative filtering. In Proceedings of the 2018 world wide web conference. 689–698.
- Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In International conference on machine learning. PMLR, 1278–1286.
- Yulia Rubanova, Ricky T. Q. Chen, and David K Duvenaud. 2019. Latent Ordinary Differential Equations for Irregularly-Sampled Time Series. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/42a6845a557bef704ad8ac9cb4461d43-Paper.pdf
- Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic Matrix Factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems (Vancouver, British Columbia, Canada) (NIPS’07). Curran Associates Inc., Red Hook, NY, USA, 1257–1264.
- Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (Hong Kong, Hong Kong) (WWW ’01). Association for Computing Machinery, New York, NY, USA, 285–295. https://doi.org/10.1145/371920.372071
- Nathan Srebro, Jason Rennie, and Tommi Jaakkola. 2005. Maximum-Margin Matrix Factorization. In Advances in Neural Information Processing Systems, L. Saul, Y. Weiss, and L. Bottou (Eds.), Vol. 17. MIT Press. https://proceedings.neurips.cc/paper/2004/file/e0688d13958a19e087e123148555e4b4-Paper.pdf
- Quoc-Tuan Truong, Aghiles Salah, and Hady W. Lauw. 2021. Bilateral Variational Autoencoder for Collaborative Filtering. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (Virtual Event, Israel) (WSDM ’21). Association for Computing Machinery, New York, NY, USA, 292–300. https://doi.org/10.1145/3437963.3441759
- Yi Yang, Wei Qian, and Hui Zou. 2016. Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models. arxiv:1508.06378 [stat.ME]
- Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. A dynamic recurrent model for next basket recommendation. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 729–732.
- Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Hui Xiong, and Weifeng Lv. 2020. Predicting Temporal Sets with Deep Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1083–1091.