# LIBMF Rust

[LIBMF](https://github.com/cjlin1/libmf) - large-scale sparse matrix factorization - for Rust

Check out [Disco](https://github.com/ankane/disco-rust) for higher-level collaborative filtering

[![Build Status](https://github.com/ankane/libmf-rust/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/libmf-rust/actions)

## Installation

Add this line to your application’s `Cargo.toml` under `[dependencies]`:

```toml
libmf = "0.3"
```

## Getting Started

Prep your data in the format `row_index, column_index, value`

```rust
let mut data = libmf::Matrix::new();
data.push(0, 0, 5.0);
data.push(0, 2, 3.5);
data.push(1, 1, 4.0);
```

Fit a model

```rust
let model = libmf::Model::params().fit(&data).unwrap();
```

Make predictions

```rust
model.predict(row_index, column_index);
```

Get the latent factors (these approximate the training matrix)

```rust
model.p(row_index);
model.q(column_index);

// or
model.p_iter();
model.q_iter();
```

Get the bias (average of all elements in the training matrix)

```rust
model.bias();
```

Save the model to a file

```rust
model.save("model.txt").unwrap();
```

Load a model from a file

```rust
let model = libmf::Model::load("model.txt").unwrap();
```

Pass a validation set

```rust
let model = libmf::Model::params().fit_eval(&train_set, &eval_set).unwrap();
```

## Cross-Validation

Perform cross-validation

```rust
let avg_error = libmf::Model::params().cv(&data, 5).unwrap();
```

## Parameters

Set parameters - default values below

```rust
libmf::Model::params()
    .loss(libmf::Loss::RealL2) // loss function
    .factors(8)                // number of latent factors
    .threads(12)               // number of threads
    .bins(25)                  // number of bins
    .iterations(20)            // number of iterations
    .lambda_p1(0.0)            // L1-regularization parameter for P
    .lambda_p2(0.1)            // L2-regularization parameter for P
    .lambda_q1(0.0)            // L1-regularization parameter for Q
    .lambda_q2(0.1)            // L2-regularization parameter for Q
    .learning_rate(0.1)        // learning rate
    .alpha(1.0)                // importance of negative entries
    .c(0.0001)                 // desired value of negative entries
    .nmf(false)                // perform non-negative MF (NMF)
    .quiet(false);             // no outputs to stdout
```

### Loss Functions

For real-valued matrix factorization

- `Loss::RealL2` - squared error (L2-norm)
- `Loss::RealL1` - absolute error (L1-norm)
- `Loss::RealKL` - generalized KL-divergence

For binary matrix factorization

- `Loss::BinaryLog` - logarithmic error
- `Loss::BinaryL2` - squared hinge loss
- `Loss::BinaryL1` - hinge loss

For one-class matrix factorization

- `Loss::OneClassRow` - row-oriented pair-wise logarithmic loss
- `Loss::OneClassCol` - column-oriented pair-wise logarithmic loss
- `Loss::OneClassL2` - squared error (L2-norm)

## Metrics

Calculate RMSE (for real-valued MF)

```rust
model.rmse(&data);
```

Calculate MAE (for real-valued MF)

```rust
model.mae(&data);
```

Calculate generalized KL-divergence (for non-negative real-valued MF)

```rust
model.gkl(&data);
```

Calculate logarithmic loss (for binary MF)

```rust
model.logloss(&data);
```

Calculate accuracy (for binary MF)

```rust
model.accuracy(&data);
```

Calculate MPR (for one-class MF)

```rust
model.mpr(&data, transpose);
```

Calculate AUC (for one-class MF)

```rust
model.auc(&data, transpose);
```
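Putting the loss functions and metrics together: below is a minimal sketch of a binary factorization, assuming observed entries are encoded as `1.0` and `-1.0` (the label encoding is an assumption for illustration, not something documented above).

```rust
// Minimal sketch of binary MF; the 1.0 / -1.0 label encoding is an assumption
let mut data = libmf::Matrix::new();
data.push(0, 0, 1.0);
data.push(0, 1, -1.0);
data.push(1, 1, 1.0);

let model = libmf::Model::params()
    .loss(libmf::Loss::BinaryLog) // logarithmic error for binary MF
    .quiet(true)                  // suppress training output
    .fit(&data)
    .unwrap();

// evaluate with a metric that matches the loss
println!("accuracy: {}", model.accuracy(&data));
```

The same pattern applies to the one-class losses, paired with `mpr` or `auc`.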
## Example

Download the [MovieLens 100K dataset](https://grouplens.org/datasets/movielens/100k/).

Add these lines to your application’s `Cargo.toml` under `[dependencies]`:

```toml
csv = "1"
serde = { version = "1", features = ["derive"] }
```

And use:

```rust
use csv::ReaderBuilder;
use serde::Deserialize;
use std::fs::File;

#[derive(Debug, Deserialize)]
struct Row {
    user_id: i32,
    item_id: i32,
    rating: f32,
    time: i32,
}

fn main() {
    let mut train_set = libmf::Matrix::new();
    let mut valid_set = libmf::Matrix::new();

    // u.data is tab-separated: user id, item id, rating, timestamp
    let file = File::open("u.data").unwrap();
    let mut rdr = ReaderBuilder::new()
        .has_headers(false)
        .delimiter(b'\t')
        .from_reader(file);

    // use the first 80,000 ratings for training and the rest for validation
    for (i, record) in rdr.records().enumerate() {
        let row: Row = record.unwrap().deserialize(None).unwrap();
        let matrix = if i < 80000 { &mut train_set } else { &mut valid_set };
        matrix.push(row.user_id, row.item_id, row.rating);
    }

    let model = libmf::Model::params().fit_eval(&train_set, &valid_set).unwrap();
    println!("RMSE: {:?}", model.rmse(&valid_set));
}
```

## Reference

Specify the initial capacity for a matrix

```rust
let mut data = libmf::Matrix::with_capacity(3);
```

## Resources

- [LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf)

## History

View the [changelog](https://github.com/ankane/libmf-rust/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/libmf-rust/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/libmf-rust/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone --recursive https://github.com/ankane/libmf-rust.git
cd libmf-rust
cargo test
```