mf-fitter

Author	SHA1	Message	Date
Dylan Knutson	4651b96785	write to temp table and atomic swap with old table	2024-12-28 21:02:44 +00:00
Dylan Knutson	b255f40ac7	add progress bar	2024-12-28 20:51:11 +00:00
Dylan Knutson	b3ba58723c	Enhance fit model functionality and argument handling - Updated `fit_model_args.rs` to allow optional factors for matrix factorization and added an index name argument for index management. - Modified `fit_model.rs` to handle index creation and dropping during data upsert, improving database interaction. - Adjusted schema validation to infer vector dimensions and validate against specified factors. - Enhanced `generate_test_data.rs` to create an IVFFlat index on the embeddings column. These changes improve the flexibility and robustness of the fit model process, allowing for better management of database indices and more intuitive argument handling.	2024-12-28 20:37:12 +00:00
Dylan Knutson	5430fdd501	remove float[] array support, only use vector	2024-12-28 19:58:46 +00:00
Dylan Knutson	75e7a4538d	Refactor Dockerfile and add fit_model functionality - Updated the Dockerfile to rename the built binary from `mf-fitter` to `fit_model` for clarity. - Introduced a new `fit_model_args.rs` file to define command-line arguments for the fit model process, including parameters for matrix factorization. - Added `pg_types.rs` and `pgvector.rs` files to handle PostgreSQL type interactions and vector serialization/deserialization. - Implemented the main logic for the fit model in `fit_model.rs`, including data loading, model training, and embedding saving. - Enhanced `visualize_embeddings.rs` to load embeddings and clusters more efficiently. These changes improve the organization and functionality of the model fitting process, making it more intuitive and maintainable.	2024-12-28 19:50:24 +00:00
Dylan Knutson	bc88c54cb0	write vector binary type to database	2024-12-28 19:04:43 +00:00
Dylan Knutson	2b1865f3d4	use COPY for exporting data into temp table	2024-12-28 18:32:18 +00:00
Dylan Knutson	c4e79a36f9	Add argument parsing for data loading configuration - Introduced a new `args.rs` file to define command-line arguments for data loading parameters, including source and target table details, matrix factorization settings, and optional interaction limits. - Refactored `main.rs` to utilize the new argument structure, enhancing code organization and readability. - Removed the previous inline argument definitions, streamlining the main application logic. These changes improve the configurability and maintainability of the data loading process.	2024-12-28 18:16:39 +00:00
Dylan Knutson	428ca89c92	use COPY for importing data	2024-12-28 17:55:56 +00:00
Dylan Knutson	857cbf5d1f	add max interactions flag	2024-12-28 17:41:42 +00:00
Dylan Knutson	350c61c313	Refactor data loading and embedding saving process - Updated `.cargo/config.toml` to optimize compilation flags for performance. - Enhanced `main.rs` by: - Renaming user and item ID columns for clarity. - Adding validation functions to ensure the existence of tables and columns in the database schema. - Implementing immediate exit handling during data loading. - Modifying the `save_embeddings` function to accept item IDs for processing. - Improving error handling with context messages for database operations. These changes improve code readability, robustness, and performance during data processing.	2024-12-28 06:42:28 +00:00
Dylan Knutson	c791203d1c	dockerfile for building release app	2024-12-28 05:01:10 +00:00
Dylan Knutson	66165a7eee	batch loading for computed rows	2024-12-28 04:40:09 +00:00
Dylan Knutson	9aece9c740	make libmf multithreading work	2024-12-28 04:19:00 +00:00
Dylan Knutson	2738b8469b	cargo clippy	2024-12-28 03:46:30 +00:00
Dylan Knutson	6ebbd6aaa9	better visualization	2024-12-28 03:39:24 +00:00
Dylan Knutson	ab5f379b94	use cluster affinities	2024-12-28 03:32:38 +00:00
Dylan Knutson	9b4316e819	different way of giving clusters an x, y, z	2024-12-28 03:11:37 +00:00
Dylan Knutson	32a7292481	more fixes	2024-12-28 03:04:50 +00:00
Dylan Knutson	56b6604142	improve embedding visualization	2024-12-28 02:09:32 +00:00
Dylan Knutson	e21541af46	embeddings visualization	2024-12-28 01:59:11 +00:00
Dylan Knutson	61b9728fd8	better test data generation	2024-12-28 01:51:33 +00:00
Dylan Knutson	00b30ac285	cluster validation	2024-12-28 01:46:48 +00:00
Dylan Knutson	f7bb5b0cdd	initial commit	2024-12-28 01:28:33 +00:00

24 Commits