Commit Graph

21 Commits

Author SHA1 Message Date
Dylan Knutson
5430fdd501 remove float[] array support, only use vector 2024-12-28 19:58:46 +00:00
Dylan Knutson
75e7a4538d Refactor Dockerfile and add fit_model functionality
- Updated the Dockerfile to rename the built binary from `mf-fitter` to `fit_model` for clarity.
- Introduced a new `fit_model_args.rs` file to define command-line arguments for the fit model process, including parameters for matrix factorization.
- Added `pg_types.rs` and `pgvector.rs` files to handle PostgreSQL type interactions and vector serialization/deserialization.
- Implemented the main logic for the fit model in `fit_model.rs`, including data loading, model training, and embedding saving.
- Enhanced `visualize_embeddings.rs` to load embeddings and clusters more efficiently.

These changes improve the organization and functionality of the model fitting process, making it more intuitive and maintainable.
2024-12-28 19:50:24 +00:00
Dylan Knutson
bc88c54cb0 write vector binary type to database 2024-12-28 19:04:43 +00:00
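
For readers unfamiliar with the binary form of pgvector's `vector` type, the following is a minimal sketch of that layout using only the standard library: a big-endian `u16` dimension count, a `u16` reserved field set to zero, then each element as a big-endian `f32`. The function names `encode_vector` and `decode_vector` are hypothetical and are not taken from the repository's `pgvector.rs`; how the bytes are handed to the database client is not shown.

```rust
/// Encode a slice of f32 values into the binary layout used for pgvector's
/// `vector` type: u16 dimension count, u16 reserved (zero), then each element
/// as a big-endian f32. Hypothetical helper, not the repository's code.
fn encode_vector(values: &[f32]) -> Result<Vec<u8>, String> {
    if values.len() > u16::MAX as usize {
        return Err(format!("{} dimensions do not fit in a u16", values.len()));
    }
    let mut buf = Vec::with_capacity(4 + values.len() * 4);
    buf.extend_from_slice(&(values.len() as u16).to_be_bytes());
    buf.extend_from_slice(&0u16.to_be_bytes()); // reserved field
    for v in values {
        buf.extend_from_slice(&v.to_be_bytes());
    }
    Ok(buf)
}

/// Decode the same layout back into a Vec<f32>, checking that the payload
/// length matches the dimension count in the header.
fn decode_vector(bytes: &[u8]) -> Result<Vec<f32>, String> {
    if bytes.len() < 4 {
        return Err("buffer shorter than the 4-byte header".to_string());
    }
    let dim = u16::from_be_bytes([bytes[0], bytes[1]]) as usize;
    let body = &bytes[4..];
    if body.len() != dim * 4 {
        return Err(format!("expected {} payload bytes, found {}", dim * 4, body.len()));
    }
    Ok(body
        .chunks_exact(4)
        .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Encoding a three-element embedding under this sketch yields 16 bytes (4 header bytes plus 3 × 4 payload bytes), and passing those bytes to `decode_vector` returns the original values.
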
Dylan Knutson
2b1865f3d4 use COPY for exporting data into temp table 2024-12-28 18:32:18 +00:00
Dylan Knutson
c4e79a36f9 Add argument parsing for data loading configuration
- Introduced a new `args.rs` file to define command-line arguments for data loading parameters, including source and target table details, matrix factorization settings, and optional interaction limits.
- Refactored `main.rs` to utilize the new argument structure, enhancing code organization and readability.
- Removed the previous inline argument definitions, streamlining the main application logic.

These changes improve the configurability and maintainability of the data loading process.
2024-12-28 18:16:39 +00:00
Dylan Knutson
428ca89c92 use COPY for importing data 2024-12-28 17:55:56 +00:00
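
As an illustration of what a COPY-based import involves, the sketch below formats rows into PostgreSQL's COPY text format: columns separated by tabs, one row per line, and `\N` for NULL. The `Interaction` struct, its field names, and `to_copy_text` are assumptions made for this example and do not come from the repository. The resulting string would be streamed to a statement of the form `COPY some_table (user_id, item_id, rating) FROM STDIN`, where the table and column names are likewise hypothetical.

```rust
use std::fmt::Write;

/// One interaction row. The type and field names are hypothetical.
struct Interaction {
    user_id: i64,
    item_id: i64,
    rating: Option<f32>,
}

/// Render rows in COPY text format: tab-separated columns, a newline after
/// each row, and `\N` standing in for NULL.
fn to_copy_text(rows: &[Interaction]) -> String {
    let mut out = String::new();
    for row in rows {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(out, "{}\t{}\t", row.user_id, row.item_id).unwrap();
        match row.rating {
            Some(r) => write!(out, "{}", r).unwrap(),
            None => out.push_str("\\N"),
        }
        out.push('\n');
    }
    out
}
```

Sending one large payload through COPY avoids a round trip per row, which is the usual reason to prefer it over individual `INSERT` statements for bulk loads. Text columns would additionally need tabs, newlines, and backslashes escaped, which this numeric-only sketch does not handle.
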
Dylan Knutson
857cbf5d1f add max interactions flag 2024-12-28 17:41:42 +00:00
Dylan Knutson
350c61c313 Refactor data loading and embedding saving process
- Updated `.cargo/config.toml` to optimize compilation flags for performance.
- Enhanced `main.rs` by:
  - Renaming user and item ID columns for clarity.
  - Adding validation functions to ensure the existence of tables and columns in the database schema.
  - Implementing immediate exit handling during data loading.
  - Modifying the `save_embeddings` function to accept item IDs for processing.
  - Improving error handling with context messages for database operations.

These changes improve code readability, robustness, and performance during data processing.
2024-12-28 06:42:28 +00:00
Dylan Knutson
c791203d1c dockerfile for building release app 2024-12-28 05:01:10 +00:00
Dylan Knutson
66165a7eee batch loading for computed rows 2024-12-28 04:40:09 +00:00
Dylan Knutson
9aece9c740 make libmf multithreading work 2024-12-28 04:19:00 +00:00
Dylan Knutson
2738b8469b cargo clippy 2024-12-28 03:46:30 +00:00
Dylan Knutson
6ebbd6aaa9 better visualization 2024-12-28 03:39:24 +00:00
Dylan Knutson
ab5f379b94 use cluster affinities 2024-12-28 03:32:38 +00:00
Dylan Knutson
9b4316e819 different way of giving clusters an x, y, z 2024-12-28 03:11:37 +00:00
Dylan Knutson
32a7292481 more fixes 2024-12-28 03:04:50 +00:00
Dylan Knutson
56b6604142 improve embedding visualization 2024-12-28 02:09:32 +00:00
Dylan Knutson
e21541af46 embeddings visualization 2024-12-28 01:59:11 +00:00
Dylan Knutson
61b9728fd8 better test data generation 2024-12-28 01:51:33 +00:00
Dylan Knutson
00b30ac285 cluster validation 2024-12-28 01:46:48 +00:00
Dylan Knutson
f7bb5b0cdd initial commit 2024-12-28 01:28:33 +00:00