- Updated `fit_model_args.rs` to allow optional factors for matrix factorization and added an index name argument for index management.
- Modified `fit_model.rs` to handle index creation and dropping during data upsert, improving database interaction.
- Adjusted schema validation to infer vector dimensions and validate against specified factors.
- Enhanced `generate_test_data.rs` to create an IVFFlat index on the embeddings column.
These changes make the fit model process more flexible and robust: database indices are managed explicitly around the upsert, and the factor count is optional and checked against the stored vector dimension.
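
The factor validation and index handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in `fit_model.rs`: the function names (`resolve_factors`, `quote_ident`, `drop_index_sql`, `create_ivfflat_index_sql`), the `vector_cosine_ops` operator class, and the `lists` parameter are all assumptions chosen for the example.

```rust
/// Decide how many latent factors to train with.
/// `requested` is the optional factors argument; `inferred` is the dimension
/// read from an existing vector column, if any. (Hypothetical helper.)
pub fn resolve_factors(
    requested: Option<usize>,
    inferred: Option<usize>,
) -> Result<usize, String> {
    match (requested, inferred) {
        // Both are known but disagree: refuse rather than write
        // embeddings of the wrong dimension.
        (Some(f), Some(d)) if f != d => Err(format!(
            "requested {f} factors, but the target column has dimension {d}"
        )),
        // An explicit request wins when there is nothing to contradict it.
        (Some(f), _) => Ok(f),
        // No request: fall back to the dimension of the existing column.
        (None, Some(d)) => Ok(d),
        (None, None) => {
            Err("factors not given and no existing vector column to infer from".to_string())
        }
    }
}

/// Quote a PostgreSQL identifier, doubling any embedded double quotes.
pub fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Statement to run before a bulk upsert so the index is not
/// maintained row by row.
pub fn drop_index_sql(index: &str) -> String {
    format!("DROP INDEX IF EXISTS {}", quote_ident(index))
}

/// Statement to run after the upsert to rebuild the IVFFlat index.
/// The operator class and `lists` value are assumptions for this sketch.
pub fn create_ivfflat_index_sql(index: &str, table: &str, column: &str, lists: u32) -> String {
    format!(
        "CREATE INDEX IF NOT EXISTS {} ON {} USING ivfflat ({} vector_cosine_ops) WITH (lists = {})",
        quote_ident(index),
        quote_ident(table),
        quote_ident(column),
        lists
    )
}
```

Dropping the index before the upsert and recreating it afterwards also lets IVFFlat compute its cluster centroids from the final data instead of from whatever was in the table when the index was first built.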
- Updated the Dockerfile to rename the built binary from `mf-fitter` to `fit_model` for clarity.
- Introduced a new `fit_model_args.rs` file to define command-line arguments for the fit model process, including parameters for matrix factorization.
- Added `pg_types.rs` and `pgvector.rs` files to handle PostgreSQL type interactions and vector serialization/deserialization.
- Implemented the main logic for the fit model in `fit_model.rs`, including data loading, model training, and embedding saving.
- Enhanced `visualize_embeddings.rs` to load embeddings and clusters more efficiently.
These changes improve the organization and functionality of the model fitting process, making it more intuitive and maintainable.
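
As an illustration of the vector serialization mentioned above, here is a minimal sketch of converting between a slice of `f32` values and pgvector's text representation (`[1,2,3]`). It is not the contents of `pgvector.rs`, which may well use the binary wire format instead; the function names `to_pgvector_text` and `from_pgvector_text` are hypothetical.

```rust
/// Format embedding components as a pgvector text literal, e.g. "[1,0.5,-2.25]".
/// (Hypothetical helper; the real module may use the binary protocol.)
pub fn to_pgvector_text(values: &[f32]) -> String {
    let parts: Vec<String> = values.iter().map(|v| v.to_string()).collect();
    format!("[{}]", parts.join(","))
}

/// Parse a pgvector text literal back into its components.
/// Empty vectors are rejected here as a choice made for this sketch.
pub fn from_pgvector_text(text: &str) -> Result<Vec<f32>, String> {
    // Strip the surrounding brackets, tolerating outer whitespace.
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("expected a literal like [1,2,3], got {text:?}"))?;

    if inner.trim().is_empty() {
        return Err("vector must have at least one component".to_string());
    }

    // Parse each comma-separated component, reporting the first bad one.
    inner
        .split(',')
        .map(|part| {
            part.trim()
                .parse::<f32>()
                .map_err(|e| format!("bad component {part:?}: {e}"))
        })
        .collect()
}
```

A round trip such as `from_pgvector_text(&to_pgvector_text(&[1.0, 0.5]))` returns the original components, which makes the pair easy to unit test without a database.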
- Introduced a new `args.rs` file to define command-line arguments for data loading parameters, including source and target table details, matrix factorization settings, and optional interaction limits.
- Refactored `main.rs` to utilize the new argument structure, enhancing code organization and readability.
- Removed the previous inline argument definitions, streamlining the main application logic.
These changes improve the configurability and maintainability of the data loading process.
- Updated `.cargo/config.toml` to optimize compilation flags for performance.
- Enhanced `main.rs` by:
  - Renaming user and item ID columns for clarity.
  - Adding validation functions to ensure the existence of tables and columns in the database schema.
  - Implementing immediate exit handling during data loading.
  - Modifying the `save_embeddings` function to accept item IDs for processing.
  - Improving error handling with context messages for database operations.
These changes improve code readability, robustness, and performance during data processing.