redux-scraper/TODO.md
2025-07-08 00:11:24 +00:00

# Project TODO List
- [ ] Add bookmarking feature for posts across different domains
- [ ] Add search across FA descriptions and tags, and E621 descriptions and tags
- [x] Get Inkbunny index scan job working
- [x] Attach logs to jobs; add a page to view jobs and their logs
- [ ] Standardize all the embeddings tables to use the same schema (item_id, embedding)
- [ ] Bluesky scraper
- [x] Download favs / votes for E621 users
- [ ] Automatically enqueue jobs for FA users to do incremental scans of profiles
- [ ] Fix FA posts that start with "Font size adjustment: smallerlarger"
- [ ] Convert logger `.prefix=...` into `.tagged(...)`
- [x] `make_tag` should be smart about the objects it takes
- [ ] Convert all `state: string` attributes to enums in ActiveRecord models
- [ ] Create `belongs_to_log_entry` macro for ActiveRecord models
- [x] Use StaticFileJobHelper for Domain::Fa::Job::ScanFileJob
- [ ] Unify HTTP client configs for all domains, so the same job type can be used for different domains
- [ ] Put abstract `external_url_for_view` in a module
- [ ] Backfill descriptions on Inkbunny posts
- [ ] Store deep update JSON on Inkbunny posts
- [x] Limit the number of users, or paginate, on the "users who favorited this post" page
- [ ] Manual good job runner does not indicate whether the job threw an exception; maybe check the return value of `#perform`?
- [ ] In incremental mode, the FA user favs job should stop once all posts on a page are already known favs (e.g. pages with only 47 posts are not a false positive)
- [x] Factor out FA listings page enqueue logic into common location; use in Gallery and Favs jobs
- [ ] Add followers / following to FA user show page
- [x] Parse E621 source URL for Inkbunny posts & users
- [x] Parse E621 source URL for FA users
- [ ] Parse BBCode in post descriptions
  - Example post with BBCode: https://refurrer.com/posts/ib/3452498
- [ ] Show tags on FA posts and IB posts
- [ ] SoFurry implementation
- [ ] Make unified Static file job
- [ ] Make unified Avatar file job
- [ ] Ko-fi domain icon
- [ ] Tumblr domain icon
- [ ] Do PCA on user factors table to display a 2D plot of users
- [ ] Use links found in descriptions as a signal to re-scan a post? (e.g. for comic next/prev links)
- [ ] Fix IDs that contain a dot, e.g. https://refurrer.com/users/fa@jakke.
- [ ] Rich inline links to E621, e.g. https://refurrer.com/posts/fa@60070060
- [ ] Find FaPost records that have favs recorded but no scan / file, and enqueue a scan
- [x] Bunch of posts with empty responses: `posts = Domain::Post.joins(files: :log_entry).where(files: { http_log_entries: { response_sha256: BlobFile::EMPTY_FILE_SHA256 }}).limit(10)`
- [ ] Create GlobalState entries for the last FA ID seen on the browse page; add a periodic scan from the newest FA ID down to the stored one
- [ ] GlobalState entries for long running backfill jobs, automatically restart them if they fail
- [ ] Flag to pass to jobs to log HTTP requests / responses to a directory, HTTP mock helper to read from that directory
- [ ] Fix incorrect IP address for Cloudflare-proxied requests
- [ ] SOCKS5 proxy for additional workers
- [ ] Backup FA scraper using foxbot & g6jy5jkx466lrqojcngbnksugrcfxsl562bzuikrka5rv7srgguqbjid.onion
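
A possible stop condition for the incremental FA user favs item above, sketched in plain Ruby. The method name `stop_incremental_favs_scan?` and its inputs (post IDs parsed from one favs page, plus a Set of already-known fav IDs) are hypothetical and not part of the codebase. The check deliberately ignores page size, so a short page alone never ends the scan.

```ruby
require "set"

# Hypothetical helper (not an existing method in this project).
# page_post_ids: FA post IDs parsed from one favs listing page.
# known_fav_ids: Set of post IDs already recorded as favs for this user.
def stop_incremental_favs_scan?(page_post_ids, known_fav_ids)
  # Assumption: an empty page means there is nothing left to scan.
  return true if page_post_ids.empty?

  # Stop only when every post on the page is already a known fav.
  # Page size is not checked, so a page with e.g. 47 posts is not
  # treated as "the end" on its own.
  page_post_ids.all? { |id| known_fav_ids.include?(id) }
end

known = Set[1, 2, 3]
stop_incremental_favs_scan?([1, 2, 3], known) # => true
stop_incremental_favs_scan?([3, 4], known)    # => false
```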
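
One way to compute the work for the GlobalState browse-page item above, assuming FA post IDs are sequential integers. `fa_ids_to_scan` is a hypothetical helper; `last_seen_id` stands in for the value that would be read from a GlobalState entry, and `newest_id` for the newest ID on the browse page.

```ruby
# Hypothetical helper (not an existing method in this project).
# newest_id:    newest FA post ID currently on the browse page.
# last_seen_id: ID stored in a GlobalState entry, or nil if none exists yet.
def fa_ids_to_scan(newest_id, last_seen_id)
  # Assumption: with no stored ID yet, only the newest post is scanned,
  # and the caller then stores newest_id as the new baseline.
  return [newest_id] if last_seen_id.nil?

  # Walk from the newest ID down to (but not including) the stored one.
  newest_id.downto(last_seen_id + 1).to_a
end

fa_ids_to_scan(105, 102) # => [105, 104, 103]
```

After the scan finishes, the job would write `newest_id` back to the GlobalState entry so the next periodic run starts from there.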