fenn's dream wiki

database for keeping track of AI models etc

---

assemble a pile of documents:
- web page scrape
- huggingface metadata
- papers
- github pages
- torch/GGUF metadata

an LLM goes through the pile of documents relevant to a particular model. for each fact it:
- extracts a relevant piece of metadata about the model
- finds a good spot for it in the wiki page
- attaches a source citation reference and correctness attestation in the wiki markup
- large documents can be KV cached to speed up this process; the metadata type we're looking for goes at the end of the prompt
- json schema driven with SGLang? https://lmsys.org/blog/2024-02-05-compressed-fsm/ — otherwise outlines FSM; llama.cpp supports grammars

each model should have the following metadata:
- name
- train type {base model, heavy tune, fine tune, merge}
- distribution type {full model, LoRA, API}
- license restrictions (hover icon for description)
- size (passive and active parameter count, minimum functional VRAM requirement in practice)
- intended hardware class {CPU, GPU, NPU, distributed, analog, brain tissue, etc.}
- use cases {QA, RAG, tool use, agent, code, writing, RP, ERP, chat, meta, vision, image, hearing, speech, music, audio, video, avatar, face recognition, face generation, emotion, 3d, protein, robot, ...}
- file urls (huggingface)
- code urls (github)
- demo urls (HF space)
- first and last publish date
- author / org / group
- model-specific paper:
  - urls in superscript
  - title
  - hover for abstract, click for pdf link
  - what to do about the ridiculous number of authors on some papers?
- training datasets:
  - name
  - hover for description, click for link if one exists
- prompt format templates
- ancestors:
  - via dataset ancestry
  - via fine tuning
  - via heavy tuning
  - via merging
  - via distillation
  - as a sequel (e.g. llama 1 2 3)
  - when should versions have separate pages?
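a minimal sketch of what the extracted record could look like on the wiki side, assuming invented names (`Fact`, `ModelRecord`) and an example URL — not any real schema; the enum-style `{...}` sets above map naturally to validated fields:

```python
# hypothetical per-model metadata record; every extracted fact carries
# its source citation and an attestation, as described above
from dataclasses import dataclass, field

TRAIN_TYPES = {"base model", "heavy tune", "fine tune", "merge"}
DIST_TYPES = {"full model", "LoRA", "API"}

@dataclass
class Fact:
    """one extracted piece of metadata with its provenance."""
    field_name: str   # e.g. "license restrictions"
    value: str
    source_url: str   # citation reference placed in the wiki markup
    attested_by: str  # bot or human id attesting correctness

@dataclass
class ModelRecord:
    name: str
    train_type: str
    distribution_type: str
    facts: list = field(default_factory=list)

    def __post_init__(self):
        # reject values outside the closed sets from the spec
        if self.train_type not in TRAIN_TYPES:
            raise ValueError(f"unknown train type: {self.train_type}")
        if self.distribution_type not in DIST_TYPES:
            raise ValueError(f"unknown distribution type: {self.distribution_type}")
```

the same closed sets could be emitted as a JSON schema to drive constrained decoding in SGLang or outlines, so the extractor physically cannot produce an out-of-vocabulary value.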
- benchmarks:
  - perplexity
  - alpacaeval
  - lmsys elo
  - chai elo
  - arena hard
  - modality-specific benchmarks
- notes on personality
- why this model was an advance

attestation view mode:
- each attestation adds a 1/n weight
- ignore list for bad bots or humans (keep track of everyone's ignore lists for mod action)
- dark mode:
  - hues text cyan for bot-attested data
  - hues text yellow for human-attested data
  - text gets brighter the more attestations it has
- light mode:
  - hues text blue for bot-attested data
  - hues text brown for human-attested data
  - text gets darker the more attestations it has

reviews:
- date of evaluation
- the exact url used for the review
- task the model was evaluated on
- performance on the task (text field, 1-5 stars)
- bugs and annoyances
- solutions to bugs and annoyances
- comment on review
- attestation on review ("me too")
- ability for the community to close reviews as no longer valid (grayed out), with the reason this is true
- filter reviews by spec

a list of groups and organizations:
- urls, key people
- org type {corporation, startup, academic, NGO, collective, individual}
- goals, e.g. "make anime real" — reading between the lines is permitted

search that can be filtered per spec:
- spec list is itself auto-constructed from search results (or grays out irrelevant specs)
- spec filter as the primary search affordance
- export search as an anki deck; select which metadata fields to include in the deck

a giant table of all the GPUs that can be sorted and filtered per spec:
- same deal

problems:
- what happens when FFN makes VRAM mostly obsolete and we need shedloads of system RAM and PCIe instead
- what happens when crypto anarchists need to become illegible and are put at risk by the wiki itself
- how to not keep track of every useless little hobby project
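the attestation view mode above can be sketched as follows — a hedged illustration only, where the function names are invented and the brightness mapping is one plausible choice, not a spec: each attestation contributes a 1/n weight, attesters on the viewer's ignore list are dropped, and the summed weight drives how bright (dark mode) or dark (light mode) the text renders:

```python
# hypothetical attestation weighting for the wiki's view modes
def attestation_weight(attesters, ignore_list, n):
    """sum of 1/n contributions from attesters not on the ignore list."""
    valid = [a for a in attesters if a not in ignore_list]
    return len(valid) / n

def brightness(weight, dark_mode=True):
    """map a weight in [0, 1] to a display brightness percentage.
    dark mode: more attestations -> brighter text;
    light mode: more attestations -> darker text."""
    w = max(0.0, min(1.0, weight))
    return round(50 + 50 * w) if dark_mode else round(90 - 50 * w)
```

the hue (cyan/yellow vs blue/brown) would be chosen separately per attester type (bot vs human); only the attestation count feeds the brightness axis.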
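the auto-constructed spec filter list could work roughly like this sketch (field names hypothetical): count how often each spec value occurs among the current search results, so the UI can list live facets and gray out specs with zero matches:

```python
# hypothetical facet construction for the spec-filtered search
from collections import Counter

def build_facets(results, spec_fields):
    """map each spec field to a Counter of its values across the results.
    a value absent from the Counter (count 0) would be grayed out in the UI."""
    facets = {f: Counter() for f in spec_fields}
    for model in results:
        for f in spec_fields:
            if f in model:
                facets[f][model[f]] += 1
    return facets
```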