fenn's dream wiki

database for keeping track of AI models etc

---

assemble a pile of documents:
- web page scrape
- huggingface metadata
- papers
- github pages
- torch/GGUF metadata

an LLM goes through the pile of documents relevant to a particular model. for each fact it:
- extracts a relevant piece of metadata about the model
- finds a good spot for it in the wiki page
- attaches a source citation reference and correctness attestation in the wiki markup
- large documents can be KV cached to speed up this process; the metadata type we're looking for goes at the end of the prompt
- json schema driven with SGLang? https://lmsys.org/blog/2024-02-05-compressed-fsm/ — otherwise outlines FSM; llama.cpp supports grammars

each model should have the following metadata:
- name
- train type {base model, heavy tune, fine tune, merge}
- distribution type {full model, LoRA, API}
- license restrictions (hover icon for description)
- size (passive and active parameter count, minimum functional VRAM requirement in practice)
- intended hardware class {CPU, GPU, NPU, distributed, analog, brain tissue, etc.}
- use cases {QA, RAG, tool use, agent, code, writing, RP, ERP, chat, meta, vision, image, hearing, speech, music, audio, video, avatar, face recognition, face generation, emotion, 3d, protein, robot, ...}
- file urls (huggingface)
- code urls (github)
- demo urls (HF space)
- first and last publish date
- author / org / group
- model-specific paper:
  - urls in superscript
  - title
  - hover for abstract, click for pdf link
  - what to do about the ridiculous number of authors on some papers?
- training datasets:
  - name
  - hover for description, click for link if one exists
- prompt format templates
- ancestors:
  - via dataset ancestry
  - via fine tuning
  - via heavy tuning
  - via merging
  - via distillation
  - as a sequel (e.g. llama 1 2 3)
  - when should versions have separate pages?
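a minimal sketch of what the extracted record could look like on the wiki side, assuming invented names (`Fact`, `ModelRecord`) and an example URL — not any real schema; the enum-style `{...}` sets above map naturally to validated fields:

```python
# hypothetical per-model metadata record; every extracted fact carries
# its source citation and an attestation, as described above
from dataclasses import dataclass, field

TRAIN_TYPES = {"base model", "heavy tune", "fine tune", "merge"}
DIST_TYPES = {"full model", "LoRA", "API"}

@dataclass
class Fact:
    """one extracted piece of metadata with its provenance."""
    field_name: str   # e.g. "license restrictions"
    value: str
    source_url: str   # citation reference placed in the wiki markup
    attested_by: str  # bot or human id attesting correctness

@dataclass
class ModelRecord:
    name: str
    train_type: str
    distribution_type: str
    facts: list = field(default_factory=list)

    def __post_init__(self):
        # reject values outside the closed sets from the spec
        if self.train_type not in TRAIN_TYPES:
            raise ValueError(f"unknown train type: {self.train_type}")
        if self.distribution_type not in DIST_TYPES:
            raise ValueError(f"unknown distribution type: {self.distribution_type}")
```

the same closed sets could be emitted as a JSON schema to drive constrained decoding in SGLang or outlines, so the extractor physically cannot produce an out-of-vocabulary value.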
- benchmarks:
  - perplexity
  - alpacaeval
  - lmsys elo
  - chai elo
  - arena hard
  - modality-specific benchmarks
- notes on personality
- why this model was an advance

attestation view mode:
- each attestation adds a 1/n weight
- ignore list for bad bots or humans (keep track of everyone's ignore lists for mod action)
- dark mode:
  - hues text cyan for bot-attested data
  - hues text yellow for human-attested data
  - text gets brighter the more attestations it has
- light mode:
  - hues text blue for bot-attested data
  - hues text brown for human-attested data
  - text gets darker the more attestations it has

reviews:
- date of evaluation
- the exact url used for the review
- task the model was evaluated on
- performance on the task (text field, 1-5 stars)
- bugs and annoyances
- solutions to bugs and annoyances
- comment on review
- attestation on review ("me too")
- ability for the community to close reviews as no longer valid (grayed out), with the reason this is true
- filter reviews by spec

a list of groups and organizations:
- urls, key people
- org type {corporation, startup, academic, NGO, collective, individual}
- goals, e.g. "make anime real" — reading between the lines is permitted

search that can be filtered per spec:
- spec list is itself auto-constructed from search results (or grays out irrelevant specs)
- spec filter as the primary search affordance
- export search as an anki deck; select which metadata fields to include in the deck

a giant table of all the GPUs that can be sorted and filtered per spec:
- same deal

problems:
- what happens when FFN makes VRAM mostly obsolete and we need shedloads of system RAM and PCIe instead
- what happens when crypto anarchists need to become illegible and are put at risk by the wiki itself
- how to not keep track of every useless little hobby project
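the attestation view mode above can be sketched as follows — a hedged illustration only, where the function names are invented and the brightness mapping is one plausible choice, not a spec: each attestation contributes a 1/n weight, attesters on the viewer's ignore list are dropped, and the summed weight drives how bright (dark mode) or dark (light mode) the text renders:

```python
# hypothetical attestation weighting for the wiki's view modes
def attestation_weight(attesters, ignore_list, n):
    """sum of 1/n contributions from attesters not on the ignore list."""
    valid = [a for a in attesters if a not in ignore_list]
    return len(valid) / n

def brightness(weight, dark_mode=True):
    """map a weight in [0, 1] to a display brightness percentage.
    dark mode: more attestations -> brighter text;
    light mode: more attestations -> darker text."""
    w = max(0.0, min(1.0, weight))
    return round(50 + 50 * w) if dark_mode else round(90 - 50 * w)
```

the hue (cyan/yellow vs blue/brown) would be chosen separately per attester type (bot vs human); only the attestation count feeds the brightness axis.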
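the auto-constructed spec filter list could work roughly like this sketch (field names hypothetical): count how often each spec value occurs among the current search results, so the UI can list live facets and gray out specs with zero matches:

```python
# hypothetical facet construction for the spec-filtered search
from collections import Counter

def build_facets(results, spec_fields):
    """map each spec field to a Counter of its values across the results.
    a value absent from the Counter (count 0) would be grayed out in the UI."""
    facets = {f: Counter() for f in spec_fields}
    for model in results:
        for f in spec_fields:
            if f in model:
                facets[f][model[f]] += 1
    return facets
```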