This story was originally published on HackerNoon at:
https://hackernoon.com/run-llama-without-a-gpu-quantized-llm-with-llmware-and-quantized-dragon.
Use AI miniaturization to get high-level performance out of LLMs running on your laptop!
This story was written by:
@shanglun.
As GPU resources become more constrained, miniaturization and specialist LLMs are gaining prominence. Today we explore quantization, a miniaturization technique that stores model weights at lower numerical precision, allowing us to run high-parameter models without specialized hardware.
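
To make the idea of quantization concrete, here is a minimal sketch in plain Python. It is a generic illustration of symmetric 8-bit quantization, not the scheme used by any particular model or library mentioned in the story; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical. Production formats typically use lower bit widths and per-block scales, but the principle is the same: store small integers plus a scale factor instead of full-precision floats.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127].

    Returns the integer values and the scale needed to recover
    approximate floats. (Hypothetical helper for illustration.)
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Made-up sample weights, standing in for one row of a model's weight matrix.
weights = [0.12, -0.53, 0.98, -1.27, 0.0]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored value differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four (for 32-bit floats), at the cost of a rounding error bounded by half the scale. That memory saving is what lets a large model fit in ordinary laptop RAM.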