So, if you want to run an LLM locally, for whatever reason, it's not too hard to do. Essentially, you first need to grab llama.cpp and compile it, then grab a model (and possibly convert it), and then run it. Before starting, make sure you have a C/C++ compiler toolchain, Python 3, and NumPy. You may also encounter memory-locking errors when attempting to run the model; if so, remove the memlock limit in /etc/security/limits.conf:
* hard memlock unlimited
* soft memlock unlimited
If you'd rather not use mlock in this wild-west fashion, you can omit the --mlock option from the ./main
command. In my usage on a 64-thread EPYC CPU with 256GB of RAM and 2x 2TB NVMe in BTRFS RAID 1, it had very little effect anyway; the system was totally CPU-bound.
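Before editing anything, you can check what your current locked-memory limit actually is; assuming a typical Linux shell:

```shell
# Show the max locked-memory limit for this shell (in KB, or "unlimited")
ulimit -l

# After editing /etc/security/limits.conf, log out and back in
# (pam_limits applies at login), then run ulimit -l again to confirm
# it now reports "unlimited"
```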
Grabbing the program and compiling:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp/
make -j$(nproc)
As for the models, you have a ton to choose from. You can grab models from HuggingFace, but I am going to start with a 70 billion parameter Llama 2 chat model:
wget https://huggingface.co/TheBloke/Llama-2-70B-chat-GGML/resolve/main/llama-2-70b-chat.ggmlv3.q4_1.bin
Now, we need to convert the model:
python3 ./convert-llama-ggml-to-gguf.py --input llama-2-70b-chat.ggmlv3.q4_1.bin --output models/70b.model.gguf --gqa 8 --eps 1e-5
If you didn't encounter any errors, you should now be able to run the model:
./main -t 64 --mlock -m ./models/70b.model.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: What is the answer to the ultimate question of life, the universe, and everything?\n### Response:"
And in my case, the machine responded with: "The answer to the ultimate question of life, the universe, and everything is 42, according to Douglas Adams' science fiction series "The Hitchhiker's Guide to the Galaxy." In the book, a group of hyper-intelligent beings build a massive supercomputer named Deep Thought to find the answer to the ultimate question. After seven and a half million years of calculations, Deep Thought finally reveals that the answer is 42. However, it turns out that the answer doesn't actually make sense without knowing what the ultimate question is. Therefore, the search for the answer continues, and the number 42 has become a popular meme and cultural reference symbolizing the search for meaning and answers to life's biggest questions."
If you need a smaller model (you probably do), and you do not wish to deal with conversion of your model, you can use something like:
wget https://huggingface.co/TheBloke/Orca-2-13B-GGUF/resolve/main/orca-2-13b.Q8_0.gguf
mv orca-2-13b.Q8_0.gguf models/
./main -t 64 --mlock -m models/orca-2-13b.Q8_0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: What is the answer to life the universe and everything?\n### Response:"
For this model, I got the response of: The answer to the ultimate question of life, the universe, and everything is 42.
For an even smaller model, you could try wget https://huggingface.co/TheBloke/OpenOrca-Zephyr-7B-GGUF/resolve/main/openorca-zephyr-7b.Q8_0.gguf. This one actually performed really well in my testing for code generation and explanation.
Example output for the prompt "Binary tree sort in Ruby":
This is an example of how you can implement binary tree sort in Ruby.
First, let's define a class for our binary trees called BinaryTree. Here's the code for BinaryTree:
class BinaryTree
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end

  def insert(value)
    if value < @value
      if !@left
        @left = BinaryTree.new(value)
      else
        @left.insert(value)
      end
    elsif value > @value
      if !@right
        @right = BinaryTree.new(value)
      else
        @right.insert(value)
      end
    else
      raise "Value already exists in the tree."
    end
  end

  def search(value)
    return true if @value == value
    return true if @left && @left.search(value)
    return true if @right && @right.search(value)
    false
  end

  def delete(value)
    if value < @value
      @left = @left.delete(value) if @left
      self
    elsif value > @value
      @right = @right.delete(value) if @right
      self
    elsif @left && @right
      # Two children: replace this node's value with the smallest value
      # in the right subtree, then delete that value from the right subtree.
      @value = @right.min_value
      @right = @right.delete(@value)
      self
    else
      # Zero or one child: return the child (or nil) to the caller.
      @left || @right
    end
  end

  def preorder_traversal
    puts @value
    if @left
      @left.preorder_traversal
    end
    if @right
      @right.preorder_traversal
    end
  end

  def inorder_traversal
    if @left
      @left.inorder_traversal
    end
    puts @value
    if @right
      @right.inorder_traversal
    end
  end

  def postorder_traversal
    if @left
      @left.postorder_traversal
    end
    if @right
      @right.postorder_traversal
    end
    puts @value
  end

  def max_value
    return @value if !@right
    @right.max_value
  end

  def min_value
    return @value if !@left
    @left.min_value
  end
end
Now let's implement the binary tree sort using the BinaryTree class. Define a sorted_values method on BinaryTree that collects the tree's values into an array in ascending order. Here's how you can use this method:
tree = BinaryTree.new(5)
tree.insert(3)
tree.insert(7)
tree.insert(1)
tree.insert(4)
tree.insert(6)
tree.insert(8)
sorted_values = tree.sorted_values
p sorted_values
Output:
[1, 3, 4, 5, 6, 7, 8]
This code sorts the given binary tree and returns an array of its sorted values in ascending order. In this example, sorted_values will be [1, 3, 4, 5, 6, 7, 8]. An in-order traversal is used to walk the tree and collect the values in the correct order.
Note: The implementation provided here assumes that each value in the binary tree is unique. If you have duplicates, you may need to modify the code for search or delete operations.
I hope this helps! Let me know if you have any questions.
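One gap in the reply above: it uses a sorted_values method without ever showing it. A minimal sketch of what it could look like, repeating just enough of the BinaryTree class that the snippet stands on its own (the key idea being that an in-order traversal of a binary search tree yields values in ascending order):

```ruby
class BinaryTree
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end

  # Same insert logic as the class above, repeated so this snippet runs alone.
  def insert(value)
    if value < @value
      @left ? @left.insert(value) : (@left = BinaryTree.new(value))
    elsif value > @value
      @right ? @right.insert(value) : (@right = BinaryTree.new(value))
    end
  end

  # In-order traversal: left subtree, this node, right subtree.
  # For a binary search tree this visits values in ascending order.
  def sorted_values
    result = []
    result.concat(@left.sorted_values) if @left
    result << @value
    result.concat(@right.sorted_values) if @right
    result
  end
end
```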
Needless to say, I didn't expect a small model to produce output of such high quality... This was truly shocking. Still, for any of these models, having a GPU is a huge benefit and time saver. Running on a headless server, even a quite powerful one, with no GPU or other AI accelerator is... painful.
One final note: the smallest model I saw give decent output across a variety of prompt types was tinyllama-1.1B-1T-OpenOrca-Q8. It gave me excellent results, so I set it up as a systemd service behind llama.cpp's built-in web front end:
[Unit]
Description=Llama.cpp
After=network.target
[Service]
Type=simple
User=goober
Group=nobody
WorkingDirectory=/home/goober/llama.cpp
ExecStart=/home/goober/llama.cpp/server --numa --mlock -m models/tinyllama-1.1b-1t-openorca.Q8_0.gguf -c 8192 --host 0.0.0.0 --log-disable
TimeoutSec=30
RestartSec=15s
Restart=always
[Install]
WantedBy=multi-user.target
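Assuming you save the unit above as /etc/systemd/system/llama.service, enabling it is the usual systemd routine (the port 8080 default and /health endpoint below are llama.cpp server behaviors at the time of writing; adjust if yours differ):

```shell
# Install and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now llama.service

# Confirm the server came up; llama.cpp's server listens on 8080 by default
systemctl status llama.service
curl http://localhost:8080/health
```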
Have fun!