Absurd: Running an LLM locally

20231205

Running an LLM locally is not too hard. You grab llama.cpp, compile it, download a model (converting it if necessary), and then run it. Before starting, make sure you have a C/C++ compiler toolchain, Python 3, and NumPy. You may also hit memory locking errors when running the model with --mlock. To avoid them, remove the memlock limit in /etc/security/limits.conf, then log out and back in so the new limit applies to your session:

*        hard    memlock        unlimited
*        soft    memlock        unlimited

If you'd rather not use mlock in this wild west fashion, you can omit the --mlock option from the ./main commands below. In my usage on a 64-thread EPYC CPU with 256GB of RAM and 2x 2TB NVMe drives in BTRFS RAID 1, it made very little difference anyway. The system was entirely CPU bound.
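Before deciding whether to keep --mlock, it can help to see which limit your session actually has. The following is a minimal sketch that reads the soft and hard memlock limits through Ruby's Process.getrlimit. The helper names format_limit and memlock_report are made up for illustration.

```ruby
# Report the locked-memory limit that --mlock is subject to.
# Process.getrlimit returns [soft, hard], both in bytes.

# Turn a raw limit into something readable.
def format_limit(bytes)
  return "unlimited" if bytes == Process::RLIM_INFINITY
  "#{bytes / 1024} KiB"
end

# Build a one-line summary of the current memlock limits.
def memlock_report
  soft, hard = Process.getrlimit(Process::RLIMIT_MEMLOCK)
  "memlock soft=#{format_limit(soft)} hard=#{format_limit(hard)}"
end

puts memlock_report
```

If both values read "unlimited", the limits.conf change is active for the current session.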

[Screenshot: btop showing llama2 running on an EPYC 7551P]

Grabbing the program and compiling:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp/
make -j$(nproc)

As for the models, you have a ton to choose from on Hugging Face. I am going to start with a 70-billion-parameter Llama 2 chat model:

wget https://huggingface.co/TheBloke/Llama-2-70B-chat-GGML/resolve/main/llama-2-70b-chat.ggmlv3.q4_1.bin

This file is in the older GGML format, and current llama.cpp expects GGUF, so we need to convert the model:

python3 ./convert-llama-ggml-to-gguf.py --input llama-2-70b-chat.ggmlv3.q4_1.bin --output models/70b.model.gguf --gqa 8 --eps 1e-5
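A quick sanity check on the converted file is to look at its first four bytes, since GGUF files begin with the ASCII magic "GGUF". This is a rough sketch only: the gguf_file? helper is a made-up name, and it checks nothing beyond the magic, so a truncated or otherwise damaged file would still pass.

```ruby
# GGUF files begin with the four ASCII bytes "GGUF".
GGUF_MAGIC = "GGUF".b

# Return true if the file at `path` starts with the GGUF magic.
def gguf_file?(path)
  return false unless File.file?(path)
  File.binread(path, 4) == GGUF_MAGIC
end

# Example (run from the llama.cpp directory):
#   gguf_file?("models/70b.model.gguf")
```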

If you didn't encounter any errors, you should now be able to run the model:

./main -t 64 --mlock -m ./models/70b.model.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: What is the answer to the ultimate question of life, the universe, and everything?\n### Response:"

And in my case, the machine responded with: "The answer to the ultimate question of life, the universe, and everything is 42, according to Douglas Adams' science fiction series "The Hitchhiker's Guide to the Galaxy." In the book, a group of hyper-intelligent beings build a massive supercomputer named Deep Thought to find the answer to the ultimate question. After seven and a half million years of calculations, Deep Thought finally reveals that the answer is 42. However, it turns out that the answer doesn't actually make sense without knowing what the ultimate question is. Therefore, the search for the answer continues, and the number 42 has become a popular meme and cultural reference symbolizing the search for meaning and answers to life's biggest questions."
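If you plan to ask more than one question, retyping that command gets old fast. Here is a minimal sketch of a wrapper that builds the same argument list, using only the flags from the command above. The helper names instruct_prompt and main_args are invented for illustration, and the thread count defaults to whatever Etc.nprocessors reports rather than a hard-coded 64. One difference to be aware of: the Ruby string contains a real newline, whereas the shell command passes the two characters \n as typed.

```ruby
require "etc"

# Wrap a question in the "### Instruction / ### Response" template used above.
def instruct_prompt(question)
  "### Instruction: #{question}\n### Response:"
end

# Build the argument list for ./main from the flags shown in the command above.
def main_args(model_path, question, threads: Etc.nprocessors, mlock: true)
  args = ["./main", "-t", threads.to_s]
  args << "--mlock" if mlock
  args + ["-m", model_path, "--color", "-c", "4096", "--temp", "0.7",
          "--repeat_penalty", "1.1", "-n", "-1", "-p", instruct_prompt(question)]
end

# Example (run from the llama.cpp directory):
#   system(*main_args("models/70b.model.gguf", "What is six times seven?"))
```

Passing the arguments to system as a list skips the shell entirely, so quotes inside the question need no escaping.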

If you need a smaller model (you probably do) and would rather skip the conversion step, grab one that is already in GGUF format:

wget https://huggingface.co/TheBloke/Orca-2-13B-GGUF/resolve/main/orca-2-13b.Q8_0.gguf
mv orca-2-13b.Q8_0.gguf models/
./main -t 64 --mlock -m models/orca-2-13b.Q8_0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: What is the answer to life the universe and everything?\n### Response:"

This model responded with: "The answer to the ultimate question of life, the universe, and everything is 42."
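Why "you probably do" need a smaller model comes down to simple arithmetic: file size, and roughly the RAM needed to hold the weights, is about parameters times bits per weight. The sketch below assumes about 5 bits per weight for q4_1 and about 8.5 for Q8_0. Those figures are my understanding of the block layouts, not something stated in this post, and the result ignores context buffers and other overhead. The name approx_model_gb is made up for illustration.

```ruby
# Approximate bits per weight, including per-block scale factors.
# These values are assumptions; treat the results as ballpark figures.
BITS_PER_WEIGHT = { "q4_1" => 5.0, "q8_0" => 8.5 }.freeze

# Estimate the size of the weights in GB (billions of bytes).
def approx_model_gb(params_billion, quant)
  bits = BITS_PER_WEIGHT.fetch(quant.downcase)
  params_billion * bits / 8.0
end

# The two models used so far: 70B at q4_1 and 13B at Q8_0.
[[70, "q4_1"], [13, "Q8_0"]].each do |params, quant|
  printf("%2dB %-5s ~%.1f GB\n", params, quant, approx_model_gb(params, quant))
end
```

Plugging in other parameter counts gives a quick feel for what will fit in memory before you download anything.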

For an even smaller model, you could try OpenOrca-Zephyr-7B (wget https://huggingface.co/TheBloke/OpenOrca-Zephyr-7B-GGUF/resolve/main/openorca-zephyr-7b.Q8_0.gguf). This one performed really well in my testing for code generation and explanation.

Example output for the prompt "Binary tree sort in Ruby":


This is an example of how you can implement binary tree sort in Ruby.

First, let's define a class for our binary trees called BinaryTree. It will have the following methods:

  1. insert(value) - Inserts the given value into the tree and balances it if needed.
  2. search(value) - Returns true if the given value is found in the tree, false otherwise.
  3. delete(value) - Removes the given value from the tree if present.
  4. preorder_traversal - Prints the values in the tree using preorder traversal (root-left-right).
  5. inorder_traversal - Prints the values in the tree using inorder traversal (left-root-right).
  6. postorder_traversal - Prints the values in the tree using postorder traversal (left-right-root).
  7. max_value - Returns the maximum value in the tree.
  8. min_value - Returns the minimum value in the tree.

Here's the code for BinaryTree:

ruby
class BinaryTree
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end

  def insert(value)
    if value < @value
      if !@left
        @left = BinaryTree.new(value)
      else
        @left.insert(value)
      end
    elsif value > @value
      if !@right
        @right = BinaryTree.new(value)
      else
        @right.insert(value)
      end
    else
      raise "Value already exists in the tree."
    end
  end

  def search(value)
    return true if (@left && @left.search(value)) || (@right && @right.search(value))
    return false if !(@left || @right)
    return @value == value
  end

  def delete(value)
    if @value == value
      if @left && @right
        min_right = @right.min_value
        if @left.search(min_right)
          replace_with_min(@left, min_right)
        else
          replace_with_min(@right, min_right)
        end
      elsif !@left && !@right
        NIL = nil
        return NIL
      elsif @left
        return @left
      elsif @right
        return @right
      end
    else
      if value < @value
        if !@left || !@left.search(value)
          return self
        else
          return self.replace_with_min(@left, value)
        end
      else
        if !@right || !@right.search(value)
          return self
        else
          return self.replace_with_max(@right, value)
        end
      end
    end
  end

  def preorder_traversal
    puts @value
    if @left
      @left.preorder_traversal
    end
    if @right
      @right.preorder_traversal
    end
  end

  def inorder_traversal
    if @left
      @left.inorder_traversal
    end
    puts @value

    end
    if @right
      @right.preorder_traversal
    end
  end

  def postorder_traversal
    if @left
      @left.postorder_traversal
    end
    if @right
      @right.postorder_traversal
    end
    puts @value
  end

  def max_value
    return @value if !@right
    return @right.max_value
  end

  def min_value
    return @value if !@left
    return @left.min_value
  end

  private

  def replace_with_min(node, value)
    new_node = node
    while (new_node && new_node.left)
      new_node = new_node.left
    end
    new_node.left = node.delete(value)
  end

  def replace_with_max(node, value)
    new_node = node
    while (new_node && new_node.right)
      new_node = new_node.right
    end
    new_node.right = node.delete(value)
  end
end

Now let's implement the binary tree sort using the BinaryTree class:

  1. First, create a method called sorted_values in your main program (not inside the BinaryTree class). This method will return an array of sorted values.
  2. Initialize an empty array called sorted_values.
  3. Traverse the binary tree using preorder traversal and append each value to the sorted_values array.
  4. Return the sorted_values array.

Here's how you can use this method:

ruby
tree = BinaryTree.new(5)
tree.insert(3)
tree.insert(7)
tree.insert(1)
tree.insert(4)
tree.insert(6)
tree.insert(8)
sorted_values = tree.sorted_values
p sorted_values

Output:

[1, 3, 4, 5, 6, 7, 8]

This code sorts the given binary tree and returns an array of its sorted values in ascending order. In this example, the sorted_values will be [1, 3, 4, 5, 6, 7, 8]. The preorder_traversal method is used to traverse the tree and collect the values in the correct order.

Note: The implementation provided here assumes that each value in the binary tree is unique. If you have duplicates, you may need to modify the code for search or delete operations.

I hope this helps! Let me know if you have any questions.
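A side note for anyone who wants to run the answer above: it describes a sorted_values method in steps but never shows its code, and a preorder traversal would not return the values in ascending order (an inorder one does). What follows is a minimal, self-contained sketch of tree sort, separate from the model's output. TreeNode and tree_sort are illustrative names, not part of the quoted answer, and duplicates are kept by sending them to the right subtree.

```ruby
# Minimal binary search tree node. Duplicates go right, so they are kept.
class TreeNode
  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end

  # Insert a value below this node, preserving the search-tree ordering.
  def insert(new_value)
    if new_value < @value
      @left ? @left.insert(new_value) : (@left = TreeNode.new(new_value))
    else
      @right ? @right.insert(new_value) : (@right = TreeNode.new(new_value))
    end
    self
  end

  # Inorder traversal (left, root, right) yields values in ascending order.
  def each_in_order(&block)
    @left&.each_in_order(&block)
    block.call(@value)
    @right&.each_in_order(&block)
  end
end

# Tree sort: insert every value, then read them back in order.
def tree_sort(values)
  return [] if values.empty?
  root = TreeNode.new(values.first)
  values.drop(1).each { |v| root.insert(v) }
  sorted = []
  root.each_in_order { |v| sorted << v }
  sorted
end

p tree_sort([5, 3, 7, 1, 4, 6, 8])  # => [1, 3, 4, 5, 6, 7, 8]
```

The tree is not balanced, so input that is already sorted degrades into a linked list with deep recursion. That is fine for a demo, less so for large inputs.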


Needless to say, I didn't expect a small model to produce output of such high quality. This was truly shocking. Still, for any of these models, a GPU is a huge benefit and time saver. Running on a headless server, even a quite powerful one, with no GPU or other AI accelerator is... painful.

Final note: the smallest model that I saw give decent output across a variety of prompt types is tinyllama-1.1B-1T-OpenOrca-Q8. It gave me excellent results, so I tossed it behind llama.cpp's web front end, run as a systemd service:

[Unit]
Description=Llama.cpp
Requires=network.target
[Service]
Type=simple
User=goober
Group=nobody
WorkingDirectory=/home/goober/llama.cpp
ExecStart=/home/goober/llama.cpp/server --numa --mlock -m models/tinyllama-1.1b-1t-openorca.Q8_0.gguf -c 8192 --host 0.0.0.0 --log-disable
TimeoutSec=30
RestartSec=15s
Restart=always
[Install]
WantedBy=multi-user.target
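Besides the web page, the server also answers plain HTTP requests, which is handy for scripting. The sketch below posts a prompt to the /completion endpoint and returns the generated text. The host and port are assumptions (127.0.0.1 and the server's default port 8080, since the unit above sets no --port), the 300 second read timeout is an arbitrary guess for CPU-only generation, and the helper names completion_request and complete are made up for illustration.

```ruby
require "json"
require "net/http"

# Build a POST request for the server's /completion endpoint.
def completion_request(prompt, n_predict: 128)
  req = Net::HTTP::Post.new("/completion", "Content-Type" => "application/json")
  req.body = JSON.generate(prompt: prompt, n_predict: n_predict)
  req
end

# Send the prompt and return the generated text.
# host and port are assumptions: adjust them to match your setup.
def complete(prompt, host: "127.0.0.1", port: 8080, n_predict: 128)
  req = completion_request(prompt, n_predict: n_predict)
  res = Net::HTTP.start(host, port, read_timeout: 300) { |http| http.request(req) }
  raise "server returned HTTP #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  JSON.parse(res.body).fetch("content")
end

# Example, with the service running:
#   puts complete("### Instruction: Binary tree sort in Ruby\n### Response:")
```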

Have fun!


© MMXXV, Abort Retry Fail LLC
Licentiam Absurdum