C++: Huffman Coding using STL

Huffman Code compress data very effectively: saving of 20% to 90% are typical, depending on the characteristics of the data being compressed.Huffman's greedy algorithm uses a table giving how often each character occurs (i.e., its frequency) to build up an optimal way of representing each character as a binary string. This algorithm constructs an optimal prefix code called a Huffman Code.


Tree corresponding to optimal prefix code[Huffman Code]

Here is the program for Hufmman Coding

 
#include <iostream>
#include <vector>
#include <queue>

class HuffmanCodes
{
 struct Node
 {
  char data;
  size_t freq;
  Node* left;
  Node* right;

  Node()
  {
    data = '\0';
    freq = 0;
  }
  Node(char data, size_t freq) : data(data),
                                 freq(freq),
                                 left(nullptr),
                                 right(nullptr)
                                 {}
 ~Node()
 {
   delete left;
   delete right;
 }
 };

 struct compare
 {
  bool operator()(Node* l, Node* r)
  {
    return (l->freq > r->freq);
  }
};

Node* top;

void printCode(Node* root, std::string str)
{
  if(root == nullptr)
   return;

 if(root->data != '$')
 {
   std::cout << root->data << " : " << str << "\n";
 }
 printCode(root->left, str + "0");
 printCode(root->right, str + "1");
}

public:
  HuffmanCodes() {}
  ~HuffmanCodes()
  {
    delete top;
  }
  void GenerateCode(const std::vector<char>& data, const std::vector<size_t>& freq)
  {
   Node* left;
   Node* right;

   std::priority_queue<Node*, std::vector<Node*>, compare > minHeap;

   for(size_t i = 0; i < data.size(); ++i)
   {
      minHeap.push(new Node(data[i], freq[i]));
   }

    while(minHeap.size() != 1)
    {
      left = minHeap.top();
      minHeap.pop();

      right = minHeap.top();
      minHeap.pop();

      top = new Node('$', left->freq + right->freq);
      top->left  = left;
      top->right = right;
      minHeap.push(top);
     }
    printCode(minHeap.top(), "");
 }
};

 int main()
 {
  HuffmanCodes set1;
  std::vector<char> data({'d', 'e', 'b', 'c', 'a', 'f'});
  std::vector<size_t> freq({16, 9, 13, 12, 45, 5});
  size_t size = data.size();
  set1.GenerateCode(data, freq);

  return 0;
 }
 
You can download this code from HuffmanCoding.cpp 

We have used std::priority_queue  

A priority queue is a container adaptor that provides constant time lookup of the largest (by default) element, at the expense of logarithmic insertion and extraction. 

Since by default it is maximim priority queue but we want minimum priority queue. So we defined a compare

struct compare
 {
  bool operator()(Node* l, Node* r)
  {
    return (l->freq > r->freq);
  }
 };
 
You can read more about priority_queue from cppreference

The output of the program
 a : 0
c : 100
b : 101
f : 1100
e : 1101
d : 111
 
You should also read Maximum Priority Queue



                                                                  
                                                                  
You may also like
What is "namespace" and why do we use it?
C++: Implementation of Heapsort (Sorting)
How do I learn competitive programming as a beginner?
Which Linux distribution is the best for a programmer?
What programming languages do what?


 



Comments

Popular Posts

C++: Linked List containing Loop (Floyd Cycle finding Algorithm) program

C++: Breadth First Search program using Adjacency Matrix

Check if the String is Palindrome (C++ & Java Program)