A large language model (LLM) is a machine learning model built on a neural network architecture and designed to understand and generate natural language text. An LLM is trained on large amounts of text data so that it learns the complexities of language patterns, grammar rules, and semantic relationships. Llama 2 is an open-source generative LLM available in three sizes, with 7 billion, 13 billion, and 70 billion parameters. All the Llama models are available on the Hugging Face platform, a website that hosts open-source machine learning models. NousResearch evaluated the Llama-2-70B model on a setup of four A100 40GB GPUs with two prompt configurations: 1500 input tokens with 100 output tokens, and 50 input tokens with 500 output tokens.
In this assessment, the best latency was 7.4 seconds and 33 seconds for the respective configurations, while throughput peaked at 1.1 requests per second (RPS) for the former and 0.8 RPS for the latter. Recent LLM advancements such as Ghost Attention, in-context temperature rescaling, and grouped-query attention have been incorporated into Llama 2. Its tokenizer uses the byte pair encoding (BPE) algorithm, and the model follows the standard transformer architecture with pre-normalization using RMSNorm and rotary positional embeddings.
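As an illustration of the pre-normalization step, the following minimal NumPy sketch shows the RMSNorm operation; the variable names and the eps value are illustrative and are not taken from the Llama 2 implementation:

    import numpy as np

    def rms_norm(x, weight, eps=1e-6):
        # Scale activations by the reciprocal of their root mean square,
        # then apply a learned per-dimension gain ("weight").
        rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
        return (x / rms) * weight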
The model performs well on several tasks such as coding, commonsense reasoning, and question answering, and the Llama 2 Chat version has outperformed other open-source models by a good margin. The working of the Llama 2 LLM is described through a flowchart in Fig. 4. An environment variable named “REPLICATE_API_TOKEN” was created to hold the Replicate API token; this token authenticates the requests that the application makes to the Replicate API. A Replicate client object was created and given the variable containing the API token. The Llama 2 Chat model with 70 billion parameters is called through this client object, with the constructed Python code generation query provided to it as the prompt. The output returned by the Llama 2 model is stored in a Python object that can be iterated.
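A minimal sketch of this step is shown below, assuming the replicate Python package is installed; the model reference string is an assumption and the exact identifier on Replicate may differ:

    import os
    import replicate

    # Authenticate with the token stored in the environment variable.
    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])

    query = "..."  # placeholder for the structured code-generation prompt

    # Call the Llama 2 70B Chat model with the code-generation prompt
    # ("meta/llama-2-70b-chat" is an assumed model reference).
    output = client.run(
        "meta/llama-2-70b-chat",
        input={"prompt": query},
    )
    # "output" is an iterable that streams the generated text.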
Fig. 4. Detailed Working of Llama 2 Chat LLM

Extraction of Structured Python Code from Llama 2 LLM Output
The Python object holding the output code generated by the Llama 2 70B Chat LLM is iterated with a loop construct, and all the Python code from the Llama output is accumulated into a single string variable.
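A minimal sketch of this extraction step, assuming output is the iterable returned by the Replicate call above:

    # Concatenate the streamed fragments into one string of Python code.
    generated_code = ""
    for fragment in output:
        generated_code += fragment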
Integration of User Interface with OCR and LLM
The functions created to extract the text from the flowchart image, to create the structured query for the LLM, to send that query as a prompt to the Llama 2 Chat LLM, and to extract the Python code and its explanation from the LLM output were put into a single Python module that is used by the user interface code. When the user uploads an image of a flowchart on the UI, the “generate” function that we have created is called with the uploaded flowchart image as input. The function extracts the path of the uploaded image and calls the read_image function from the custom utils module, which converts the image into an array. This image array is passed to the detect_text function to extract the text from the flowchart. The extracted flowchart text is converted into a structured Python code generation query by the create_query function, and this query is used as the prompt for the Llama 2 LLM. The Llama LLM is accessed using the Replicate API token and prompted with the code generation query. The output produced by the Llama 2 LLM, containing the Python code for the flowchart and a brief explanation, is stored in a Python object that is iterated to extract the code and its explanation. The python_code_string function appends the Python code and its explanation to a blank string, and the result is returned as a string to the user interface. The flowchart image and its equivalent Python code are displayed as output on the user interface, and users can copy the generated code and execute it in an integrated development environment or a code editor of their choice. A compact sketch of this pipeline is given below.
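The following sketch shows how the generate function could wire these steps together; the module name utils, the exact function signatures, and the model reference string are assumptions based on the description above:

    import os
    import replicate
    from utils import read_image, detect_text, create_query, python_code_string

    # Replicate client authenticated with the environment-variable token.
    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])

    def generate(image_path):
        image_array = read_image(image_path)       # load the flowchart image as an array
        flowchart_text = detect_text(image_array)  # OCR: extract the flowchart text
        query = create_query(flowchart_text)       # build the structured code-generation prompt
        # Prompt the Llama 2 70B Chat model via the Replicate client.
        output = client.run("meta/llama-2-70b-chat", input={"prompt": query})
        # Collect the generated Python code and its explanation into one string.
        return python_code_string(output)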