A large language model (LLM) is a type of machine learning model built on a neural network architecture and designed to understand and generate natural language text. An LLM is trained on large amounts of text data so that it learns the complexities of language patterns, grammar rules, and semantic relationships. Llama 2 is an open-source generative AI large language model available in three sizes, with 7 billion, 13 billion, and 70 billion parameters. All the Llama models are available on the Hugging Face platform, a website that hosts open-source machine learning models. NousResearch evaluated the Llama-2-70B model on a setup of four A100 40 GB GPUs with two prompt configurations: 1500 input tokens with 100 output tokens, and 50 input tokens with 500 output tokens. The assessment revealed the model's capabilities, with best-case latencies of 7.4 seconds and 33 seconds for the respective configurations. Throughput peaked at 1.1 RPS for the former and 0.8 RPS for the latter. Recent LLM advancements such as Ghost Attention, in-context temperature rescaling, and grouped-query attention have been incorporated in Llama 2. Its tokenizer uses a byte-pair encoding algorithm. Llama 2 uses the standard transformer architecture with pre-normalization via RMSNorm and rotary positional embeddings. The model performs well on several tasks such as coding, commonsense reasoning, and question answering, and the Llama 2 Chat version has also outperformed other open-source models by a good margin. The working of the Llama 2 Chat LLM is described through the flowchart in Fig. 4.

An environment variable named "REPLICATE_API_TOKEN" was created to hold the Replicate AI API token. This token is used for authentication when requests are made to the Replicate API from this application. A Replicate client object was created and supplied with the variable containing the API token. The Llama 2 Chat model with 70 billion parameters is called through this client object, and the query created for generating Python code is provided to it as a prompt. The output returned by the Llama 2 model is stored in a Python object that can be iterated.
Fig. 4. Detailed Working of the Llama 2 Chat LLM
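
A minimal sketch of the Replicate call described above is given below. The environment variable name follows the text, while the model identifier ("meta/llama-2-70b-chat"), the placeholder query, and the exact client calls are assumptions for illustration rather than the authors' exact implementation.

import os
import replicate

# The API token is read from the environment variable described above.
client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])

# Illustrative placeholder for the structured code-generation query built from the flowchart text.
query = "Write Python code for the following flowchart steps: ..."

# Prompt the 70B-parameter Llama 2 Chat model (model identifier assumed).
llama_output = client.run(
    "meta/llama-2-70b-chat",
    input={"prompt": query},
)
# llama_output is an iterable of generated text fragments, handled in the next subsection.
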

Extraction of Structured Python Code from Llama 2 LLM Output

The Python object that holds the output code generated by the Llama 2 70B Chat LLM is iterated using a loop construct, and all the Python code in the Llama output is accumulated into a single string variable.
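
A minimal sketch of this step is shown below, assuming llama_output is the iterable of text fragments returned by the Replicate client call sketched earlier.

# Accumulate the streamed output fragments into one string containing the generated
# Python code and its explanation.
python_code = ""
for fragment in llama_output:
    python_code += fragment

The same result can be obtained with "".join(llama_output); the explicit loop mirrors the loop construct described in the text.
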

Integration of User Interface with OCR and LLM

Functions created to extract the text from the flowchart image, to create the structured query for the LLM, to send the structured query as a prompt to the Llama 2 Chat LLM, and to extract the Python code and its explanation from the LLM output were collected into a single Python module used by the user interface code. When the user uploads an image of a flowchart on the UI, the "generate" function that we created is called with the uploaded flowchart image as input. The function extracts the path of the uploaded image and calls the read_image function from the custom utils module, which converts the image into an array. This image array is passed to the detect_text function to extract the text from the flowchart. The extracted flowchart text is converted into a structured Python code-generation query using the create_query function, which serves as the prompt for the Llama 2 LLM. The Llama LLM is accessed using the Replicate API token and prompted with the code-generation query. The output produced by the Llama 2 LLM, containing the Python code for the flowchart and a brief explanation, is stored in a Python object that is iterated to extract the code and its explanation. The Python code and its explanation are appended to a blank string by the python_code_string function, and the resulting string is returned to the user interface. The flowchart image and its equivalent Python code are then displayed as output on the user interface, and users can copy the generated code and execute it in an integrated development environment or code editor of their choice.
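
The end-to-end flow described above can be sketched as follows. The helper names (read_image, detect_text, create_query, python_code_string) come from the text; the module name, the attribute used to obtain the uploaded file's path, and the Replicate model identifier are assumptions for illustration.

import os
import replicate
from utils import read_image, detect_text, create_query, python_code_string  # custom utils module (name assumed)

def generate(uploaded_flowchart):
    # Resolve the path of the uploaded flowchart image (attribute name assumed from the UI framework).
    image_path = uploaded_flowchart.name

    image_array = read_image(image_path)         # convert the image into an array
    flowchart_text = detect_text(image_array)    # OCR: extract the text from the flowchart
    query = create_query(flowchart_text)         # build the structured code-generation query

    # Prompt the Llama 2 70B Chat model through the Replicate API.
    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    llama_output = client.run("meta/llama-2-70b-chat", input={"prompt": query})  # model id assumed

    # Accumulate the generated Python code and its explanation into a single string for the UI.
    return python_code_string(llama_output)
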