STAT 29000: Project 8 — Spring 2022
Motivation: Python is an interpreted language (as opposed to a compiled language). In a compiled language, you are (mostly) unable to run and evaluate a single instruction at a time. In Python (and R — also an interpreted language), we can run and evaluate a line of code easily using a repl. In fact, this is the way you’ve been using Python to date — selecting and running pieces of Python code. Other ways to use Python include creating a package (like numpy, pandas, and pytorch), and creating scripts. You can create powerful CLI’s (command line interface) tools using Python. In this project, we will explore this in detail and learn how to create scripts that accept options and input and perform tasks.
Context: This is the first (of two) projects where we will learn about creating and using Python scripts.
Scope: Python
Dataset(s)
The following questions will use the following dataset(s):
-
/depot/datamine/data/coco/unlabeled2017/*.jpg
Questions
Question 1
Up until this point, we have been using Python from within the Jupyter Lab environment. It is pretty straightforward to use — type in some code into a cell and click run. This is similar to using a REPL (read eval print loop), in that you are essentially running one or more lines of Python code at once. Another way to run Python code is in a script form. A Python script is a .py
file with Python code written inside of it that performs a series of actions.
Python scripts are easy to use and run. For example, if you had a script called my_script.py
, you could run the script by executing the following in a terminal.
python3 my_script.py
This would use the first python3
interpreter in your $PATH
environment variable to run the script.
Read this article about the main
function in Python scripts. Also, please read this section, paying special attention to the shebang. Lastly, read this section.
In a Python cell in your notebook, determine which Python interpreter is being used by running the following code.
import sys
print(sys.executable)
If you want to find where Python looks for packages, you can use the following code.
Python will look for packages in those locations, in order. |
Your output should read: /scratch/brown/kamstut/tdm/apps/jupyter/kernels/f2021-s2022/.venv/bin/python
. This is the absolute path to the Python interpreter we use with this course.
Create a Python script called question01.py
in your $HOME
directory with the following content.
import sys
def main():
print(f"{sys.executable}")
if __name__ == "__main__":
main()
Open a terminal in Jupyter Lab, and run the following command to give execute permissions to the script.
chmod +x $HOME/question01.py
Finally, in the terminal, execute the script by running the following command.
python3 $HOME/question01.py
In addition, execute the same command in a bash cell in your notebook.
%%bash
python3 $HOME/question01.py
What is the output? Was this expected? Write 1-2 sentences explaining why the output was expected or not.
-
Code used to solve this problem.
-
Output from running the code.
-
The
question01.py
script.
Question 2
In the previous question, we ran the script, question01.py
, by passing it as input to our python3
interpreter. We learned that, depending on the first python3
interpreter in our $PATH
environment variable, the output would differ. There is another way to run Python scripts that don’t pass our script as an input. The way to do this is with the shebang. You should have read about the shebang in the linked articles in quesiton (1).
If a shebang is included in the first line of your Python script, there is no longer a need to pass the script as input. Instead, you can simply execute the script by running ./question01.py
. This tells the terminal to execute question01.py
in the current directory. If you were to simply type question01.py
in your terminal, you would get an error saying that the command "question01.py" was not found. The reason is that your shell will look in the directories in your $PATH
environment variable in order to find commands. Since question01.py
is not in your $PATH
, you receive that error. By prepending the ./
, this tells the shell to instead look in my current working directory and execute question01.py
.
How does the shell know how to execute the script? The shebang! For example, try executing the following script by running ./question02.py
(or $HOME/question02.py
) in the terminal.
#!/usr/bin/python3
import sys
def main():
print(f"{sys.executable}")
if __name__ == "__main__":
main()
chmod +x $HOME/question02.py
You’ll notice that the output matches the shebang! Now here is the real test, execute the script from within a bash cell in your notebook.
%%bash
$HOME/question02.py
Aha! Even though we are in our notebook, the shell respected the shebang and used /usr/bin/python3
to execute the script.
If you were to run |
Okay, in this project, since the focus is on writing Python scripts, it is a good opportunity to have some fun and use some powerful, pre-built models to do fun things. The following code will use an image classification model to identify the content of a photo.
#!/usr/bin/python3
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests
def main():
image = Image.open("/depot/datamine/data/coco/unlabeled2017/000000000008.jpg")
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
if __name__ == "__main__":
main()
chmod +x $HOME/question02_2.py
In a bash cell, execute the script.
%%bash
$HOME/question02_2.py
What happens? Did you expect this? Write 1-2 sentences explaining why you expected the output to be different. Finally, correct the script so that it runs correctly. In another bash cell, run the updated script.
Please ignore any red warnings you receive as a part of your output from running the corrected |
If you want to see whether or not the results are accurate, you can display the image with the following code inside a code cell.
|
-
Code used to solve this problem.
-
Output from running the code.
-
question02.py
script. -
question02_2.py
script.
Question 3
I hope that the previous two questions gave you a pretty solid understanding that if we want packages to be available when running a Python script, we need to use the appropriate shebang or Python interpreter.
Right now, our script, question02_2.py
is not very useful. No matter what we do it will continue to load up and analyze the same old image. That is pretty boring. One of the primary things you can do with scripts is read arguments passed to the script and do something. For example, we passed the argument question01.py
to the python3
program. In the same way, we could, for example, pass an absolute path to an image to our script and have it print out the output for the given image! Our script would be much more useful. We could do things like:
./my_script.py /depot/datamine/data/coco/unlabeled2017/000000000008.jpg
./my_script.py /depot/datamine/data/coco/unlabeled2017/000000000013.jpg
Copy your question02_2.py
script to a new script called question03.py
. Modify question03.py
to accept a single argument, an absolute path to an image, and use that argument in place of the default 000000000008.jpg
image. Use sys.argv
to accomplish this.
Test it out from within a bash cell in your notebook.
%%bash
$HOME/question03.py /depot/datamine/data/coco/unlabeled2017/000000000008.jpg
$HOME/question03.py /depot/datamine/data/coco/unlabeled2017/000000000013.jpg
$HOME/question03.py
If no argument is passed to |
Make sure to import
|
-
Code used to solve this problem.
-
Output from running the code.
Question 4
Typically, scripts or CLIs (command line interfaces) have a bunch of options. For example, you read about the various options of a tool like grep
or awk
by running the following in a terminal.
man awk
man grep
For example, grep
has the option, -i
which makes the search case insensitive.
grep -i 'ok' something.txt
In addition, often times options have both a long form or short form. For example grep -f somefile.txt
is the same as grep --file=somefile.txt
.
Create a new script called question04.py
that accepts a single argument called --detailed
or -d
, in either short form or long form. If the flag is present, instead of using the "google/vit-base-patch16-224" model, which outputs 1 of 1000 classes, it will instead use the "microsoft/beit-base-patch16-224-pt22k-ft22k" model (see here) that will output 1 of the 21841 classes. Some examples of how the script should work are below.
Make sure to give your script executable permissions.
|
%%bash
./question04.py /depot/datamine/data/coco/unlabeled2017/000000000008.jpg --detailed
./question04.py /depot/datamine/data/coco/unlabeled2017/000000000008.jpg -d
./question04.py --detailed /depot/datamine/data/coco/unlabeled2017/000000000008.jpg
./question04.py -d /depot/datamine/data/coco/unlabeled2017/000000000008.jpg
Predicted class: surfing, surfboarding, surfriding Predicted class: surfing, surfboarding, surfriding Predicted class: surfing, surfboarding, surfriding Predicted class: surfing, surfboarding, surfriding
-
Code used to solve this problem.
-
Output from running the code.
Question 5
As you can imagine, adding flags and options can blow up the logic and size of your script pretty quickly! Luckily, there is the argparse
package. This package takes care of parsing and handling program input. Here are the official docs and here is a decent tutorial on how to use it.
Use the argparse
package to parse your arguments instead of using sys.argv
. As long as the new version of your script accepts both the long and short forms of the --detailed
flag, and uses argparse
, you will receive full credit. However, please do you best to make the script as robust as possible! In addition, you can change the behavior of the arguments as long as the details flags work.
In bash cells, show at least 3 examples using your new script, question05.py
, which uses argparse
instead of sys.argv
.
If you want a solid template, you may use the following code to start. You will need to tweak it in order to make it work, however, it is a decent place to start.
An accompanying set of tests would be:
Output
Predicted class: surfing, surfboarding, surfriding Predicted class: seashore, coast, seacoast, sea-coast Predicted class: seashore, coast, seacoast, sea-coast |
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connect ion, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project. |