README.md

We can measure the accuracy of the process by using the [cv2 projectPoints](https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#projectpoints) function to project the keypoints of the 3D object onto the image according to our predicted rotation and translation vectors. This projected set of keypoints can then be compared against the ground truth values to generate error metrics over a set of predictions.
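For reference, when there is no lens distortion, `cv2.projectPoints` applies the pinhole camera model. It rotates and translates each 3D keypoint into camera coordinates, multiplies by the camera matrix, and divides by depth. The NumPy sketch below reimplements that step, including the Rodrigues conversion from rotation vector to rotation matrix. It then computes a mean pixel error against ground-truth 2D keypoints.

Several details are assumptions for illustration rather than this repo's code:
- the function names `rodrigues`, `project_points` and `mean_pixel_error`;
- the example camera matrix and pose;
- the use of mean Euclidean pixel distance as the metric.

With OpenCV installed, `cv2.projectPoints(points_3d, rvec, tvec, camera_matrix, None)` should return the same points, shaped (N, 1, 2), along with a Jacobian.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle rotation vector of shape (3,) to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    # Skew-symmetric cross-product matrix of the unit rotation axis
    k_cross = np.array([[0.0, -k[2], k[1]],
                        [k[2], 0.0, -k[0]],
                        [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * k_cross + (1.0 - np.cos(theta)) * (k_cross @ k_cross)


def project_points(points_3d, rvec, tvec, camera_matrix):
    """Project (N, 3) object points to (N, 2) pixel coordinates, ignoring lens distortion."""
    points_3d = np.asarray(points_3d, dtype=float)
    rotation = rodrigues(rvec)
    # Object frame -> camera frame
    cam = points_3d @ rotation.T + np.asarray(tvec, dtype=float).reshape(1, 3)
    # Camera frame -> homogeneous pixel coordinates, then divide by depth
    uvw = cam @ np.asarray(camera_matrix, dtype=float).T
    return uvw[:, :2] / uvw[:, 2:3]


def mean_pixel_error(projected_2d, ground_truth_2d):
    """Mean Euclidean distance in pixels between matching 2D keypoints."""
    diff = np.asarray(projected_2d, dtype=float) - np.asarray(ground_truth_2d, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))


# Hypothetical camera intrinsics and pose, for illustration only
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
keypoints_3d = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
# Identity rotation, object placed 1 unit in front of the camera
projected = project_points(keypoints_3d, np.zeros(3), [0.0, 0.0, 1.0], K)
# projected holds the pixel points (320, 240) and (370, 240)
```

In the pipeline, `projected` would be compared with the labelled 2D keypoints for each image, and the per-image errors aggregated over the test set.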

I created a [YouTube series](https://www.youtube.com/playlist?list=PL3om9a5CvNUl-ZUvZLS8z66uc0qOxIEqj) that covers the design process for each of these files. It also describes the final results of the algorithm, along with:
- the performance effects of the models used;
- an alternate 3D label set generated using farthest point sampling;
- pruning of hypothesis predictions;
- the number of keypoint hypotheses considered.

The repo also includes a ```demo.ipynb``` Python notebook. It demonstrates the pipeline from the input to the final prediction and error metrics, with a focus on the data objects used throughout the process.
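The YouTube series mentions an alternate 3D label set generated using farthest point sampling (FPS). FPS greedily picks each new keypoint as the model point farthest from all keypoints chosen so far, which spreads the keypoints over the object's surface.

Below is a minimal sketch of the general technique, not necessarily the exact procedure used for the alternate label set. The function name and the `start_index` seed parameter are hypothetical.

```python
import numpy as np


def farthest_point_sampling(points, k, start_index=0):
    """Greedily select k points, each as far as possible from those already chosen.

    points: (N, 3) array of 3D model points.
    start_index: index of the first selected point (a hypothetical seed choice).
    Returns a (k, 3) array of selected points.
    """
    points = np.asarray(points, dtype=float)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start_index]
    # Distance from every point to its nearest already-selected point
    min_dist = np.linalg.norm(points - points[start_index], axis=1)
    for _ in range(k - 1):
        next_index = int(np.argmax(min_dist))
        selected.append(next_index)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[next_index], axis=1))
    return points[selected]


# Toy example: five points along the x-axis
line = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
keypoints = farthest_point_sampling(line, 3)  # selects x = 0, then x = 10, then x = 3
```

In practice `points` would be the object's model vertices. Loading them from the dataset is not shown here.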

## Instructions

### Quick Start

Clone the repository, download the dataset, create the conda environment, and train a model.

`git clone https://github.com/sgawalsh/stvNet`<br/>
`cd stvNet`<br/>

Download the dataset from https://www.kaggle.com/datasets/sgawalsh/linemod-imagesmasks-ply3d-keypoints2d-labels/data and move it to `stvNet/LINEMOD`.

`conda env create -f environment.yml`<br/>
`conda activate stvNet`<br/>
`python models.py` # Train a model<br/>
`python pipeLine.py` # See model prediction projections<br/>

Then open:
- **Frontend**: http://localhost:8080
- **Grafana**: http://localhost:3000

### Custom Model

To build a custom model, construct the model within a function in the ```models.py``` file. The model's output must match the task it is trained for: for example, 18 outputs for vector prediction, 1 output for class prediction, or a combined output for both.

The neural net model is trained to detect the 2D coordinates of a set of 3D object keypoints on an image, the pixels associated with an object of interest, or both. The functions that generate the target data for the neural nets (`coordsTrainingGenerator`, `classTrainingGenerator`, `combinedTrainingGenerator`) are found in the `data.py` file.
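This README does not specify what the 18 vector outputs contain. Below is a hedged sketch of what vector-style target data could look like. It assumes the outputs are per-pixel unit vectors pointing from each object pixel toward each of 9 keypoints, in the style of keypoint-voting methods.

`keypoint_vector_targets` is a hypothetical helper, not the code in `data.py`. The channel layout (an x and a y component per keypoint) is also an assumption.

```python
import numpy as np


def keypoint_vector_targets(mask, keypoints_2d):
    """Build per-pixel unit vectors from object pixels toward each 2D keypoint.

    mask: (H, W) boolean array, True where a pixel belongs to the object.
    keypoints_2d: (K, 2) array of keypoint (x, y) pixel coordinates.
    Returns an (H, W, 2K) array laid out as [x0, y0, x1, y1, ...];
    background pixels, and a pixel lying exactly on a keypoint, stay zero.
    """
    mask = np.asarray(mask, dtype=bool)
    keypoints = np.asarray(keypoints_2d, dtype=float)
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) in (x, y) order
    offsets = keypoints[None, None, :, :] - pixels[:, :, None, :]  # (H, W, K, 2)
    lengths = np.linalg.norm(offsets, axis=-1, keepdims=True)  # (H, W, K, 1)
    # Normalise to unit length, leaving zeros where the pixel sits on the keypoint
    unit = np.divide(offsets, lengths, out=np.zeros_like(offsets), where=lengths > 0)
    unit[~mask] = 0.0  # only object pixels carry targets
    return unit.reshape(h, w, 2 * len(keypoints))


# Toy 3x3 example: every pixel except the top-left belongs to the object
mask = np.ones((3, 3), dtype=bool)
mask[0, 0] = False
targets = keypoint_vector_targets(mask, [[2.0, 1.0]])
# The pixel at row 1, column 0 gets the vector (1, 0), pointing along +x to the keypoint at (x=2, y=1)
```

With 9 keypoints, the result has 18 channels per pixel, matching the 18 outputs mentioned for vector prediction.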

A new, pre-formatted dataset is available [here](https://www.kaggle.com/datasets/sgawalsh/linemod-imagesmasks-ply3d-keypoints2d-labels) on Kaggle. It provides 3D keypoint values and 2D keypoint projections for all LINEMOD objects, separated by object category. Originally, this project used the [LINEMOD](https://bop.felk.cvut.cz/datasets/) dataset, one of several datasets used in academic work on 6D pose estimation. I show the folder and discuss the data format in [this](https://www.youtube.com/watch?v=wbTdqlBXOOE) video in the YouTube series, but did not include the folder in this repo due to size constraints. The dataset contains a folder for each object of interest, and each of these contains a `JPEGImages` folder, a `labels` folder, and a `mask` folder. `JPEGImages` contains the RGB images, which are converted to NumPy arrays and used as the input data for the neural net.

The `mask` folder contains a corresponding set of images in which pixels associated with the object of interest are white and all other pixels are black. From each mask, an (HxWx1) array is generated that indicates whether each pixel belongs to the object of interest. This array is used as target data for the class and combined generators.
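A minimal sketch of the mask-to-target conversion described above follows. It assumes the mask image has already been loaded into a NumPy array with values from 0 to 255. The helper name `mask_to_target` is hypothetical, and so is the threshold of 127, which tolerates compression noise around pure black and white. Neither is taken from `data.py`.

```python
import numpy as np


def mask_to_target(mask_image, threshold=127):
    """Turn a black/white mask image array into an (H, W, 1) float target.

    mask_image: (H, W) grayscale or (H, W, C) color array with values 0-255.
    Pixels brighter than `threshold` count as object (1.0), others as background (0.0).
    """
    arr = np.asarray(mask_image)
    if arr.ndim == 3:
        arr = arr.max(axis=-1)  # collapse color channels; white is bright in every channel
    return (arr > threshold).astype(np.float32)[..., np.newaxis]


# Grayscale mask with slight compression noise (250 and 3 instead of 255 and 0)
gray = np.array([[0, 255], [250, 3]], dtype=np.uint8)
target = mask_to_target(gray)  # shape (2, 2, 1); object pixels are at (0, 1) and (1, 0)
```

Keeping the trailing channel axis gives the (HxWx1) shape, which can be used directly as a single-output target.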

requirements.txt
grpcio==1.80.0
h5py==3.16.0
idna==3.11
keras==3.13.2
Keras-Preprocessing==1.1.2
kiwisolver @ file:///D:/bld/bld/rattler-build_kiwisolver_1773067061/work
libclang==18.1.1