It is very similar to the MovieLens ratings dataset, but simplified for development purposes. You can apply this schema to almost anything, including Google Analytics page views or any other product- or content-related user activity.
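For reference, the training data is assumed here to follow the MovieLens-style column layout, i.e. a header along these lines (illustrative; check data/ratings_small.csv in the repo for the exact columns):
userId,movieId,rating,timestamp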
Step 1. After you have installed Docker, you need to authenticate it. Use gcloud as the credential helper for Docker:
gcloud auth configure-docker
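You can verify that the helper was registered by inspecting Docker's client config (assuming the default location):
grep -A 3 credHelpers ~/.docker/config.json
You should see gcr.io mapped to the gcloud helper.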
Step 2. Create your Cloud Storage bucket and set your local environment variables:
export BUCKET_NAME="your_bucket_name"
export REGION=us-central1
gsutil mb -l $REGION gs://$BUCKET_NAME
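To double-check that the bucket landed in the intended region, inspect its metadata:
gsutil ls -L -b gs://$BUCKET_NAME
The Location constraint field should show US-CENTRAL1.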
Hint: Keep the bucket, the container image, and the training job in the same project and the same region; it avoids cross-region data transfer and permission headaches.
Step 3. Clone the repo.
cd Documents/code/
git clone git@github.com:mshakhomirov/recommendation-trainer-customEnvDocker.git
cd recommendation-trainer-customEnvDocker/wals_ml_engine
Step 4. Write a Dockerfile.
The Dockerfile is already included in this repo.
This bit is very important; otherwise your training instance won't be able to save the model to Cloud Storage:
# Make sure gsutil will use the default service account
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
With this Dockerfile you will build an image with these custom environment dependencies:
tensorflow==1.15
numpy==1.16.6
pandas==0.20.3
scipy==0.19.1
sh
These pinned dependency versions are the main reason I'm using a custom container.
Google AI Platform's runtime version 1.15 ships TensorFlow 1.15 but a different pandas version, which is not acceptable for my use case, where the pandas version must be exactly 0.20.3.
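For reference, a Dockerfile along these lines might look like the sketch below. This is a minimal sketch, assuming a Python 3.6 base image and a gsutil installation via the Cloud SDK; the actual file in the repo is the authoritative version:
# Sketch only — see the Dockerfile in the repo for the real one.
FROM python:3.6
# Install the Cloud SDK so the trainer can copy artifacts to GCS with gsutil.
RUN apt-get update && apt-get install -y curl && \
    curl -sSL https://sdk.cloud.google.com | bash -s -- --disable-prompts
ENV PATH $PATH:/root/google-cloud-sdk/bin
# Make sure gsutil will use the default service account.
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
# Pin the exact dependency versions the model needs.
RUN pip install tensorflow==1.15 numpy==1.16.6 pandas==0.20.3 scipy==0.19.1 sh
# Copy the training code and make task.py the entry point.
COPY trainer /trainer
ENTRYPOINT ["python", "/trainer/task.py"]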
Step 5. Build your Docker image.
export PROJECT_ID=$(gcloud config list project --format "value(core.project)")
export IMAGE_REPO_NAME=recommendation_bespoke_container
export IMAGE_TAG=tf_rec
export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG
docker build -f Dockerfile -t $IMAGE_URI ./
Test it locally:
docker run $IMAGE_URI
Output would be:
task.py: error: argument --job-dir is required
And this is alright, because this image will be used as our custom environment, with trainer/task.py as the entry point.
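Since anything you pass after the image name is forwarded to the entry point, you can supply the missing flag directly for a quick local smoke test (the job directory below is illustrative):
docker run $IMAGE_URI --job-dir gs://$BUCKET_NAME/smoke-test
Locally this may still fail on other required arguments or missing credentials, but it confirms that arguments reach trainer/task.py.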
For example, after we push our image we will be able to run this command locally:
gcloud ai-platform jobs submit training ${JOB_NAME}
--region $REGION
--scale-tier=CUSTOM
--job-dir ${BUCKET}/jobs/${JOB_NAME}
--master-image-uri $IMAGE_URI
--config trainer/config/config_train.json
--master-machine-type complex_model_m_gpu
--
${ARGS}
The --master-image-uri parameter is what replaces the usual --runtime-version flag: the job runs inside our custom container instead of a stock AI Platform runtime. Check mltrain.sh in the repo for more details.
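Note that the command above assumes a few environment variables are already set. A minimal, illustrative setup (names are my assumptions, not from the repo) could be:
export BUCKET=gs://$BUCKET_NAME
# Job names must be unique per submission; a timestamp suffix is a common trick.
export JOB_NAME=wals_train_$(date +%Y%m%d_%H%M%S)
# ARGS holds whatever gets forwarded to trainer/task.py after the bare "--";
# see mltrain.sh for the exact flags it builds.
export ARGS="--data-type user_ratings"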
Step 6. Push the image to the Docker repository (Container Registry).
docker push $IMAGE_URI
Output should be:
The push refers to repository [gcr.io/<your-project>/recommendation_bespoke_container]
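You can confirm that the image and its tag are in Container Registry with:
gcloud container images list-tags gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME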
Step 7. Submit the training job.
Run the script included in the repo:
./mltrain.sh train_custom gs://$BUCKET_NAME data/ratings_small.csv --data-type user_ratings
Output: