Folks - in this article we will explore some very useful APIs provided by the major companies investing in AI.
The earlier post was about establishing our own custom-made AI & deep learning code base; this tutorial is more about well-established concepts packaged as APIs that can be used directly in your code.
Let's go through them one by one - this post covers the Google APIs:
1) Vision - analyses an image and returns information about it.
2) Speech - converts a recorded spoken response into text (speech-to-text).
3) Video - provides search capabilities within a video.
4) Natural Language - understands written text, including the context and sentiment in which it was written.
5) Translation - translates text into the language you are converting to.
Initially, if you just want to check them out and run some test trials, these APIs are free; if you use them heavily, Google does charge for them. The good news is that you get some free credit the first time you start using them.
How does it work?
Firebase Hosting --> Cloud Storage --> Cloud Functions --> API in question (e.g. Vision API) --> Firebase Database
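To make that flow concrete, here is a minimal sketch of the Cloud Function sitting in the middle of the pipeline. It assumes a Firebase project with the firebase-functions, firebase-admin and @google-cloud/vision packages installed; the function name (analyzeUpload) and the database path (image-analysis) are made up for the illustration.

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const vision = require('@google-cloud/vision');

admin.initializeApp();
const visionClient = new vision.ImageAnnotatorClient();

// Fires whenever a file lands in the project's Cloud Storage bucket
exports.analyzeUpload = functions.storage.object().onFinalize(async (object) => {
  const gcsUri = `gs://${object.bucket}/${object.name}`;

  // Call the API in question - face detection via the Vision API in this sketch
  const [result] = await visionClient.faceDetection(gcsUri);
  const faces = result.faceAnnotations || [];

  // Push the result into the Firebase Realtime Database
  return admin.database()
    .ref(`image-analysis/${object.name.replace(/\W/g, '_')}`)
    .set({ faceCount: faces.length, faces });
});

Firebase Hosting serves the page that uploads the file to Cloud Storage; the upload triggers the function, the function calls the API, and the result lands in the database for the front end to read.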
1) The Vision (Image) API is really cool - it scans an image and tells you whether the image is of a landmark or a person, and if a person, what their mood is, and so on.
For example, a JSON response from the Vision API might look like this:
{ "responses": [ { "faceAnnotations": [ { .... { "type": "RIGHT_OF_LEFT_EYEBROW", "position": { "x": 965.15735, "y": 349.91434, "z": -7.9691405 } }...
{
"type": "UPPER_LIP",
"position": {
"x": 960.88947,
"y": 382.35114,
"z": -15.794773
}
....
}
],
"rollAngle": 16.3792967,
"panAngle": -29.3338267,
"tiltAngle": 4.45867656,
"detectionConfidence": 0.980691,
"landmarkingConfidence": 0.57905465,
"joyLikelihood": "VERY_LIKELY",
"sorrowLikelihood": "VERY_UNLIKELY",
"angerLikelihood": "VERY_UNLIKELY",
"surpriseLikelihood": "VERY_UNLIKELY",
"underExposedLikelihood": "VERY_UNLIKELY",
"blurredLikelihood": "VERY_UNLIKELY",
"headwearLikelihood": "VERY_UNLIKELY"
}
]
}
]
}
So with this API you can get a fair amount of detail about the features or mood of a person; for a landmark or building you can find out where it is, along with the likelihood of that being correct.
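As a rough sketch of how you might call this from Node.js with the @google-cloud/vision client library - the image file name below is just an example:

const vision = require('@google-cloud/vision');

async function whereIsThis(imagePath) {
  const client = new vision.ImageAnnotatorClient();
  const [result] = await client.landmarkDetection(imagePath);

  // Each annotation carries the landmark name, a confidence score
  // and the latitude/longitude of the place
  return (result.landmarkAnnotations || []).map((landmark) => ({
    name: landmark.description,
    confidence: landmark.score,
    location: landmark.locations[0] && landmark.locations[0].latLng,
  }));
}

whereIsThis('./eiffel-tower.jpg').then(console.log).catch(console.error);

Swapping landmarkDetection for faceDetection returns the faceAnnotations (joyLikelihood, sorrowLikelihood, etc.) shown in the sample response above.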
2) Video API -
This API can search within a video for the frames that contain a particular element. For example, if there is a cricket match and you want to know when a catch was taken, you could describe the catch and it will tell you, across all the videos, at which points the catch appears, so you can jump to those locations instead of going through the whole video.
Sample response for label detection -
"segmentLabelAnnotations1": [
{
"entity": {
"entityId": "/m/01yrx",
"languageCode": "en-US"
},
"segments1": [
{
"segment": {
"startTimeOffset": "0s",
"endTimeOffset": "14.833664s"
},
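A minimal sketch of requesting label detection with the @google-cloud/video-intelligence client library might look like this - the bucket and file name are hypothetical, and annotateVideo is a long-running operation that you wait on before reading the segments:

const videoIntelligence = require('@google-cloud/video-intelligence');

async function labelVideo(gcsUri) {
  const client = new videoIntelligence.VideoIntelligenceServiceClient();

  // Start the asynchronous annotation job and wait for it to finish
  const [operation] = await client.annotateVideo({
    inputUri: gcsUri, // e.g. 'gs://my-bucket/cricket-match.mp4'
    features: ['LABEL_DETECTION'],
  });
  const [response] = await operation.promise();

  // Each label carries the time segments in which it appears
  const labels = response.annotationResults[0].segmentLabelAnnotations || [];
  labels.forEach((label) => {
    console.log(`Label: ${label.entity.description}`);
    label.segments.forEach((s) => {
      const { startTimeOffset, endTimeOffset } = s.segment;
      console.log(`  from ${startTimeOffset.seconds || 0}s to ${endTimeOffset.seconds || 0}s`);
    });
  });
}

labelVideo('gs://my-bucket/cricket-match.mp4').catch(console.error);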
3) Speech API / Translation API / Natural Language API
The Speech API is typically used in combination with the Translation API and the Natural Language API.
For example, if someone sends a text message saying 'does this time suit you - 9 AM?', the Natural Language API understands the context and can offer framed response options such as 'yes - it suits me' or 'no - it doesn't, I can suggest another time'. The Speech API lets you record a spoken response and convert it to text in that context, and the Translation API lets you translate it into another language. Pretty cool - no need for an interpreter, translator or response engine!
Sample response from natural language API -
"sentences": [ { "text": { "content": "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.", "beginOffset": 0 }, "sentiment": { "magnitude": 0.8, "score": 0.8PS - referred from google tutorials.
Note the score and magnitude - these set the tone (sentiment) of the text.
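As a rough sketch of how the three APIs could be chained from Node.js - assuming recent versions of the @google-cloud/speech, @google-cloud/language and @google-cloud/translate client libraries, with the audio settings, sample text and target language chosen purely for illustration:

const speech = require('@google-cloud/speech');
const language = require('@google-cloud/language');
const { Translate } = require('@google-cloud/translate').v2;

// Speech API: turn a short recorded reply (base64-encoded audio) into text
async function transcribe(audioContent) {
  const client = new speech.SpeechClient();
  const [response] = await client.recognize({
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
    audio: { content: audioContent },
  });
  return response.results.map((r) => r.alternatives[0].transcript).join(' ');
}

// Natural Language API: get the score and magnitude shown above
async function sentiment(text) {
  const client = new language.LanguageServiceClient();
  const [result] = await client.analyzeSentiment({
    document: { content: text, type: 'PLAIN_TEXT' },
  });
  return result.documentSentiment; // { score, magnitude }
}

// Translation API: translate the reply into another language (French here)
async function toFrench(text) {
  const translate = new Translate();
  const [translated] = await translate.translate(text, 'fr');
  return translated;
}

In the messaging example above, you would transcribe the recorded reply, run the text through sentiment analysis to gauge its tone, and translate it before sending it back to the other party.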
How to get them installed and working?
You need the following in order to get the APIs working:
Set up Node.js (via nvm):
nvm install stable
Run nvm with no arguments to verify the installation, then make the stable version the default:
nvm
nvm alias default stable
Install Yarn and add Express to the project:
curl -o- -L https://yarnpkg.com/install.sh | bash
yarn --version
yarn add express
Deploy the app to gcloud:
gcloud app deploy
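For completeness, here is a minimal sketch of what gcloud app deploy could push to App Engine - a bare Express server in index.js, following the standard convention of reading the port from the PORT environment variable, alongside an app.yaml that declares the Node.js runtime (for example a single line such as runtime: nodejs18). The app itself is purely illustrative:

const express = require('express');

const app = express();

app.get('/', (req, res) => {
  res.send('Cloud API demo is running');
});

// App Engine supplies the port via the PORT environment variable
const port = process.env.PORT || 8080;
app.listen(port, () => console.log(`Listening on port ${port}`));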
PS: Most of the examples are taken from the Google API docs but shortened for quick illustration.