Folks - in this article we will explore some very useful APIs provided by the major companies investing in AI.
The earlier post was about establishing our own custom-made AI & deep learning code base; this tutorial is more about well-established concepts packaged as APIs that can be used directly in your code.
Let's go through them one by one - this post covers the Google APIs:
1) Vision - analyses an image and returns information about it.
2) Speech - converts a recorded spoken response into text (speech-to-text).
3) Video - provides search capabilities within a video.
4) Natural Language - understands written text, including the context and sentiment in which it was written.
5) Translation - translates text into the language you are converting to.
Initially, if you just want to check them out and run some test trials, these APIs are free; if you use them heavily, Google does charge for them. The good news is that you get some free credit the first time you start using them.
How does it work?
Firebase Hosting --> Cloud Storage --> Cloud Functions --> API in question (e.g. Vision API) --> Firebase Database
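To make that flow concrete, here is a minimal sketch of the Cloud Function sitting in the middle of the pipeline. It assumes a Firebase project with the firebase-functions, firebase-admin and @google-cloud/vision packages installed; the function name (analyzeUpload) and the database path (image-analysis) are made up for the illustration.

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const vision = require('@google-cloud/vision');

admin.initializeApp();
const visionClient = new vision.ImageAnnotatorClient();

// Fires whenever a file lands in the project's Cloud Storage bucket
exports.analyzeUpload = functions.storage.object().onFinalize(async (object) => {
  const gcsUri = `gs://${object.bucket}/${object.name}`;

  // Call the API in question - face detection via the Vision API in this sketch
  const [result] = await visionClient.faceDetection(gcsUri);
  const faces = result.faceAnnotations || [];

  // Push the result into the Firebase Realtime Database
  return admin.database()
    .ref(`image-analysis/${object.name.replace(/\W/g, '_')}`)
    .set({ faceCount: faces.length, faces });
});

Firebase Hosting serves the page that uploads the file to Cloud Storage; the upload triggers the function, the function calls the API, and the result lands in the database for the front end to read.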
1) The Vision (Image) API is really cool - it scans an image and tells you whether the image is of a landmark or a person, and if a person, what their mood is, and so on.
For example, a JSON response from the Vision API might look like this:
{ "responses": [ { "faceAnnotations": [ { .... { "type": "RIGHT_OF_LEFT_EYEBROW", "position": { "x": 965.15735, "y": 349.91434, "z": -7.9691405 } }...
{
"type": "UPPER_LIP",
"position": {
"x": 960.88947,
"y": 382.35114,
"z": -15.794773
}
....
}
],
"rollAngle": 16.3792967,
"panAngle": -29.3338267,
"tiltAngle": 4.45867656,
"detectionConfidence": 0.980691,
"landmarkingConfidence": 0.57905465,
"joyLikelihood": "VERY_LIKELY",
"sorrowLikelihood": "VERY_UNLIKELY",
"angerLikelihood": "VERY_UNLIKELY",
"surpriseLikelihood": "VERY_UNLIKELY",
"underExposedLikelihood": "VERY_UNLIKELY",
"blurredLikelihood": "VERY_UNLIKELY",
"headwearLikelihood": "VERY_UNLIKELY"
}
]
}
]
}
So with this API you can get a fair amount of detail about the features or mood of a person; for a landmark or building you can find out where it is, along with the likelihood of that being correct.
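As a rough sketch of how you might call this from Node.js with the @google-cloud/vision client library - the image file name below is just an example:

const vision = require('@google-cloud/vision');

async function whereIsThis(imagePath) {
  const client = new vision.ImageAnnotatorClient();
  const [result] = await client.landmarkDetection(imagePath);

  // Each annotation carries the landmark name, a confidence score
  // and the latitude/longitude of the place
  return (result.landmarkAnnotations || []).map((landmark) => ({
    name: landmark.description,
    confidence: landmark.score,
    location: landmark.locations[0] && landmark.locations[0].latLng,
  }));
}

whereIsThis('./eiffel-tower.jpg').then(console.log).catch(console.error);

Swapping landmarkDetection for faceDetection returns the faceAnnotations (joyLikelihood, sorrowLikelihood, etc.) shown in the sample response above.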
2) Video API -
This API can search within a video for the frames that contain a particular element. For example, if there is a cricket match and you want to know when a catch was taken, you could describe the catch and it will tell you, across all the videos, at which points the catch appears, so you can jump to those locations instead of going through the whole video.
Sample response for label detection -
"segmentLabelAnnotations1": [
{
"entity": {
"entityId": "/m/01yrx",
"languageCode": "en-US"
},
"segments1": [
{
"segment": {
"startTimeOffset": "0s",
"endTimeOffset": "14.833664s"
},
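A minimal sketch of requesting label detection with the @google-cloud/video-intelligence client library might look like this - the bucket and file name are hypothetical, and annotateVideo is a long-running operation that you wait on before reading the segments:

const videoIntelligence = require('@google-cloud/video-intelligence');

async function labelVideo(gcsUri) {
  const client = new videoIntelligence.VideoIntelligenceServiceClient();

  // Start the asynchronous annotation job and wait for it to finish
  const [operation] = await client.annotateVideo({
    inputUri: gcsUri, // e.g. 'gs://my-bucket/cricket-match.mp4'
    features: ['LABEL_DETECTION'],
  });
  const [response] = await operation.promise();

  // Each label carries the time segments in which it appears
  const labels = response.annotationResults[0].segmentLabelAnnotations || [];
  labels.forEach((label) => {
    console.log(`Label: ${label.entity.description}`);
    label.segments.forEach((s) => {
      const { startTimeOffset, endTimeOffset } = s.segment;
      console.log(`  from ${startTimeOffset.seconds || 0}s to ${endTimeOffset.seconds || 0}s`);
    });
  });
}

labelVideo('gs://my-bucket/cricket-match.mp4').catch(console.error);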
3) Speech API / Translation API / Natural Language API
The Speech API is typically used in combination with the Translation API and the Natural Language API.
For example, if someone sends a text message saying 'does this time suit you - 9 AM?', the Natural Language API understands the context and can offer framed response options such as 'yes - it suits me' or 'no - it doesn't, I can suggest another time'. The Speech API lets you record a spoken response and convert it to text in that context, and the Translation API lets you translate it into another language. Pretty cool - no need for an interpreter, translator or response engine!
Sample response from natural language API -
"sentences": [ { "text": { "content": "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.", "beginOffset": 0 }, "sentiment": { "magnitude": 0.8, "score": 0.8PS - referred from google tutorials.
Note the score and magnitude - these set the tone (sentiment) of the text.
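As a rough sketch of how the three APIs could be chained from Node.js - assuming recent versions of the @google-cloud/speech, @google-cloud/language and @google-cloud/translate client libraries, with the audio settings, sample text and target language chosen purely for illustration:

const speech = require('@google-cloud/speech');
const language = require('@google-cloud/language');
const { Translate } = require('@google-cloud/translate').v2;

// Speech API: turn a short recorded reply (base64-encoded audio) into text
async function transcribe(audioContent) {
  const client = new speech.SpeechClient();
  const [response] = await client.recognize({
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
    audio: { content: audioContent },
  });
  return response.results.map((r) => r.alternatives[0].transcript).join(' ');
}

// Natural Language API: get the score and magnitude shown above
async function sentiment(text) {
  const client = new language.LanguageServiceClient();
  const [result] = await client.analyzeSentiment({
    document: { content: text, type: 'PLAIN_TEXT' },
  });
  return result.documentSentiment; // { score, magnitude }
}

// Translation API: translate the reply into another language (French here)
async function toFrench(text) {
  const translate = new Translate();
  const [translated] = await translate.translate(text, 'fr');
  return translated;
}

In the messaging example above, you would transcribe the recorded reply, run the text through sentiment analysis to gauge its tone, and translate it before sending it back to the other party.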
How to get them installed and working?
You need the following in order to get the APIs working:
Set up Node.js (via nvm):
nvm install stable
Run nvm with no arguments to verify the installation, then make the stable version the default:
nvm
nvm alias default stable
Install Yarn and add Express to the project:
curl -o- -L https://yarnpkg.com/install.sh | bash
yarn --version
yarn add express
Deploy the app to gcloud:
gcloud app deploy
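For completeness, here is a minimal sketch of what gcloud app deploy could push to App Engine - a bare Express server in index.js, following the standard convention of reading the port from the PORT environment variable, alongside an app.yaml that declares the Node.js runtime (for example a single line such as runtime: nodejs18). The app itself is purely illustrative:

const express = require('express');

const app = express();

app.get('/', (req, res) => {
  res.send('Cloud API demo is running');
});

// App Engine supplies the port via the PORT environment variable
const port = process.env.PORT || 8080;
app.listen(port, () => console.log(`Listening on port ${port}`));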
PS: Most of the examples are taken from the Google API docs but shortened for quick illustration.