
Gesture-Based Machine Learning


Tarulata Champawat

April 28, 2020

According to the World Health Organization (WHO), 466 million people worldwide, over 5% of the world’s population, have difficulty hearing, speaking, and conveying their message. They don’t use voice as their primary medium of communication and instead express themselves through gestures. With gbML, those gestures are captured by a computer, which translates them into text and speech that people unfamiliar with sign language can understand.

(gbML is a term coined by our team for this idea, so there is no point searching for the name on the internet!)

Idea: Only a few people understand the sign language used by people with partial or complete hearing or speech disabilities, which makes it difficult for them to convey their message to everyone. gbML narrows the gap between the two groups.

How it works:

The differently-abled person performs gestures in front of the machine, and the computer recognizes those gestures and converts them into a native language in the form of text and audio. The output, whether text or audio, is sent as a request to Alexa, and the Alexa API responds to the system using its skills. In this way, a person who has difficulty speaking can also interact with smart assistant devices using sign gestures.

Awake, Intermediate, and Terminal words: The sentence is the soul of any conversation. To hold a conversation, the system needs to assemble the recognized words into a sentence.

 For example:

Alexa, what is the time?

Here “Alexa” is the Awake word,

“what” is an Intermediate word, and

“time” is the Terminal word.

The Awake word wakes the system and tells it to start predicting gestures. There is only one Awake word in the system, similar to “Hey Siri,” “Okay, Google,” or “Alexa.”

Intermediate words are the words that come between the Awake word and the Terminal word.

Terminal words terminate the sentence and inform the system that the sentence has ended.
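The way the three word categories frame a sentence can be sketched as a small state machine. This is only an illustration in plain Python, not the team's actual code: the `SentenceBuilder` class, the `build_sentences` helper, and the tiny vocabulary (taken from the "Alexa, what is the time?" example) are all assumptions made for this sketch.

```python
from typing import Iterable, List, Optional

# Hypothetical vocabulary based on the "Alexa, what is the time?" example.
AWAKE_WORD = "alexa"
TERMINAL_WORDS = {"time"}


class SentenceBuilder:
    """Collects predicted words between an Awake word and a Terminal word."""

    def __init__(self) -> None:
        self.listening = False
        self.words: List[str] = []

    def feed(self, word: str) -> Optional[str]:
        """Consume one predicted word; return the sentence once it is complete."""
        word = word.lower()
        if not self.listening:
            # Ignore every prediction until the Awake word is seen.
            if word == AWAKE_WORD:
                self.listening = True
                self.words = [word]
            return None
        # While listening, every word is kept; a Terminal word ends the sentence.
        self.words.append(word)
        if word in TERMINAL_WORDS:
            sentence = " ".join(self.words)
            self.listening = False
            self.words = []
            return sentence
        return None


def build_sentences(predictions: Iterable[str]) -> List[str]:
    """Run a stream of predicted words through the builder."""
    builder = SentenceBuilder()
    sentences = []
    for word in predictions:
        sentence = builder.feed(word)
        if sentence is not None:
            sentences.append(sentence)
    return sentences
```

In this sketch, words predicted before the Awake word or after the Terminal word are simply dropped, so stray gestures do not produce a request.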

The system is trained on all three categories of words. Training data for each word is fed into the training data set, so that when a gesture is performed the system can quickly match it against that data set and produce an output.

In a nutshell, whenever a differently-abled person performs a gesture in front of the camera, the system processes that gesture, performs feature extraction and classification, and converts the gesture into text and audio.

Technology: TensorFlow.js and the k-nearest neighbors (KNN) algorithm for gesture processing, feature extraction, and classification.
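To make the classification step concrete, here is a minimal KNN sketch. It is written in plain Python for brevity and is not the TensorFlow.js code behind gbML. The two-dimensional feature vectors in `TRAIN` are made-up values standing in for whatever features the real system extracts from camera frames, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, word label)

# Made-up training set: two features per gesture, labeled with the word it means.
TRAIN: List[Example] = [
    ([0.9, 0.1], "alexa"),
    ([1.0, 0.2], "alexa"),
    ([0.1, 0.9], "time"),
    ([0.2, 1.0], "time"),
    ([0.5, 0.5], "what"),
]


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training examples nearest to query."""
    if not train:
        raise ValueError("training set is empty")
    # Rank training examples by Euclidean distance to the query vector.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    # Majority vote over the labels of the k nearest examples.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For instance, `knn_predict(TRAIN, [0.95, 0.15])` returns `"alexa"`, because two of the three nearest training examples carry that label. KNN suits this kind of application because adding a new gesture only means adding labeled examples, with no separate retraining step.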

 Use Cases:

  • Front desks in the hospitality industry and other customer service desks
  • Government agencies, for identifying the words conveyed by differently-abled people
  • Hospital admissions
  • All inquiry desks
  • Integration with social media
  • Workspaces like offices, hospitals, and banks

Conclusion: gbML is a machine learning application with a wide range of uses in education, business, and healthcare. It is a blessing for people with hearing and speech disabilities: with gbML in place, they no longer find it challenging to convey their message to the person attending to them. It is easy to use and understand, and it delivers accurate results. The application still has many upgrades ahead of it, and we expect it to find much wider use in the future.

Watch the demo video here
