Voice Activated Home Control System
Project Design Proposal
By Kyle Joseph
Advisor: Dr. Alexander Malinowski
Assisted by: Dr. Don Schertz
The scope of this project is to create a voice-activated system that remotely controls electronic appliances in a home. The system uses a voice recognition circuit known as the Voice Direct II, which is interfaced to the EMAC 8051 microcontroller. The microcontroller has three functions: it hosts the user interface; it coordinates the learning of voice commands and IR signals and associates them with each other in the external memory; and it outputs the programmed IR signals when the Voice Direct II detects a valid command.
Detailed Project Description
The hardware comprises the MicroPac Emac evaluation board, the Voice Direct II, and the IR circuitry. The Emac board contains the 8051 microcontroller, the keypad, and the LCD screen. Together, the Emac board and the Voice Direct II learn one-phrase commands and then recognize them when they are spoken by the same speaker. The IR circuitry contains two blocks. The receive block receives IR commands (which are modulated using amplitude shift keying) and outputs TTL level bits. The transmit block outputs modulated IR commands, which are supplied to it by the Emac board. Refer to Fig 1 for an overall block diagram of the system and Fig 2 for a subsystems diagram.
Figure 1 Block Diagram Explanation: The system will operate as pictured with three inputs and three outputs.
Figure 2 Subsystems Block Diagram: The 8 subsystems of the VAHCS project. Note the direction of the flow of signals/information.
Please refer to Figure 2 to identify all subsystems discussed.
Subsystem: EMAC DEVELOPMENT BOARD
This hardware package contains the LCD and keyboard; the microprocessor is also embedded on this board. This subsystem is discussed at the systems level, so the actual signals passed between the keyboard/LCD and the processor are not considered here. The LCD and keyboard both communicate with the microprocessor via a pre-programmed signal processing function. The first real input to this subsystem is the stream of TTL level pulses received from the IR receiver. The second input comes from the Voice Direct II, whose signals are transmitted to the EMAC board by way of serial communication. These signals alert the microprocessor that a command word has been received, causing it to output the corresponding IR pulses.
The first output of the EMAC board is TTL level pulses to the IR transmitter. IR codes are stored in memory located on the EMAC board. When this subsystem gets a command from the Voice Direct II to transmit the IR codes, TTL pulses (representing the IR codes) will be sent to the IR transmitter.
The second output of the EMAC board is TTL level communication to the Sensory Voice Direct II. These signals request the Voice Direct II to learn a new word, to listen for a previously learned word, or to replay known words.
Subsystem: IR Receiver
The IR receiver has two purposes. The first is to receive the IR signals sent to it by an IR transmitter (a remote control). These signals are made up of photons. The second is to demodulate the received IR signal: the receiver transforms the presence of the carrier (for example, a 38 kHz sine wave) into a low state output and the absence of the carrier into a high state output. Since the carrier frequency varies with the manufacturer of the remote control, it may be necessary to include 3 or 4 different receivers to demodulate the different frequencies.
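The demodulation behavior described above can be illustrated with a small simulation. This is only a sketch in Python (not the project's firmware language): the 1 MHz sample rate, the 53 microsecond window (about two carrier periods at 38 kHz), the threshold, and the function names are all assumptions made for illustration. A real receiver module is also tuned to its carrier frequency, which this simple energy detector does not model.

```python
import math

CARRIER_HZ = 38_000     # carrier frequency named in the text
SAMPLE_HZ = 1_000_000   # assumed simulation rate: one sample per microsecond

def make_ask_signal(bursts):
    """Build a sampled ASK waveform from (carrier_on, duration_us) pairs."""
    samples = []
    for carrier_on, duration_us in bursts:
        for _ in range(duration_us):
            t = len(samples) / SAMPLE_HZ
            value = math.sin(2 * math.pi * CARRIER_HZ * t) if carrier_on else 0.0
            samples.append(value)
    return samples

def demodulate(samples, window_us=53, threshold=0.2):
    """Return one TTL level per window: 0 if carrier present, 1 if absent."""
    levels = []
    for start in range(0, len(samples) - window_us + 1, window_us):
        window = samples[start:start + window_us]
        # Average rectified amplitude: about 0.64 for a sine, 0 for silence.
        energy = sum(abs(s) for s in window) / window_us
        levels.append(0 if energy > threshold else 1)
    return levels

# A burst, a gap, and a longer burst (durations are multiples of the window).
signal = make_ask_signal([(True, 530), (False, 530), (True, 1060)])
levels = demodulate(signal)
```

Note the inverted logic: windows that contain the carrier produce a low output, matching the description of the receiver.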
Subsystem: IR Transmitter
The IR transmitter will output modulated IR codes in the form of infrared light (photons, that is, electromagnetic waves). The input of this subsystem comes from the EMAC board in the form of TTL level pulses.
Subsystem: Sensory Voice Direct II Voice Activation Chip
The first input of this subsystem comes from a microphone: a millivolt level voltage waveform representing a spoken word. This subsystem decodes the signal and compares it with previously recorded words stored as signals. It can also store the word (in binary form) in its memory to be compared with words received later (again converted into binary form). The second input to this subsystem is from the EMAC board. These inputs command this subsystem to perform certain tasks; refer to the explanation of the EMAC board for a description of these tasks.
The first output of this subsystem is a spoken word, stored as a signal in memory, which is played on the speaker. The chip can play back the commands that were stored in its memory, so the electrical output is an analog signal sent to the speaker. The second output of this subsystem goes to the EMAC board. This output is a serial TTL link that alerts the EMAC board when a known word is received.
Subsystem: Microphone
The input of this subsystem is audio waves. For the overall system to work correctly, these waves must be spoken words. This subsystem transforms the audio waves into voltage waves, which are output to the Voice Direct II.
Subsystem: Speaker
The input of this subsystem is an analog voltage wave that carries a spoken word. The output of this subsystem is the spoken word in the form of sound waves.
Figure 3 Software Flowchart: Logic of Comparing, Learning, and Transmitting Voice and IR Signals
The software flowchart shown in Figure 3 briefly describes the layout of the user interface. The system operates in continuous voice recognition mode, meaning it is constantly listening for a voice command. The LCD displays a menu that allows the user to learn a command; the user selects this option with the keypad on the EMAC board. If the Voice Direct II detects an individual talking at a normal volume, it begins recording and continues for a duration of 2.5 seconds. The recorded command is then compared with the previously recorded voice commands saved in the external memory. If the command matches one of the stored commands, the EMAC recalls the corresponding IR command and outputs it, notifies the user via the LCD that the command has been identified, and the stored voice command is played back through a speaker attached to the Voice Direct II. If the signal does not match any of the stored voice commands, an error message is displayed and the user is asked to re-record the command.
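The match/no-match decision described above might be sketched as follows. Python is used here only for illustration; the function name, the message strings, and the dictionary that maps a recognized word slot to its IR codes are hypothetical and are not part of the actual design.

```python
def handle_recognition(word_index, ir_table):
    """Decide what the EMAC board should do with one recognition result.

    word_index: slot number of the matched word, or None when nothing matched.
    ir_table:   dict mapping slot number -> list of stored IR codes
                (a hypothetical layout chosen for this sketch).
    Returns (lcd_message, ir_codes_to_transmit, play_back_voice).
    """
    codes = ir_table.get(word_index) if word_index is not None else None
    if not codes:
        # No stored command matched: show an error and ask the user to repeat.
        return ("No match, please repeat command", [], False)
    # Match: transmit the stored IR codes, confirm on the LCD, replay the word.
    return (f"Command {word_index} identified", list(codes), True)

# Example: slot 3 has one stored IR code, labeled "POWER" for readability.
table = {3: ["POWER"]}
matched = handle_recognition(3, table)
unmatched = handle_recognition(None, table)
```

Returning the three actions as a tuple keeps the decision logic separate from the LCD, IR, and speaker hardware, which makes it easy to check on its own.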
If the Learn button is pressed at any time, the system enters learn mode. The LCD first notifies the user that the system is in learn mode. A beep is output through the speaker of the Voice Direct II to tell the user to begin speaking. A period of 2.5 seconds is recorded and then stored in the memory of the Voice Direct II. Next, the user is told to transmit the first IR signal to the EMAC board. This is also done at the sound of a beep: the user holds down the desired remote control button for approximately 3 seconds so that the IR receiver can capture the signal and the EMAC can record it. The EMAC stores the signal in the external memory with a pointer to the stored voice command. The user is then asked via the LCD whether to record more IR commands. If yes, the process repeats; otherwise, a message is displayed stating that the item has been saved and the system returns to the startup menu.
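One way to picture the learn-mode sequence is as a small state machine. The sketch below is an illustration in Python, not the planned firmware; the state names and both functions are assumptions introduced here.

```python
from enum import Enum, auto

class LearnState(Enum):
    ANNOUNCE = auto()      # LCD shows that learn mode is active
    RECORD_VOICE = auto()  # beep, then record 2.5 s of speech
    CAPTURE_IR = auto()    # beep, then capture about 3 s of IR input
    ASK_MORE = auto()      # LCD asks whether to record another IR command
    SAVED = auto()         # confirmation shown, return to startup menu

def next_state(state, wants_more=False):
    """Return the state that follows `state` in the learn sequence."""
    if state is LearnState.ANNOUNCE:
        return LearnState.RECORD_VOICE
    if state is LearnState.RECORD_VOICE:
        return LearnState.CAPTURE_IR
    if state is LearnState.CAPTURE_IR:
        return LearnState.ASK_MORE
    if state is LearnState.ASK_MORE:
        return LearnState.CAPTURE_IR if wants_more else LearnState.SAVED
    return LearnState.SAVED

def run_learn_mode(answers):
    """Walk the sequence; `answers` lists the user's yes/no replies in order."""
    state, trace, replies = LearnState.ANNOUNCE, [], iter(answers)
    while state is not LearnState.SAVED:
        trace.append(state.name)
        wants_more = next(replies, False) if state is LearnState.ASK_MORE else False
        state = next_state(state, wants_more)
    trace.append(state.name)
    return trace

# The user records one extra IR command, then declines to add more.
trace = run_learn_mode([True, False])
```

Only the ASK_MORE state branches, which mirrors the single yes/no question in the description.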
If time permits, a full menu will be added that allows the user to view all voice commands/IR signals by their itemized storage location. The user will be able to delete and move items and play a voice command over the speaker.
Standards and Patents
A patent search was done at the United States Patent webpage; an almost identical device was found. If we were to sell this system, we would need to contact Mr. William Stuart Bush. An abstract of his system is included below.
A wireless, programmable, sound-activated and voice-operated remote control transmitter can be used to add hands-free speech control operation to a plurality of remotely controlled appliances manufactured by various manufacturers, each of which is normally controlled with one or more signals from an associated remote control transmitter. The system may be pre-programmed with a universal library of codes for controlling various appliance categories and appliances produced by various manufacturers within each category. The system may also be programmed using the controlled appliances' remote control transmitters and one or more operators' spoken commands. Once programming is complete, there is no need for the operator to manually operate the system, allowing true hands-free voice control of the remotely controlled products. Voice commands are organized into a plurality of linked recognition vocabulary sets, each representing a subset of the complete voice command vocabulary available. These subsets are structured in a fashion that is intuitive to the user because the structure is consistent with controlled appliance operation. As such, the system allows a user to easily navigate via voice commands between recognition sets to attain access to the intended voice commands.
(TBA = To Be Announced)
Voice Direct II operates with a voltage supply from 3.3 V up to 9 V
Internal circuitry operates at 3.3 V, drawing up to 100 mA while operating
Power dissipation = 0.33 W (3.3 V × 100 mA)
50 mW each
Total power to operate VAHCS: TBA
Power supply voltage: TBA
Frequency = 34 kHz, 38 kHz, 40 kHz (this is the frequency of the IR carrier and should not be confused with the frequency of the IR light itself). Infrared light has a wavelength of roughly 750 nm or longer.
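To make the distinction between the carrier and the light itself concrete, the short calculation below works out the period of each listed carrier frequency and the optical frequency that corresponds to a 750 nm wavelength. It is a sketch in Python for illustration only; the function names are assumptions.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def carrier_period_us(carrier_hz):
    """Period of the modulation carrier in microseconds."""
    return 1_000_000 / carrier_hz

def light_frequency_thz(wavelength_nm):
    """Frequency of light with the given wavelength, in terahertz."""
    return SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9) / 1e12

for hz in (34_000, 38_000, 40_000):
    print(f"{hz / 1000:.0f} kHz carrier: period {carrier_period_us(hz):.2f} us")
print(f"750 nm light: about {light_frequency_thz(750):.0f} THz")
```

The carrier periods come out between 25 and roughly 29 microseconds, while the light itself oscillates at roughly 400 THz, about ten orders of magnitude faster than the carrier.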
The infrared receiving range is tentatively set at no farther than 1 foot
- This is because the user will need to be close to the VAHCS to operate the learning mode
Voltage: TTL level
Current Drawn: TBA
Power required: TBA
The infrared transmitting range is tentatively set at a maximum of 15 feet with a 45 degree operating range
The frequency range of the modulated output IR signal is 34 kHz to 40 kHz
The system will store up to 60 voice commands, each lasting 2.5 seconds.
The speed at which this system can translate bits
Minimum of 400 bps
The voice range is tentatively set at a maximum of 10 feet.
The device will use a 2 row LCD as a visual interface with the user; the number of characters per row is not yet known.
Maximum number of IR commands to be stored
The system will use a pointer-based storage system. Each voice command will be assigned a pointer that references a space in memory. This ensures that all available memory can be used to store the desired IR commands.
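A pointer-based layout of this kind might look like the sketch below. It is written in Python purely as an illustration; the class name, the (offset, length) pointer format, and the choice to allow several pointers per voice command are assumptions, not details taken from the design. IR codes are packed one after another into a single shared memory area, so no fixed-size slot is left partly empty.

```python
class IRStore:
    """Voice slots hold (offset, length) pointers into one shared IR memory."""

    def __init__(self, memory_size, max_voice_commands=60):
        self.memory = bytearray(memory_size)  # stands in for the external memory
        self.next_free = 0                    # first unused byte
        self.pointers = [[] for _ in range(max_voice_commands)]

    def add_ir_code(self, slot, code):
        """Append an IR code for a voice slot; return False if memory is full."""
        if self.next_free + len(code) > len(self.memory):
            return False
        start = self.next_free
        self.memory[start:start + len(code)] = code
        self.pointers[slot].append((start, len(code)))
        self.next_free += len(code)
        return True

    def ir_codes(self, slot):
        """Return every IR code stored for a voice slot, in the order learned."""
        return [bytes(self.memory[o:o + n]) for o, n in self.pointers[slot]]

# Two voice commands share an 8-byte memory with no wasted space between them.
store = IRStore(memory_size=8)
store.add_ir_code(0, b"\x01\x02\x03")
store.add_ir_code(5, b"\xaa\xbb")
```

With this layout, the maximum number of IR commands depends on their total length rather than on a fixed count per voice command.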
Emac onboard processor speed
Schedule of tasks
Voice Direct II Voice kit available at www.digikey.com
MicroPac Evaluation (Emac) Board available at Bradley
MIR27E: IR transmitter/modulator available at www.mrrobot.com
TSOP-12(carrier frequency): IR receivers/demodulator (3 different carrier frequencies) available at www.vishay.com