Overview
The ESP32-S3 AI CAM is an intelligent camera module built around the ESP32-S3 chip and designed for image processing and voice interaction. It suits AI projects such as video surveillance, edge image recognition, and voice dialogue. The ESP32-S3 accelerates neural network computation and signal processing, which gives the module its image recognition and voice interaction capabilities.
Powerful AI Processing, Intelligent Image and Voice Recognition
The ESP32-S3 AI CAM is powered by the ESP32-S3 chip, whose neural network computing capabilities let it run recognition on camera images and handle complex edge computing tasks. The onboard microphone and speaker amplifier add voice recognition and dialogue, enabling remote command control, real-time interaction, and other AI functions. This makes the module suitable for IoT devices and smart surveillance applications.
Wide-Angle Infrared Camera for All-Weather Monitoring
The wide-angle infrared camera on the ESP32-S3 AI CAM, combined with infrared illumination and a light sensor, ensures excellent image clarity even in low light or complete darkness. Whether day or night, the ESP32-S3 AI CAM guarantees the stability and clarity of surveillance footage, providing reliable support for security and monitoring systems.
Voice Interaction and Control for Smart Automation
With an onboard microphone and a speaker amplifier, the ESP32-S3 AI CAM supports voice recognition and dialogue. In smart home and IoT devices, users can control the camera or other devices with voice commands, which simplifies operation.
Network Support for Expanding Online AI Functions
In addition to local AI processing, the ESP32-S3 AI CAM can connect to the internet over Wi-Fi to use cloud services and online large models. Through Wi-Fi, the module can call cloud AI platforms that host large language and vision models, enabling more advanced tasks such as complex image classification, voice translation, and natural-language dialogue. Combining local computation with online resources considerably extends what the module can do in IoT devices.
Features
- Various AI capabilities
  - Edge image recognition (based on EdgeImpulse)
  - Online image recognition (OpenCV, YOLO)
  - Online large models for voice and image (ChatGPT)
- Wide-angle night-vision camera with infrared illumination for day and night use
- Onboard microphone and amplifier for voice interaction
- Offers a variety of AI models, with tutorial support for quick learning
Application Scenarios
- Surveillance cameras
- Electronic peephole
- AI assistant robots
- Time-lapse camera
Product Specifications
Basic Parameters
- Operating Voltage: 3.3V
- Type-C Input Voltage: 5V DC
- VIN Input Voltage: 5-12V DC
- Operating Temperature: -10~60°C
- Module Dimensions: 42 × 42 mm
Camera Specifications
- Sensor Model: OV3660
- Pixel: 2 Megapixels
- Sensitivity: Visible light, 940nm infrared
- Field of View: 160°
- Focal Length: 0.95 mm
- Aperture: F2.0
- Distortion: <8%
Hardware Information
- Processor: Xtensa® Dual-core 32-bit LX7 Microprocessor
- Main Frequency: 240 MHz
- SRAM: 512KB
- ROM: 384KB
- Flash: 16MB
- PSRAM: 8MB
- RTC SRAM: 16KB
- USB: USB 2.0 OTG Full-Speed Interface
Wi-Fi
- Wi-Fi Protocol: IEEE 802.11b/g/n
- Wi-Fi Bandwidth: 2.4 GHz band supports 20 MHz and 40 MHz bandwidth
- Wi-Fi Modes: Station Mode, SoftAP Mode, SoftAP+Station Mode, and Promiscuous Mode
- Wi-Fi Frequency: 2.4GHz
- Frame Aggregation: TX/RX A-MPDU, TX/RX A-MSDU
Bluetooth
- Bluetooth Protocol: Bluetooth 5 (LE), Bluetooth Mesh
- Bluetooth Data Rates: 125 Kbps, 500 Kbps, 1 Mbps, 2 Mbps
Function Indicator Diagram
Functional Overview
- OV3660: 160° wide-angle infrared camera
- IR: Infrared illumination (IO47)
- MIC: I2S PDM microphone
- LED: Onboard LED (IO3)
- ALS: LTR-308 ambient light sensor
- ESP32-S3: ESP32-S3R8 chip
- SD: SD card slot
- Flash: 16MB Flash
- VIN: 5-12V DC input
- HM6245: Power chip
- Type-C: USB Type-C interface for power and code upload
- Gravity:
  - +: 3.3-5V
  - -: GND
  - 44: IO44 (U0RXD, native UART0 RX on the ESP32-S3)
  - 43: IO43 (U0TXD, native UART0 TX on the ESP32-S3)
- RST: Reset button
- BOOT: BOOT button (IO0)
- SPK: MX1.25-2P speaker interface
- MAX98357: I2S amplifier chip
Pinout Diagram
Tutorial - First Time Use
Arduino IDE Configuration
When using the ESP32-S3 AI CAM for the first time, follow these steps:
- Add the ESP32 board-manager JSON URL in the Arduino IDE
- Install the ESP32 board core
- Select the development board and serial port
- Open the sample code and upload it to the board
- Get to know the serial monitor
Arduino IDE Environment Configuration
- Add the board-manager URL to the Arduino IDE:
  - Open the Arduino IDE and click File->Preferences, as shown below.
  - In the window that opens, click the button in the red circle (next to "Additional boards manager URLs"), as shown below.
Copy one of the following links into the pop-up dialog:
Stable release: https://espressif.github.io/arduino-esp32/package_esp32_index.json
Development release: https://espressif.github.io/arduino-esp32/package_esp32_dev_index.json
Note:
Choose the stable or development release according to Espressif's chip support status for the Arduino core.
If other board URLs are already listed, press Enter at the beginning or end of an existing link and paste the new link on its own line.
Click OK, then update the board list: open Tools->Board->Boards Manager..., as shown below:
The Boards Manager updates the board index automatically, as shown below:
After the update completes, type esp32 in the search box at the top, select esp32 (by Espressif Systems) and click Install (installing the latest version is recommended):
Wait for the progress bar to finish:
After installation, the list shows esp32 as installed, as shown below:
- Click Tools->Board and select "ESP32S3 Dev Module".
- Before uploading, configure the board options shown below. Note the USB CDC On Boot option: when it is Disabled, Serial output goes to the RX/TX pins; to print to the Arduino Serial Monitor over USB, select Enabled.
- Click Port and select the corresponding serial port.
LED Blinking
The onboard LED is connected to IO3.
Sample Code
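int led = 3;  // onboard LED on IO3
void setup() {
  pinMode(led, OUTPUT);  // configure the LED pin as an output
}
void loop() {
  digitalWrite(led, HIGH);  // LED on
  delay(1000);              // wait 1 s
  digitalWrite(led, LOW);   // LED off
  delay(1000);              // wait 1 s
}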
- Copy the code above into the code editor.
- Click the arrow (Upload) button to compile the program and upload it to the board.
Upload Successful
The image above shows that the code has been uploaded to the board successfully. The onboard LED will then start blinking.
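The sketch above blocks inside delay(), so loop() cannot do anything else while it waits. A common Arduino alternative is to derive the LED state from the elapsed time returned by millis(). The minimal sketch below isolates that calculation in plain C++ so it can be checked on a PC; `ledOnAt` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if the LED should be ON at time nowMs.
// The LED starts ON at t = 0 and toggles every halfPeriodMs milliseconds,
// matching the 1 s HIGH / 1 s LOW pattern of the delay()-based sketch.
// halfPeriodMs must be greater than 0.
bool ledOnAt(uint32_t nowMs, uint32_t halfPeriodMs) {
  return (nowMs / halfPeriodMs) % 2 == 0;
}
```

In the sketch, loop() would then call `digitalWrite(led, ledOnAt(millis(), 1000) ? HIGH : LOW);` with no delay(). Because millis() wraps after about 49.7 days, the blink phase may jump once at that point.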
ESP32 Tutorials
Video Transmission
Steps
- In the Arduino IDE, open File->Examples->ESP32->Camera->CameraWebServer.
- Replace the code in CameraWebServer.ino with the following code (note: you must fill in your Wi-Fi SSID and password).
- Open the serial monitor to check the IP address.
- From a device on the same local network, open that IP address in a browser and click "Start Stream" to view the camera feed.
#include "esp_camera.h"
#include <WiFi.h>
//
// WARNING!!! PSRAM IC required for UXGA resolution and high JPEG quality
// Select "ESP32S3 Dev Module" with PSRAM set to "OPI PSRAM" (the ESP32-S3R8 has 8 MB PSRAM)
// Partial images will be transmitted if an image exceeds the buffer size
//
// You must select a partition scheme from the board menu with at least 3 MB of APP space.
// Face Recognition is DISABLED for ESP32 and ESP32-S2 because it takes 15 seconds or more
// to process a single frame. Face Detection is ENABLED if PSRAM is enabled as well.
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 5
#define Y9_GPIO_NUM 4
#define Y8_GPIO_NUM 6
#define Y7_GPIO_NUM 7
#define Y6_GPIO_NUM 14
#define Y5_GPIO_NUM 17
#define Y4_GPIO_NUM 21
#define Y3_GPIO_NUM 18
#define Y2_GPIO_NUM 16
#define VSYNC_GPIO_NUM 1
#define HREF_GPIO_NUM 2
#define PCLK_GPIO_NUM 15
#define SIOD_GPIO_NUM 8
#define SIOC_GPIO_NUM 9
// ===========================
// Enter your WiFi credentials
// ===========================
const char *ssid = "**********";
const char *password = "**********";
void startCameraServer();
void setupLedFlash(int pin);
void setup() {
Serial.begin(115200);
Serial.setDebugOutput(true);
Serial.println();
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sccb_sda = SIOD_GPIO_NUM;
config.pin_sccb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.frame_size = FRAMESIZE_UXGA;
config.pixel_format = PIXFORMAT_JPEG; // for streaming
//config.pixel_format = PIXFORMAT_RGB565; // for face detection/recognition
config.grab_mode = CAMERA_GRAB_WHEN_EMPTY;
config.fb_location = CAMERA_FB_IN_PSRAM;
config.jpeg_quality = 12;
config.fb_count = 1;
// if PSRAM IC present, init with UXGA resolution and higher JPEG quality
// for larger pre-allocated frame buffer.
if (config.pixel_format == PIXFORMAT_JPEG) {
if (psramFound()) {
config.jpeg_quality = 10;
config.fb_count = 2;
config.grab_mode = CAMERA_GRAB_LATEST;
} else {
// Limit the frame size when PSRAM is not available
config.frame_size = FRAMESIZE_SVGA;
config.fb_location = CAMERA_FB_IN_DRAM;
}
} else {
// Best option for face detection/recognition
config.frame_size = FRAMESIZE_240X240;
#if CONFIG_IDF_TARGET_ESP32S3
config.fb_count = 2;
#endif
}
#if defined(CAMERA_MODEL_ESP_EYE)
pinMode(13, INPUT_PULLUP);
pinMode(14, INPUT_PULLUP);
#endif
// camera init
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed with error 0x%x\n", err);
return;
}
sensor_t *s = esp_camera_sensor_get();
// initial sensors are flipped vertically and colors are a bit saturated
if (s->id.PID == OV3660_PID) {
s->set_vflip(s, 1); // flip it back
s->set_brightness(s, 1); // up the brightness just a bit
s->set_saturation(s, -2); // lower the saturation
}
// drop down frame size for higher initial frame rate
if (config.pixel_format == PIXFORMAT_JPEG) {
s->set_framesize(s, FRAMESIZE_QVGA);
}
#if defined(CAMERA_MODEL_M5STACK_WIDE) || defined(CAMERA_MODEL_M5STACK_ESP32CAM)
s->set_vflip(s, 1);
s->set_hmirror(s, 1);
#endif
#if defined(CAMERA_MODEL_ESP32S3_EYE)
s->set_vflip(s, 1);
#endif
// Set up LED flash only if LED_GPIO_NUM is defined (as in camera_pins.h; not defined in this sketch)
#if defined(LED_GPIO_NUM)
setupLedFlash(LED_GPIO_NUM);
#endif
WiFi.begin(ssid, password);
WiFi.setSleep(false);
Serial.print("WiFi connecting");
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println("");
Serial.println("WiFi connected");
startCameraServer();
Serial.print("Camera Ready! Use 'http://");
Serial.print(WiFi.localIP());
Serial.println("' to connect");
}
void loop() {
// Do nothing. Everything is done in another task by the web server
delay(10000);
}
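The PSRAM warning at the top of the sketch follows from simple arithmetic: an uncompressed frame needs width × height × bytes per pixel. The helpers below (hypothetical names, written as plain C++ so they can be checked on a PC) apply this to frame sizes used in the sketch, assuming RGB565 at 2 bytes per pixel. A UXGA (1600 × 1200) RGB565 frame needs 3,840,000 bytes, far more than the 512 KB of internal SRAM, which is why large frame buffers are placed in the 8 MB PSRAM. JPEG frames are compressed and usually much smaller, but their size depends on scene content and jpeg_quality.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for one uncompressed frame.
// For RGB565, bytesPerPixel is 2; JPEG frames are compressed and vary in size.
std::size_t rawFrameBytes(std::size_t width, std::size_t height, std::size_t bytesPerPixel) {
  return width * height * bytesPerPixel;
}

// Total memory for fbCount uncompressed frame buffers (the sketch uses 1 or 2).
std::size_t rawFrameBufferBytes(std::size_t width, std::size_t height,
                                std::size_t bytesPerPixel, std::size_t fbCount) {
  return rawFrameBytes(width, height, bytesPerPixel) * fbCount;
}
```

For example, the 240 × 240 RGB565 setting used for face detection needs 115,200 bytes per frame, or 230,400 bytes with fb_count = 2.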
Recording and Playing Audio
This example demonstrates recording and playback. After you upload the code and reset the board, the onboard LED lights up while a 5-second recording is made. When the LED turns off, the recording is played back through the speaker.
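#include <Arduino.h>
#include "ESP_I2S.h"
#define SAMPLE_RATE (16000)
#define DATA_PIN (GPIO_NUM_39)   // PDM microphone data
#define CLOCK_PIN (GPIO_NUM_38)  // PDM microphone clock
#define REC_TIME 5               // recording time in seconds
void setup()
{
  uint8_t *wav_buffer;
  size_t wav_size;
  I2SClass i2s;   // PDM microphone input
  I2SClass i2s1;  // MAX98357 amplifier output
  Serial.begin(115200);
  pinMode(3, OUTPUT);  // onboard LED, used as the recording indicator
  pinMode(41, OUTPUT);
  // PDM RX: clock on IO38, data on IO39
  i2s.setPinsPdmRx(CLOCK_PIN, DATA_PIN);
  if (!i2s.begin(I2S_MODE_PDM_RX, SAMPLE_RATE, I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("Failed to initialize I2S PDM RX");
    return;
  }
  // Standard I2S TX to the MAX98357: BCLK IO45, WS IO46, DOUT IO42
  i2s1.setPins(45, 46, 42);
  if (!i2s1.begin(I2S_MODE_STD, SAMPLE_RATE, I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("MAX98357 initialization failed!");
    return;
  }
  Serial.println("start REC");
  digitalWrite(3, HIGH);  // LED on while recording
  // recordWAV() returns a heap buffer holding a WAV file (header + samples)
  wav_buffer = i2s.recordWAV(REC_TIME, &wav_size);
  digitalWrite(3, LOW);   // LED off: recording finished
  if (wav_buffer == NULL) {
    Serial.println("Recording failed");
    return;
  }
  // Play the recording, then release the buffer allocated by recordWAV()
  i2s1.playWAV(wav_buffer, wav_size);
  free(wav_buffer);
}
void loop()
{
}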
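The size of the buffer returned by recordWAV() can be estimated in advance: sample rate × bytes per sample × channels × seconds, plus the WAV header. The sketch below assumes a canonical 44-byte PCM WAV header (an assumption about what recordWAV() prepends; check the ESP_I2S source for your core version). `pcmBytes` and `wavFileBytes` are hypothetical helpers in plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed size of a canonical PCM WAV header (RIFF + fmt + data chunk headers).
constexpr std::size_t kWavHeaderBytes = 44;

// Bytes of raw PCM samples: rate * bytes per sample * channels * seconds.
std::size_t pcmBytes(uint32_t sampleRate, uint16_t bitsPerSample,
                     uint16_t channels, uint32_t seconds) {
  return static_cast<std::size_t>(sampleRate) * (bitsPerSample / 8) * channels * seconds;
}

// Whole WAV file size: header followed by the samples.
std::size_t wavFileBytes(uint32_t sampleRate, uint16_t bitsPerSample,
                         uint16_t channels, uint32_t seconds) {
  return kWavHeaderBytes + pcmBytes(sampleRate, bitsPerSample, channels, seconds);
}
```

With the values in the example (16000 Hz, 16-bit, mono, 5 s) the samples take 160,000 bytes. Raising REC_TIME to 30 s would need 960,000 bytes, more than the 512 KB of internal SRAM, so longer recordings likely depend on PSRAM being enabled.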
OpenAI image Q&A
This example connects the ESP32-S3 AI CAM to OpenAI for voice and image recognition. After flashing the code, press and hold the BOOT button and ask the module a question, such as "What do you see?", "What kind of plant is this?" or "What is he doing?". The module sends the audio and image data to OpenAI for recognition and plays the answer back as speech.
OpenAI RTC
With this sample code you can hold a real-time conversation with OpenAI. Note: this example is built with ESP-IDF, not the Arduino IDE.
Object Contour Recognition Using OpenCV on PC
This example demonstrates object contour recognition using OpenCV on a PC.
Steps
- Upload the CameraWebServer example code from the Video Transmission section to the module and note its IP address.
- Install Python (https://www.python.org/); skip this step if it is already installed.
- Install the required Python libraries:
  - Press Win+R and type "cmd" to open a command window.
  - Enter "pip install numpy" to install the numpy library.
  - Enter "pip install opencv-python" to install the opencv-python library.
  - Enter "pip install opencv-contrib-python" to install the opencv-contrib-python library.
- Change the IP address on the fifth line of openCV-demo.py to your ESP32-S3's IP address.
- Open Python's IDLE, go to File -> Open..., select openCV-demo.py, and press F5 in the new window to see the recognition window and results.
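The PC-side scripts need a URL built from the IP address printed in the serial monitor. The exact URL used inside openCV-demo.py is not shown here; the sketch below assumes the stock CameraWebServer layout, where the MJPEG stream is served on port 81 at /stream and single JPEG snapshots on port 80 at /capture. `streamUrl` and `captureUrl` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Assumption: stock CameraWebServer layout (stream server on port 81, main server on port 80).
std::string streamUrl(const std::string& ip) {
  return "http://" + ip + ":81/stream";  // continuous MJPEG stream
}

std::string captureUrl(const std::string& ip) {
  return "http://" + ip + "/capture";    // one JPEG frame per request
}
```

A PC-side script would typically pass the stream URL to OpenCV's VideoCapture, or request /capture repeatedly when it only needs occasional frames.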
Object Classification Using YOLOv5 on PC
This example demonstrates object classification using YOLOv5 on a PC.
Steps
- Upload the CameraWebServer example code from the Video Transmission section to the module and note its IP address.
- Install Python (https://www.python.org/); skip this step if it is already installed.
- Install the YOLOv5 library:
  - Press Win+R and type "cmd" to open a command window.
  - Enter "pip install numpy" to install the numpy library.
  - Enter "pip install yolov5" to install the YOLOv5 library.
- Change the IP address on the eighth line of yoloV5-demo.py to your ESP32-S3's IP address.
- Open Python's IDLE, go to File -> Open..., select yoloV5-demo.py, and press F5 in the new window to see the recognition window and results.
Image Recognition Using EdgeImpulse
This example demonstrates how to implement image recognition on the module using EdgeImpulse.
Note: Different SDK versions may cause compilation errors (the demo was built with SDK version 3.0.1).
Steps
- Extract the trained model files to the "arduino->libraries" folder.
- Replace the "depthwise_conv.cpp" and "conv.cpp" files in "src\edge-impulse-sdk\tensorflow\lite\micro\kernels".
- Move the edge_camera folder and its subfolders into the examples folder of the model library.
- Open the Arduino IDE and select the edge_camera example. Change the first line of the code to include the model library's .h file, enter your Wi-Fi credentials, then compile and upload.
- Open the serial monitor to see the IP address and recognition results. Open the IP address in a browser to view the camera feed.
Training custom models with EdgeImpulse
Access HomeAssistant
Following this tutorial, you can add the ESP32-S3 camera module to HomeAssistant and view its camera feed.
Access the HomeAssistant tutorial
Retrieving Ambient Light Data
This example reads ambient light data from the onboard LTR-308 sensor.
Before using it, install the LTR308 Sensor Library.
#include <DFRobot_LTR308.h>
DFRobot_LTR308 light;
void setup(){
  Serial.begin(115200);
  // If you are using the camera, initialize it before initializing the LTR308; otherwise the readings will be 0.
  while(!light.begin()){
    Serial.println("Initialization failed!");
    delay(1000);
  }
  Serial.println("Initialization successful!");
}
void loop(){
  uint32_t data = light.getData();  // raw sensor reading
  Serial.print("Original Data: ");
  Serial.println(data);
  Serial.print("Lux Data: ");
  Serial.println(light.getLux(data));  // raw reading converted to lux
  delay(500);
}
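The light readings can drive the infrared illumination on IO47 for night monitoring. A single threshold would make the IR LEDs flicker when the light level hovers around it, so the minimal sketch below uses hysteresis: two thresholds with a dead band between them. The threshold values and the helper name `nextIrState` are hypothetical and need tuning for your installation; the logic is plain C++ so it can be checked on a PC.

```cpp
#include <cassert>

// Hypothetical thresholds in lux; tune them for your installation.
constexpr float kIrOnBelowLux = 5.0f;   // switch IR on when darker than this
constexpr float kIrOffAboveLux = 15.0f; // switch IR off only when brighter than this

// Hysteresis: between the two thresholds the current state is kept,
// so small fluctuations in the reading do not toggle the IR LEDs.
bool nextIrState(bool irOn, float lux) {
  if (!irOn && lux < kIrOnBelowLux) return true;
  if (irOn && lux > kIrOffAboveLux) return false;
  return irOn;
}
```

In loop() this could become `irOn = nextIrState(irOn, light.getLux(light.getData())); digitalWrite(47, irOn ? HIGH : LOW);`, assuming the IR illumination is switched on by driving IO47 HIGH.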