ESP32-S3 AI CAM Wiki - DFRobot

Overview

The ESP32-S3 AI CAM is an intelligent camera module designed around the ESP32-S3 chip, tailored for video image processing and voice interaction. It is suitable for AI projects such as video surveillance, edge image recognition, and voice dialogue. The ESP32-S3 supports high-performance neural network computation and signal processing, equipping the device with powerful image recognition and voice interaction capabilities.

Powerful AI Processing, Intelligent Image and Voice Recognition
The ESP32-S3 AI CAM is powered by the ESP32-S3 chip, offering outstanding neural network computing capabilities. It can perform intelligent recognition on camera images and handle complex edge computing tasks. Additionally, the integration of a microphone and speaker enables support for voice recognition and dialogue, facilitating remote command control, real-time interaction, and various AI functions. This makes it suitable for IoT devices and smart surveillance applications.

Wide-Angle Infrared Camera for All-Weather Monitoring
The wide-angle infrared camera on the ESP32-S3 AI CAM, combined with infrared illumination and a light sensor, keeps images clear in low light or complete darkness. Day or night, the ESP32-S3 AI CAM delivers stable, clear surveillance footage, providing reliable support for security and monitoring systems.

Voice Interaction and Control for Smart Automation
With a built-in microphone and speaker, the ESP32-S3 AI CAM enables voice recognition and dialogue functionality. In smart home and IoT devices, this feature allows users to control the camera or other devices through voice commands, simplifying operations and enhancing the intelligent experience.

Network Support for Expanding Online AI Functions
In addition to local AI processing, the ESP32-S3 AI CAM can connect to the internet over Wi-Fi and call cloud AI platforms, including large language and vision models. This enables more demanding tasks such as advanced image classification, voice translation, and natural language dialogue, combining on-device computation with online resources for remote intelligent operation in IoT devices.

Features

Application Scenarios

Product Specifications

Basic Parameters

Camera Specifications

Hardware Information

Wi-Fi
Bluetooth

Function Indicator Diagram

Functional Overview

Pinout Diagram

Tutorial - First Time Use

Arduino IDE Configuration

When using the ESP32-S3 AI CAM for the first time, complete the following steps:

  1. Add the JSON link in the IDE
  2. Download the MCU core
  3. Select the development board and serial port
  4. Open the sample code and upload it to the board
  5. Get to know the serial monitor

Arduino IDE compiler environment config

  1. Open Arduino IDE and click File->Preferences.
  2. In the Preferences window, click the button beside the "Additional Boards Manager URLs" field.

  3. Copy one of the following links into the dialog box that opens:

    Stable version: https://espressif.github.io/arduino-esp32/package_esp32_index.json
    Development release: https://espressif.github.io/arduino-esp32/package_esp32_dev_index.json

  4. Click OK, then update the board list: open Tools->Board->Boards Manager...

  5. Boards Manager automatically updates the board list.

  6. When the update finishes, type esp32 in the search box at the top, select esp32, and click Install (installing the latest version is recommended).

  7. Wait for the progress bar to finish.

  8. When the installation is complete, the list shows esp32 as installed.

  9. Click Tools->Board and select "ESP32S3 Dev Module".
  10. Before starting, configure the board settings. For the USB CDC On Boot option, Disabled routes serial output to the RX and TX pins; select Enabled if you need to print to the Arduino serial monitor over USB.

  11. Click Port and select the corresponding serial port.

LED Blinking

The default pin for the onboard LED is pin 3.

Sample Code

const int led = 3;  // onboard LED pin

void setup() {
  pinMode(led, OUTPUT);
}

void loop() {
  digitalWrite(led, HIGH);  // LED on
  delay(1000);              // wait 1 second
  digitalWrite(led, LOW);   // LED off
  delay(1000);              // wait 1 second
}

Upload Successful

Once the code has been uploaded successfully, the onboard LED starts blinking.
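
The sample code pauses inside delay(), so nothing else can run while the LED waits. As a minimal sketch of an alternative, the helper below derives the LED level from the elapsed time instead. The name blinkLevel is hypothetical and not part of any library, and halfPeriodMs is assumed to be non-zero.

```cpp
#include <cstdint>

// Returns true while the LED should be on.
// The LED is on for the first halfPeriodMs of each cycle and off for the second.
// halfPeriodMs must be non-zero (assumption of this sketch).
bool blinkLevel(uint32_t nowMs, uint32_t halfPeriodMs) {
  return (nowMs / halfPeriodMs) % 2 == 0;
}

// Possible use on the board, inside loop(), with led = 3 as in the sample code:
//   digitalWrite(led, blinkLevel(millis(), 1000) ? HIGH : LOW);
```

Because the function only reads the time, loop() returns immediately and other work can run between LED updates.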

ESP32 Tutorials

Video Transmission

Steps

  1. In the Arduino IDE, select the File->Examples->ESP32->Camera->CameraWebServer example.
  2. Replace the code in CameraWebServer with the following code (note: fill in your Wi-Fi SSID and password).
  3. Open the serial monitor to check the IP address.
  4. From a device on the same local area network, open the IP address in a browser and click "start" to view the live video.

#include "esp_camera.h"
#include <WiFi.h>

//
// WARNING!!! PSRAM IC required for UXGA resolution and high JPEG quality
//            Ensure ESP32 Wrover Module or other board with PSRAM is selected
//            Partial images will be transmitted if image exceeds buffer size
//
//            You must select partition scheme from the board menu that has at least 3MB APP space.
//            Face Recognition is DISABLED for ESP32 and ESP32-S2, because it takes up from 15
//            seconds to process single frame. Face Detection is ENABLED if PSRAM is enabled as well

#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM     5
#define Y9_GPIO_NUM       4
#define Y8_GPIO_NUM       6
#define Y7_GPIO_NUM       7
#define Y6_GPIO_NUM       14
#define Y5_GPIO_NUM       17
#define Y4_GPIO_NUM       21
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM       16
#define VSYNC_GPIO_NUM    1
#define HREF_GPIO_NUM     2
#define PCLK_GPIO_NUM     15
#define SIOD_GPIO_NUM     8
#define SIOC_GPIO_NUM     9

// ===========================
// Enter your WiFi credentials
// ===========================
const char *ssid = "**********";
const char *password = "**********";

void startCameraServer();
void setupLedFlash(int pin);

void setup() {
  Serial.begin(115200);
  Serial.setDebugOutput(true);
  Serial.println();

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sccb_sda = SIOD_GPIO_NUM;
  config.pin_sccb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.frame_size = FRAMESIZE_UXGA;
  config.pixel_format = PIXFORMAT_JPEG;  // for streaming
  //config.pixel_format = PIXFORMAT_RGB565; // for face detection/recognition
  config.grab_mode = CAMERA_GRAB_WHEN_EMPTY;
  config.fb_location = CAMERA_FB_IN_PSRAM;
  config.jpeg_quality = 12;
  config.fb_count = 1;

  // if PSRAM IC present, init with UXGA resolution and higher JPEG quality
  //                      for larger pre-allocated frame buffer.
  if (config.pixel_format == PIXFORMAT_JPEG) {
    if (psramFound()) {
      config.jpeg_quality = 10;
      config.fb_count = 2;
      config.grab_mode = CAMERA_GRAB_LATEST;
    } else {
      // Limit the frame size when PSRAM is not available
      config.frame_size = FRAMESIZE_SVGA;
      config.fb_location = CAMERA_FB_IN_DRAM;
    }
  } else {
    // Best option for face detection/recognition
    config.frame_size = FRAMESIZE_240X240;
#if CONFIG_IDF_TARGET_ESP32S3
    config.fb_count = 2;
#endif
  }

#if defined(CAMERA_MODEL_ESP_EYE)
  pinMode(13, INPUT_PULLUP);
  pinMode(14, INPUT_PULLUP);
#endif

  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    return;
  }

  sensor_t *s = esp_camera_sensor_get();
  // initial sensors are flipped vertically and colors are a bit saturated
  if (s->id.PID == OV3660_PID) {
    s->set_vflip(s, 1);        // flip it back
    s->set_brightness(s, 1);   // up the brightness just a bit
    s->set_saturation(s, -2);  // lower the saturation
  }
  // drop down frame size for higher initial frame rate
  if (config.pixel_format == PIXFORMAT_JPEG) {
    s->set_framesize(s, FRAMESIZE_QVGA);
  }

#if defined(CAMERA_MODEL_M5STACK_WIDE) || defined(CAMERA_MODEL_M5STACK_ESP32CAM)
  s->set_vflip(s, 1);
  s->set_hmirror(s, 1);
#endif

#if defined(CAMERA_MODEL_ESP32S3_EYE)
  s->set_vflip(s, 1);
#endif

// Setup LED Flash if LED pin is defined in camera_pins.h
#if defined(LED_GPIO_NUM)
  setupLedFlash(LED_GPIO_NUM);
#endif

  WiFi.begin(ssid, password);
  WiFi.setSleep(false);

  Serial.print("WiFi connecting");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.println("WiFi connected");

  startCameraServer();

  Serial.print("Camera Ready! Use 'http://");
  Serial.print(WiFi.localIP());
  Serial.println("' to connect");
}

void loop() {
  // Do nothing. Everything is done in another task by the web server
  delay(10000);
}
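
The Wi-Fi loop in the sketch above waits forever if the credentials are wrong or the access point is out of range. One way to bound the wait, sketched here under the assumption that giving up after a fixed time is acceptable, is a small polling helper. waitUntil and its parameters are hypothetical names, not part of the ESP32 core; the clock and sleep functions are passed in so the helper can also be compiled off the board.

```cpp
#include <cstdint>

// Polls ready() until it returns true or timeoutMs has elapsed.
// nowMs() must return a millisecond counter and sleepMs(ms) must pause for ms;
// on the board these would be millis and delay (assumption of this sketch).
// Returns true if ready() succeeded, false on timeout.
template <typename Ready, typename Now, typename Sleep>
bool waitUntil(Ready ready, uint32_t timeoutMs, uint32_t stepMs, Now nowMs, Sleep sleepMs) {
  const uint32_t start = static_cast<uint32_t>(nowMs());
  while (!ready()) {
    // Unsigned subtraction stays correct even if the counter wraps around.
    if (static_cast<uint32_t>(nowMs()) - start >= timeoutMs) {
      return false;
    }
    sleepMs(stepMs);
  }
  return true;
}

// Possible use in setup(), replacing the while loop:
//   bool ok = waitUntil([] { return WiFi.status() == WL_CONNECTED; },
//                       15000, 500, millis, delay);
//   if (!ok) { Serial.println("WiFi connection timed out"); return; }
```

The 15-second timeout and 500 ms step in the usage comment are illustrative values only.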

Recording and Playing Audio

This example demonstrates the recording and playback functionality. After uploading the code and resetting the development board, the LED will light up to indicate a 5-second recording. Once the LED turns off, the recording will play back through the speaker.

#include <Arduino.h>
#include <SPI.h>

#include "ESP_I2S.h"

#define SAMPLE_RATE     (16000)
#define DATA_PIN        (GPIO_NUM_39)  // PDM microphone data
#define CLOCK_PIN       (GPIO_NUM_38)  // PDM microphone clock
#define REC_TIME 5  // Recording time in seconds

void setup()
{
  uint8_t *wav_buffer;
  size_t wav_size;
  I2SClass i2s;   // microphone input (PDM RX)
  I2SClass i2s1;  // speaker output (standard I2S)
  Serial.begin(115200);
  pinMode(3, OUTPUT);  // onboard LED, lit while recording
  pinMode(41, OUTPUT);
  i2s.setPinsPdmRx(CLOCK_PIN, DATA_PIN);
  if (!i2s.begin(I2S_MODE_PDM_RX, SAMPLE_RATE, I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("Failed to initialize I2S PDM RX");
  }
  i2s1.setPins(45, 46, 42);  // BCLK, WS, DOUT
  if (!i2s1.begin(I2S_MODE_STD, SAMPLE_RATE, I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("MAX98357 initialization failed!");
  }
  Serial.println("start REC");
  digitalWrite(3, HIGH);
  wav_buffer = i2s.recordWAV(REC_TIME, &wav_size);
  digitalWrite(3, LOW);
  if (wav_buffer == NULL) {
    Serial.println("Recording failed!");
    return;
  }
  // Play the recording
  i2s1.playWAV(wav_buffer, wav_size);
  free(wav_buffer);  // recordWAV allocates the buffer; release it after playback
}

void loop()
{
}
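
A quick estimate shows how much memory the 5-second recording needs. The sketch below assumes the buffer returned by recordWAV holds a canonical 44-byte PCM WAV header followed by the raw samples; that layout is an assumption here, and wavBytes is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Size of a canonical PCM WAV header in bytes (assumed layout).
constexpr std::size_t kWavHeaderBytes = 44;

// Total bytes for a PCM WAV recording: header plus
// sampleRate * bytesPerSample * channels * seconds of audio data.
constexpr std::size_t wavBytes(uint32_t sampleRate, uint32_t bitsPerSample,
                               uint32_t channels, uint32_t seconds) {
  return kWavHeaderBytes +
         static_cast<std::size_t>(sampleRate) * (bitsPerSample / 8) * channels * seconds;
}

// Settings from the example: 16 kHz, 16-bit, mono, 5 seconds.
static_assert(wavBytes(16000, 16, 1, 5) == 160044, "5 s at 16 kHz mono 16-bit");
```

With these settings the buffer grows by 32,000 bytes for every extra second of recording, which is worth keeping in mind before raising REC_TIME.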

OpenAI Image Q&A

This example demonstrates connecting the ESP32-S3 AI CAM to OpenAI for voice and image recognition. After flashing the code, press and hold the "boot" button and ask the module a question, such as "What do you see?", "What kind of plant is this?", or "What is he doing?". The module sends the audio and image data to OpenAI for recognition and plays the answer back as speech.

OpenAI RTC

Through this sample code, you can have a real-time conversation with OpenAI. Note: This code runs under ESP-IDF.

Object Contour Recognition Using OpenCV on PC

This example demonstrates object contour recognition using OpenCV on a PC.

Steps

  1. Upload the "CameraWeb" example code to the module and check the IP address.
  2. Install the Python environment (https://www.python.org/) (if you already have it installed, skip to the next step).
  3. Install the necessary Python libraries:
    • Press Win+R, type "cmd" to enter the command window.
    • Enter "pip install numpy" to install the numpy library.
    • Enter "pip install opencv-python" to install the opencv-python library.
    • Enter "pip install opencv-contrib-python" to install the opencv-contrib-python library.
  4. Change the IP address on line 5 of openCV-demo.py to the IP address of your ESP32-S3.
  5. Open Python's IDLE, go to File -> Open..., select openCV-demo.py, and press F5 in the new window to see the recognition screen and results.

Object Classification Using YOLOv5 on PC

This example demonstrates object classification using YOLOv5 on a PC.

Steps

  1. Upload the "CameraWeb" example code to the module and check the IP address.
  2. Install the Python environment (https://www.python.org/) (if you already have it installed, skip to the next step).
  3. Install the YOLOv5 library:
    • Press Win+R, type "cmd" to enter the command window.
    • Enter "pip install numpy" to install the numpy library.
    • Enter "pip install yolov5" to install the YOLOv5 library.
  4. Change the IP address on line 8 of yoloV5-demo.py to the IP address of your ESP32-S3.
  5. Open Python's IDLE, go to File -> Open..., select yoloV5-demo.py, and press F5 in the new window to see the recognition screen and results.

Image Recognition Using EdgeImpulse

This example demonstrates how to implement image recognition on the module using EdgeImpulse.

Note: Different SDK versions may cause compilation errors (the demo uses SDK version 3.0.1).

Steps

  1. Extract the trained model files to the "arduino->libraries" folder.
  2. Replace the "depthwise_conv.cpp" and "conv.cpp" files in "src\edge-impulse-sdk\tensorflow\lite\micro\kernels".
  3. Move the edge_camera folder and its subfolders into the examples folder of the model library.
  4. Open the Arduino IDE, select the edge_camera example, modify the first line of the code to match the model library's .h file, enter the WiFi credentials, then compile and upload.
  5. Open the serial monitor to see the IP address and recognition results. Access the IP to view the camera feed.

Training custom models with EdgeImpulse

EdgeImpulse Object Detection

Access HomeAssistant

Based on this tutorial, you can add the ESP32-S3 camera module to HomeAssistant and view the monitoring footage.

Access the HomeAssistant tutorial

Retrieving Ambient Light Data

This example reads ambient light data from the onboard light sensor.
Before using it, install the LTR308 Sensor Library.

#include <DFRobot_LTR308.h>

DFRobot_LTR308 light;

void setup(){
  Serial.begin(115200);
  // If you are using the camera, initialize it before the LTR308. Otherwise, the read values will be 0.
  while(!light.begin()){
    Serial.println("Initialization failed!");
    delay(1000);
  }
  Serial.println("Initialization successful!");
}

void loop(){
  uint32_t data = light.getData();  // raw sensor reading
  Serial.print("Original Data: ");
  Serial.println(data);
  Serial.print("Lux Data: ");
  Serial.println(light.getLux(data));  // raw reading converted to lux
  delay(500);
}
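
The lux value can be used to decide when the scene is dark enough for infrared illumination. The sketch below is one possible approach, using two thresholds (hysteresis) so the state does not flicker when the light level hovers near a single limit. The function name and the threshold values are hypothetical, and the pin that drives the infrared LEDs is not given in this section, so the code only computes the state.

```cpp
// Returns the new night-mode state from the previous state and a lux reading.
// enterBelowLux and exitAboveLux are illustrative thresholds; choose
// enterBelowLux < exitAboveLux so there is a dead band between them.
bool nextNightMode(bool night, float lux, float enterBelowLux, float exitAboveLux) {
  if (!night && lux < enterBelowLux) return true;   // dark enough: enter night mode
  if (night && lux > exitAboveLux) return false;    // bright again: leave night mode
  return night;                                     // inside the dead band: keep state
}

// Possible use inside loop(), with made-up thresholds of 10 and 30 lux:
//   static bool night = false;
//   night = nextNightMode(night, light.getLux(light.getData()), 10.0f, 30.0f);
```

Suitable thresholds depend on the installation and would need to be measured on site.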