kantv

Workbench for learning and practising AI tech in real scenarios on Android devices, powered by GGML (Georgi Gerganov Machine Learning), NCNN (Tencent NCNN), and FFmpeg

Stars: 75


KanTV is an open-source project that focuses on studying and practicing state-of-the-art AI technology in real applications and scenarios, such as online TV playback, transcription, translation, and video/audio recording. It is derived from the original ijkplayer project and includes many enhancements and new features, including:

  • Watching online TV and local media using a customized FFmpeg 6.1.
  • Recording online TV to automatically generate videos.
  • Studying ASR (Automatic Speech Recognition) using whisper.cpp.
  • Studying LLM (Large Language Model) using llama.cpp.
  • Studying SD (Text to Image by Stable Diffusion) using stablediffusion.cpp.
  • Generating real-time English subtitles for English online TV using whisper.cpp.
  • Running/experiencing LLM on Xiaomi 14 using llama.cpp.
  • Setting up a customized playlist and using the software to watch the content for R&D activity.
  • Refactoring the UI to be closer to a real commercial Android application (currently only supports English).

Some goals of this project are:

  • To provide a well-maintained "workbench" for ASR researchers interested in practicing state-of-the-art AI technology in real scenarios on mobile devices (currently focusing on Android).
  • To provide a well-maintained "workbench" for LLM researchers interested in practicing state-of-the-art AI technology in real scenarios on mobile devices (currently focusing on Android).
  • To create an Android "turn-key project" for AI experts/researchers (who may not be familiar with regular Android software development) to focus on device-side AI R&D activity, where part of the AI R&D activity (algorithm improvement, model training, model generation, algorithm validation, model validation, performance benchmark, etc.) can be done very easily using Android Studio IDE and a powerful Android phone.

README:

KanTV

KanTV ("Kan" is the Chinese Pinyin of the Hanzi "看", which means "watch/listen" in English) is an open source project focused on studying and practising state-of-the-art AI technology in real scenarios on Android phones/devices, such as online-TV playback, online-TV transcription (real-time subtitles), online-TV language translation, and online-TV video and audio recording, all working at the same time. It is derived from the original ijkplayer, with many enhancements and new features:

  • Watch online TV and local media with a customized FFmpeg 6.1; in accordance with FFmpeg's license, the source code of the customized FFmpeg 6.1 can be found in external/ffmpeg

  • Record online TV to automatically generate videos (useful for short-video creators who need source material, but please respect the IPR of the original content creator/provider); record online TV video/audio content to gather video/audio data that may be required by or useful for AI R&D activity

  • AI subtitle (real-time English subtitles for English online TV (aka OTT TV) by the great & excellent & amazing whisper.cpp); please note that a Xiaomi 14 or another powerful Android phone is HIGHLY recommended for the AI subtitle feature, otherwise unexpected behavior may occur

  • 2D graphic performance

  • Set up a customized playlist and then use this software to watch the content of the customized playlist for R&D activity

  • UI refactor (closer to a real commercial Android application; only English is currently supported as the UI language)

  • Well-maintained "workbench" for ASR (Automatic Speech Recognition) researchers/developers/programmers who are interested in practising state-of-the-art AI tech (such as whisper.cpp) in real scenarios on Android phones/devices (PoC: real-time AI subtitle for online TV (aka OTT TV) on Xiaomi 14, finished from 03/05/2024 to 03/16/2024)

  • Well-maintained "workbench" for LLM (Large Language Model) researchers/developers who are interested in practising state-of-the-art AI tech (such as llama.cpp) in real scenarios on Android phones/devices, or in running/experiencing LLM models (such as llama-2-7b, baichuan2-7b, qwen1_5-1_8b, gemma-2b) on Android phones/devices using the magic llama.cpp

  • Well-maintained "workbench" for GGML beginners to study the internal mechanism of the GGML inference framework on Android phones/devices (PoC: Qualcomm QNN backend for ggml, finished from 03/29/2024 to 04/26/2024)

  • Well-maintained "workbench" for NCNN beginners to study and practise the NCNN inference framework on Android phones/devices

  • Well-maintained turn-key / self-contained project for AI researchers (who might not be familiar with regular Android software development), developers, and beginners who focus on edge/device-side AI learning and R&D activity; some AI R&D activities (AI algorithm validation, AI model validation, performance benchmarking in ASR, LLM, TTS, NLP, CV, and other fields) can be done very easily with the Android Studio IDE and a powerful Android phone

Software architecture of KanTV Android

(depends on https://github.com/zhouwg/kantv/issues/121)

kantv-software-arch

How to build project

Fetch source code

    git clone https://github.com/zhouwg/kantv.git
    cd kantv
    git checkout master
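
The later steps source build/envsetup.sh and run build/prebuild-download.sh by relative path, so they need to be run from the top of the checkout. A minimal sketch of a guard for that follows; the function name `is_kantv_root` is hypothetical and not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): succeed only if DIR looks
# like the top of a kantv checkout, i.e. it contains the two build scripts
# that the setup steps invoke by relative path.
is_kantv_root() {
    local dir="${1:-.}"
    [ -f "$dir/build/envsetup.sh" ] && [ -f "$dir/build/prebuild-download.sh" ]
}

# Usage: run from the directory you cloned into.
if is_kantv_root .; then
    echo "kantv checkout found"
else
    echo "not at the top of a kantv checkout; cd into the cloned directory first" >&2
fi
```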

Set up development environment

Option 1: Set up Docker environment
  • Build docker image

    docker build build -t kantv --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) --build-arg USER_NAME=$(whoami)
  • Run docker container

    # map source code directory into docker container
    docker run -it --name=kantv --volume=`pwd`:/home/`whoami`/kantv kantv
    
    # in docker container
    . build/envsetup.sh
    
    ./build/prebuild-download.sh
Option 2: Set up local environment
  • Prerequisites
      Host OS information:
      
      uname -a
      
      Linux 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
      
      cat /etc/issue
      
      Ubuntu 20.04.2 LTS \n \l
      
      
      • tools & utilities
      sudo apt-get update
      sudo apt-get install build-essential -y
      sudo apt-get install cmake -y
      sudo apt-get install curl -y
      sudo apt-get install wget -y
      sudo apt-get install python -y
      sudo apt-get install tcl expect -y
      sudo apt-get install nginx -y
      sudo apt-get install git -y
      sudo apt-get install vim -y
      sudo apt-get install spawn-fcgi -y
      sudo apt-get install u-boot-tools -y
      sudo apt-get install ffmpeg -y
      sudo apt-get install openssh-client -y
      sudo apt-get install nasm -y
      sudo apt-get install yasm -y
      sudo apt-get install openjdk-17-jdk -y
      
      sudo dpkg --add-architecture i386
      sudo apt-get install lib32z1 -y
      
      sudo apt-get install -y android-tools-adb android-tools-fastboot autoconf \
              automake bc bison build-essential ccache cscope curl device-tree-compiler \
              expect flex ftp-upload gdisk acpica-tools libattr1-dev libcap-dev \
              libfdt-dev libftdi-dev libglib2.0-dev libhidapi-dev libncurses5-dev \
              libpixman-1-dev libssl-dev libtool make \
              mtools netcat python-crypto python3-crypto python-pyelftools \
              python3-pycryptodome python3-pyelftools python3-serial \
              rsync unzip uuid-dev xdg-utils xterm xz-utils zlib1g-dev
      
      sudo apt-get install python3-pip -y
      sudo apt-get install indent -y
      pip3 install meson ninja
      
      echo "export PATH=/home/`whoami`/.local/bin:\$PATH" >> ~/.bashrc
      
      

      or run the script below after fetching the project's source code

      
      ./build/prebuild.sh
      
      
      

      Vim settings, borrowed from http://ffmpeg.org/developer.html#Editor-configuration

      set ai
      set nu
      set expandtab
      set tabstop=4
      set shiftwidth=4
      set softtabstop=4
      set noundofile
      set nobackup
      set fileformat=unix
      set undodir=~/.undodir
      set cindent
      set cinoptions=(0
      " Allow tabs in Makefiles.
      autocmd FileType make,automake set noexpandtab shiftwidth=8 softtabstop=8
      " Trailing whitespace and tabs are forbidden, so highlight them.
      highlight ForbiddenWhitespace ctermbg=red guibg=red
      match ForbiddenWhitespace /\s\+$\|\t/
      " Do not highlight spaces at the end of line while typing on that line.
      autocmd InsertEnter * match ForbiddenWhitespace /\t\|\s\+\%#\@<!$/
      
      
  • Download android-ndk-r26c to prebuilts/toolchain; skip this step if android-ndk-r26c already exists

    . build/envsetup.sh
    
    ./build/prebuild-download.sh
    
    
  • Modify ggml/CMakeLists.txt and ncnn/CMakeLists.txt accordingly if the target Android device is a Xiaomi 14 or another phone based on the Qualcomm Snapdragon 8 Gen 3 SoC

  • Modify ggml/CMakeLists.txt and ncnn/CMakeLists.txt accordingly to enable the QNN backend for the inference frameworks if the target Android phone is based on a Qualcomm SoC

  • Remove the hardcoded debug flag in the Android NDK (see the android-ndk issue)

    
    # open $ANDROID_NDK/build/cmake/android.toolchain.cmake for ndk < r23
    # or $ANDROID_NDK/build/cmake/android-legacy.toolchain.cmake for ndk >= r23
    # delete "-g" line
    list(APPEND ANDROID_COMPILER_FLAGS
    -g
    -DANDROID
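
The manual edit above can also be scripted. The following is a sketch only: the function name `strip_ndk_debug_flag` is hypothetical, and it assumes the flag sits on a line of its own, as in the excerpt. It deletes every line that consists solely of "-g" and keeps a backup copy with the suffix .orig next to the file.

```shell
#!/usr/bin/env bash
# Sketch only: delete lines that consist solely of "-g" (plus optional
# surrounding whitespace) from a toolchain file. sed writes a backup of
# the untouched file to <file>.orig before editing in place.
strip_ndk_debug_flag() {
    local toolchain_file="$1"
    [ -f "$toolchain_file" ] || { echo "no such file: $toolchain_file" >&2; return 1; }
    sed -i.orig '/^[[:space:]]*-g[[:space:]]*$/d' "$toolchain_file"
}

# Example, using the path named in the comment above for ndk >= r23:
# strip_ndk_debug_flag "$ANDROID_NDK/build/cmake/android-legacy.toolchain.cmake"
```

Review the result against the .orig backup before building, since a different NDK release may lay out the flag list differently.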
    
    

Build native code

. build/envsetup.sh
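
Before building, it can help to confirm that the NDK from the setup step is in place. The sketch below assumes the NDK is unpacked to prebuilts/toolchain/android-ndk-r26c under the source root; that directory name is inferred from the setup step rather than confirmed by the project, and `ndk_is_present` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumption: the NDK lives at <root>/prebuilts/toolchain/android-ndk-r26c.
# The exact directory name is inferred, not confirmed by the project.
ndk_is_present() {
    local root="${1:-.}"
    [ -d "$root/prebuilts/toolchain/android-ndk-r26c" ]
}

# Usage: run from the top of the checkout.
if ndk_is_present .; then
    echo "android-ndk-r26c already present"
else
    echo "android-ndk-r26c not found; run ./build/prebuild-download.sh" >&2
fi
```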


Build Android APK

  • Option 1: Build APK from source code by Android Studio IDE

  • Option 2: Build APK from source code by command line

      . build/envsetup.sh
      lunch 1
      ./build-all.sh android
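
After a command-line build, the APK still has to be located and installed on a phone. The project does not state where the APK is written, so the sketch below simply searches a directory tree for the most recently modified *.apk file; `newest_apk` is a hypothetical helper, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified *.apk under DIR.
# Prints nothing if no APK is found.
newest_apk() {
    local dir="${1:-.}"
    # ls -t sorts the files that find hands over by modification time, newest first.
    find "$dir" -type f -name '*.apk' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Example: install the result on a connected device with adb.
# apk=$(newest_apk .) && [ -n "$apk" ] && adb install -r "$apk"
```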
    

Run Android APK on Android phone

This is a learning and research project, so the Android APK does not collect or upload user data from the Android device. The APK should work well on any mainstream Android phone (issue reports from various Android phones are very welcome), and the following four permissions are required:

  • Access to storage is required to generate necessary temporary files
  • Access to device information is required to obtain current phone network status information, distinguishing whether the current network is Wi-Fi or mobile when playing online TV
  • Access to camera is needed for AI Agent
  • Access to mic(audio recorder) is needed for AI Agent

Here is a short video demonstrating AI subtitles by running the great & excellent & amazing whisper.cpp on a Xiaomi 14 device - fully offline, on-device.

https://github.com/zhouwg/kantv/assets/6889919/2fabcb24-c00b-4289-a06e-05b98ecd22b8


Screenshot: LLM inference by running the magic llama.cpp on a Xiaomi 14 device - fully offline, on-device.


Screenshot: ASR inference by running the excellent whisper.cpp on a Xiaomi 14 device - fully offline, on-device.


Screenshots: CV inference by running the excellent ncnn on a Xiaomi 14 device - fully offline, on-device.


Hot topics

  • improve the quality of Qualcomm QNN backend for GGML

  • improve the performance of edge-AI inference on Android phone

  • bugfix in UI layer(Java)

  • bugfix in native layer(C/C++)

Contribution

Be sure to review the open issues before contributing to project KanTV. We use GitHub issues for tracking requests and bugs; please see how to submit an issue in this project.

Issue reports from various Android-based phones, and even PRs to this project, are very welcome.

Docs

Special Acknowledgement

License

Copyright (c) 2021 - 2023 Project KanTV

Copyright (c) 2024 -  Authors of Project KanTV

Licensed under Apache v2.0 or later
