Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️


