Key Tech Innovations and Data Security for Today’s Businesses

The rise of Industry 4.0 marks a turning point for businesses. It blends artificial intelligence (AI), the Internet of Things (IoT), and data science to transform operations and company growth. These technologies let businesses make smarter and faster decisions and create efficient processes. However, as workplaces embrace remote models and interconnected systems, securing sensitive data … Read More

NVIDIA just made game physics a playground for everyone

NVIDIA has taken a major step in supporting the open-source community by fully releasing the source code of its PhysX and Flow GPU-accelerated libraries under the permissive BSD-3 license. While the CPU version of PhysX has been open-source since 2018, this latest release includes the long-awaited GPU simulation kernels, enabling developers to access over 500 … Read More

Hugging Face Releases SmolLM3: A 3B Long-Context, Multilingual Reasoning Model

Hugging Face just released SmolLM3, the latest version of its “Smol” language models, designed to deliver strong multilingual reasoning over long contexts using a compact 3B-parameter architecture. While most long-context-capable models typically push beyond 7B parameters, SmolLM3 manages to offer state-of-the-art (SoTA) performance with significantly fewer parameters, making it more cost-efficient and deployable on constrained … Read More
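
For readers who want to try a model like this, here is a minimal sketch of loading a compact chat model with the transformers library. The checkpoint id and generation settings are assumptions based on the announcement, not confirmed details of the release.

```python
# Minimal sketch: loading a small Hugging Face chat model with transformers.
# The checkpoint id below is an assumption; substitute the official
# SmolLM3 repository name if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize long-context reasoning in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```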

Function Calling at the Edge – The Berkeley Artificial Intelligence Research Blog

The ability of LLMs to execute commands through plain language (e.g., English) has enabled agentic systems that can complete a user query by orchestrating the right set of tools (e.g., ToolFormer, Gorilla). This, along with recent multi-modal efforts such as GPT-4o and Gemini-1.5, has expanded the realm of possibilities with AI agents. While this is quite exciting, the large size and computational requirements of these models often require inference to be performed in the cloud, which creates several challenges for widespread adoption. First and foremost, uploading data such as video, audio, or text documents to a third-party cloud vendor can raise privacy issues. Second, cloud inference requires cloud/Wi-Fi connectivity, which is not always available; a robot deployed in the real world, for instance, may not always have a stable connection. Latency is also a concern: uploading large amounts of data to the cloud and waiting for the response can slow things down, resulting in unacceptable time-to-solution. These challenges could be addressed by deploying LLMs locally at the edge.

Read More
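
As a rough illustration of the orchestration loop described above, here is a minimal, library-agnostic sketch of function calling against a locally hosted model. The `local_llm` callable, the tool registry, and the JSON call format are illustrative assumptions, not the API of any particular edge runtime or of the systems cited in the post.

```python
# Illustrative sketch of a local function-calling loop. The `local_llm`
# callable and the JSON tool-call format are assumptions for illustration.
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real edge agent might query an on-device sensor or API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_agent(user_query: str, local_llm) -> str:
    # Ask the model to answer directly or emit a JSON tool call.
    prompt = (
        'Reply with JSON like {"tool": "get_weather", "args": {"city": "..."}} '
        "to call a tool, or answer directly.\n"
        f"User: {user_query}"
    )
    reply = local_llm(prompt)
    try:
        call = json.loads(reply)
        result = TOOLS[call["tool"]](**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return reply  # The model answered directly without calling a tool.
    # Feed the tool result back so the model can compose the final answer.
    return local_llm(f"Tool result: {result}\nNow answer the user: {user_query}")
```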

Accelerating scientific discovery with AI | MIT News

Several researchers have taken a broad view of scientific progress over the last 50 years and come to the same troubling conclusion: Scientific productivity is declining. It’s taking more time, more funding, and larger teams to make discoveries that once came faster and cheaper. Although a variety of explanations have been offered for the slowdown, … Read More

The Transformative Role of AI in Cybersecurity

2025 marks a pivotal moment in the integration of artificial intelligence (AI) and cybersecurity. Rapid advancements in AI are not only redefining industries; they are reshaping the cybersecurity landscape in profound ways. Through this evolution, I have noted three primary trends emerging that demand immediate attention from organizations: The amplification of security threats powered by … Read More

No rules, just vibes! What is vibe coding?

In February, OpenAI cofounder and former Tesla AI director Andrej Karpathy coined a phrase that quickly sparked fascination, debate, and even a small cultural shift in the world of software development: vibe coding. What began as just a post, “There’s a new kind of coding I call ‘vibe coding,’ where you fully give in to … Read More

NVIDIA AI Released DiffusionRenderer: An AI Model for Editable, Photorealistic 3D Scenes from a Single Video

AI-powered video generation is improving at a breathtaking pace. In a short time, we’ve gone from blurry, incoherent clips to generated videos with stunning realism. Yet, for all this progress, a critical capability has been missing: control and editability. While generating a beautiful video is one thing, the ability to professionally and realistically edit it—to … Read More

The Visual Haystacks Benchmark! – The Berkeley Artificial Intelligence Research Blog

Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI). Over the decades, AI researchers have developed Visual Question Answering (VQA) systems to interpret scenes within single images and answer related questions. While recent advances in foundation models have significantly closed the gap between human and machine visual processing, conventional VQA has been restricted to reasoning about single images at a time rather than whole collections of visual data.

This limitation poses challenges in more complex scenarios. Take, for example, the challenges of discerning patterns in collections of medical images, monitoring deforestation through satellite imagery, mapping urban changes using autonomous navigation data, analyzing thematic elements across large art collections, or understanding consumer behavior from retail surveillance footage. Each of these scenarios entails not only visual processing across hundreds or thousands of images but also cross-image reasoning over those findings. To address this gap, this project focuses on the “Multi-Image Question Answering” (MIQA) task, which exceeds the reach of traditional VQA systems.
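
Conceptually, MIQA can be framed as retrieve-then-reason over an image collection. The sketch below is only a schematic of that framing; the scoring and answering callables are placeholders, not the models or evaluation protocol used in the Visual Haystacks benchmark itself.

```python
# Schematic retrieve-then-reason pipeline for Multi-Image Question Answering.
# `score_relevance` and `answer_from_images` are placeholder callables,
# not components of the Visual Haystacks benchmark.
from typing import Callable, List

def answer_over_collection(
    question: str,
    image_paths: List[str],
    score_relevance: Callable[[str, str], float],   # (question, image path) -> relevance score
    answer_from_images: Callable[[str, List[str]], str],
    top_k: int = 3,
) -> str:
    # Step 1: rank the whole collection by relevance to the question
    # (the "needle in a haystack" retrieval step).
    ranked = sorted(image_paths, key=lambda p: score_relevance(question, p), reverse=True)
    # Step 2: reason jointly over only the top-k candidate images.
    return answer_from_images(question, ranked[:top_k])
```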



Visual Haystacks: the first “visual-centric” Needle-In-A-Haystack (NIAH) benchmark designed to rigorously evaluate Large Multimodal Models (LMMs) in processing long-context visual information.

Read More