#072 – Chris Gully and the rise of Small Language Models

Chris Gully discusses his current role in the new Broadcom organization and the highlights of his career, emphasizing the importance of staying relevant in the technology industry and the value of working with cool and smart people. The conversation then shifts to small language models (SLMs) and their place in the landscape of gen AI applications. Gully explains how SLMs complement large language models (LLMs) and enable more efficient and scalable deployments, and the discussion works through the components of gen AI applications, the need for right-sizing models, and the challenges of scalability and efficiency. Gully highlights data as the most important and often untapped asset organizations have for driving business outcomes through AI. The conversation also covers the benefits and limitations of fine-tuning LLMs, the advantages of SLMs over LLMs in efficiency, optimization, and governance, and the challenges of infrastructure management and resource allocation in AI deployments: right-sizing workloads, distributing them across data centers, and maximizing resource utilization. It concludes with future trends in machine learning and AI, including advancements in the underlying math and the need for accessible and efficient technology.

Takeaways

  • Staying relevant in the technology industry is crucial for career success.
  • Small language models (SLMs) offer a more efficient and scalable way to deploy gen AI than relying on large language models (LLMs) alone.
  • Data is the most important and untapped asset for organizations, and leveraging it through AI can drive business outcomes.
  • Scalability and efficiency are key challenges in deploying gen AI applications.
  • Fine-tuning LLMs can enhance their precision and reduce the need for extensive training (see the sketch after this list).
  • The future of SLMs may involve dynamic training and efficient distribution to support evolving business needs.
  • SLMs offer advantages in efficiency, optimization, and governance compared to LLMs.
  • Infrastructure management and resource allocation are crucial in AI deployments.
  • Right-sizing workloads and maximizing resource utilization are key considerations.
  • Future trends in machine learning and AI include advancements in math and the need for accessible and efficient technology.
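
To make the fine-tuning takeaway concrete, the snippet below is a minimal sketch of parameter-efficient fine-tuning (LoRA) of a small language model using the Hugging Face transformers and peft libraries. The model name, target modules, and hyperparameters are illustrative assumptions, not details from the episode.

```python
# Minimal LoRA fine-tuning sketch (illustrative; not from the episode).
# LoRA trains small low-rank adapter matrices instead of all model weights,
# which is why fine-tuning is far cheaper than training from scratch.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "microsoft/phi-2"  # hypothetical choice of small language model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling applied to adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
# From here, train on domain data, e.g. with the transformers Trainer API.
```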

#071 – Developer Experience & Spring with DaShaun Carter

In this new episode of the Unexplored Territory Podcast, DaShaun Carter, a Spring Developer Advocate at VMware Tanzu and Broadcom, discusses his career highlights, his home lab setup, and his passion for Spring. He explains the concept of developer experience and how Spring and Tanzu contribute to it, and highlights innovations in Spring, such as AOT processing and native images, and their impact on use cases. He also discusses the relationship between the open source aspect of Spring and the closed source solutions in the Tanzu portfolio, and explores the importance of developer experience in platform engineering. DaShaun and Johan then discuss the value of collaboration between developers and platform engineers, what Spring offers platform engineers, the role of AI in developer experience and Spring, interesting topics for the VMware Explore conference, and where to learn more about Spring and open source.

Takeaways

  • Spring and Tanzu provide an easy and efficient developer experience, allowing developers to focus on solving problems and delivering software.
  • Innovations in Spring, such as AOT processing and native images, enable the deployment of enterprise-grade workloads on low-cost devices and at scale.
  • The open source aspect of Spring allows flexibility and choice for customers, while the commercial solutions in the Tanzu portfolio provide additional support and 24/7 access to experts.
  • Developer experience plays a crucial role in platform engineering, as it attracts developers to the platform and enables efficient onboarding and deployment processes.
  • Collaboration between developers and platform engineers is crucial for successful software delivery.
  • Platform teams should build relationships with developers and continuously iterate on meeting their needs.
  • Spring provides enterprise-grade, production-ready tools and frameworks that make the life of platform engineers easier.
  • AI is becoming increasingly important in the developer experience, and Spring AI provides an abstraction layer for consuming AI models.
  • Interesting topics for the VMware Explore conference include overcoming obstacles in software delivery, cost-saving solutions, and success stories.
  • To learn more about Spring and open source, connect with DaShaun on X, YouTube, and LinkedIn, and check out the Spring Office Hours show.

#070 – vSAN performance with Patryk Wolsza (Intel)

In this conversation, Duncan and Patryk discuss vSAN performance, specifically focusing on vSAN ESA. Patryk shares his findings from comparing Intel and AMD CPUs, highlighting that vSAN ESA performs better on Intel CPUs in almost every scenario. They also discuss the cost and price point considerations when choosing between VMware vSAN OSA and vSAN ESA. Patryk explains the configuration and testing process for OSA and ESA, as well as the performance impact of RDMA and 100 Gig NICs. The conversation concludes with recommendations for customers, emphasizing the importance of trying new technologies and exploring the benefits they can offer.

Takeaways

  • vSAN ESA performs better on Intel CPUs compared to AMD CPUs in various scenarios.
  • Consider the cost and price point when choosing between OSA and ESA.
  • RDMA and 100 Gig NICs can significantly improve vSAN performance, reducing latency and increasing throughput.
  • It is recommended to try new technologies and explore their benefits to optimize vSAN performance.

#069 – AI on CPUs with Earl Ruby

In episode 69, Earl Ruby discusses his career highlights and his current role at Broadcom. He explains the Private AI Foundation with Intel and how it enables customers to run AI and ML workloads. The discussion then focuses on choosing between CPUs and GPUs for ML workloads, debunking misconceptions about CUDA, and the future of software tools like oneAPI. Earl also provides insights into AMX and its support in vSphere for running ML workloads on CPUs, explains the concept of quantization and how it is used to run models on AMX, and discusses the challenges of sizing virtual machines for large language models and the power consumption differences between GPUs and CPUs. The conversation touches on heterogeneous clusters and workload placement, as well as the future of AMX and Intel GPUs. Finally, Earl mentions his blog articles, where he shares his insights and experiences.

Takeaways

  • The Private AI Foundation with Intel enables customers to run AI and ML workloads using Intel’s AMX instruction set and GPUs.
  • When choosing between CPUs and GPUs for ML workloads, consider factors such as use case, model complexity, and performance requirements.
  • CUDA is not the only option for writing optimized AI workloads, as Intel’s oneAPI provides an open API for working with their hardware.
  • AMX is a set of instructions backed by hardware in Intel CPUs for matrix multiplication and other matrix operations, and it is supported in vSphere for running ML workloads on CPUs.
  • Quantization is a technique used to convert high-bit-width numbers into lower-bit equivalents, allowing for a smaller memory footprint and accelerated processing on AMX (see the sketch after this list).
  • Sizing virtual machines for large language models can be challenging, and it is important to consider the memory footprint and CPU cores required.
  • GPUs consume more power than CPUs, especially when the GPUs are underutilized; for workloads that leave a GPU idle much of the time, CPUs can be power competitive.
  • Heterogeneous clusters can be used to ensure specific workloads land on AMX-enabled CPUs, while Kubernetes provides automatic workload placement based on hardware capabilities.
  • The future of AMX and Intel GPUs involves extensibility and integration with other GPU technologies, and oneAPI allows for seamless software compatibility with new hardware.
  • AVX-512 can be used to accelerate ML workloads on older machines without AMX, but the performance boost is not as significant as with AMX or GPUs.
  • Earl Ruby shares his insights and experiences through his blog articles, where he provides solutions to unique challenges and saves others from similar frustrations.
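
As a companion to the quantization takeaway above, here is a minimal NumPy sketch of symmetric int8 quantization: mapping 32-bit floats onto 8-bit integers to shrink the memory footprint. This is a generic illustration of the technique; real AMX-backed inference stacks perform the conversion and the low-precision math internally.

```python
# Symmetric int8 quantization sketch: float32 -> int8 and back.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 values onto the int8 range [-127, 127]."""
    scale = max(np.abs(weights).max() / 127.0, 1e-12)  # per-tensor scale
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(f"{w.nbytes} bytes -> {q.nbytes} bytes")  # 4x smaller footprint
print("max rounding error:", np.abs(w - dequantize(q, scale)).max())
```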

#068 – Diving into the VMC on AWS announcements with Niels Hagoort

In episode 068, we invited Niels Hagoort back to the show to talk about the latest VMC on AWS announcements.

Topics we discussed:

  • The new M7i instance, including its use cases and specs
  • The Cloud Management Add-on for VMC on AWS and how Aria can add value
  • The differences between the M7i instance and the other instance types, and deployment considerations

More about these announcements can be found here:
https://vmc.techzone.vmware.com/closer-look-m7i-instance-vmware-cloud-aws