Modern API Management
When assessing prominent topics across DZone — and the software engineering space more broadly — it simply felt incomplete to conduct research on the larger impacts of data and the cloud without talking about such a crucial component of modern software architectures: APIs. Communication is key in an era when applications and data capabilities are growing increasingly complex. Therefore, we set our sights on investigating the emerging ways in which data that would otherwise be isolated can better integrate with and work alongside other app components and across systems.

For DZone's 2024 Modern API Management Trend Report, we focused our research specifically on APIs' growing influence across domains, prevalent paradigms and implementation techniques, security strategies, AI, and automation. Alongside observations from our original research, practicing tech professionals from the DZone Community contributed articles addressing key topics in the API space, including automated API generation via no and low code; communication architecture design among systems, APIs, and microservices; GraphQL vs. REST; and the role of APIs in the modern cloud-native landscape.
Services, or servers, are software components or processes that execute operations on specified inputs, producing either actions or data depending on their purpose. The party making the request is the client, while the server handles the request. Typically, communication between client and server occurs over a network, using protocols such as HTTP for REST or gRPC. Services may include a user interface (UI) or function solely as backend processes. With this background, we can explore the steps and rationale behind developing a scalable service.

NOTE: This article does not provide instructions on service or UI development, leaving you the freedom to select the language or tech stack that suits your requirements. Instead, it offers a comprehensive perspective on constructing and expanding a service, reflecting what startups need to do in order to scale. Additionally, it's important to recognize that while this approach offers valuable insights into computing concepts, it's not the sole method for designing systems.

The Beginning: Version Control

Assuming clarity on the presence of a UI and the general purpose of the service, the initial step prior to service development is putting a source control/version control system in place to support the code. This typically entails using tools like Git, Mercurial, or others to back up the code and facilitate collaboration, especially as the number of contributors grows. It's common for startups to begin with Git as their version control system, often leveraging platforms like github.com for hosting Git repositories. An essential element of version control is pull requests, which facilitate peer reviews within your team. This process enhances code quality by allowing multiple individuals to review and approve proposed changes before integration. While I won't delve into specifics here, a quick online search will provide ample information on the topic.
Developing the Service

Once version control is established, the next step involves setting up a repository and initiating service development. This article adopts a language-agnostic approach, as delving into specific languages and optimal tech stacks for every service function would be overly detailed. For conciseness, let's focus on a service that executes functions based on inputs and requires backend storage (while remaining neutral on the storage solution, which will be discussed later).

As you commence service development, it's crucial to grasp how to run the service locally on your laptop or in any developer environment. Consider this aspect carefully, as local testing plays a pivotal role in efficient development. While crafting the service, ensure that classes, functions, and other components are organized in a modular manner, split into separate files as necessary. This organizational approach promotes a structured repository and facilitates comprehensive unit test coverage.

Unit tests represent a critical aspect of testing that developers should rigorously prioritize. There should be no compromises in this regard! Countless incidents and production issues could have been averted with the implementation of a few unit tests, and neglecting this practice can incur significant financial costs for a company. I won't delve into the specifics of integrating the gRPC framework, REST packages, or any other communication protocols; you'll have the freedom to explore and implement these as you develop the service. Once the service is executable and tested through unit tests and basic manual testing, the next step is to explore how to make it "deployable."

Packaging the Service

Ensuring the service is "deployable" implies having a method to run the process in a more manageable manner. Let's delve into this concept further. What exactly does this entail? Now that we have a runnable process, who will initiate it initially? Moreover, where will it be executed?
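To make the point about modular code and unit tests concrete, here is a minimal sketch in Python (the function name and business rule are hypothetical, purely for illustration): the logic lives in a small, pure function kept separate from transport and storage code, which makes it trivial to cover with unit tests.

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Pure business logic, kept separate from transport/storage code."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(100.0, 25), 75.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)
```

The tests run with `python -m unittest` pointed at the module, which is exactly the kind of check a CI system can run on every pull request.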
Addressing these questions is crucial, and we'll now proceed to provide answers. In my humble opinion, managing your own compute infrastructure might not be the best approach. There are numerous intricacies involved in ensuring that your service is accessible on the internet. Opting for a cloud service provider (CSP) is a wiser choice, as they handle much of the complexity behind the scenes. For our purposes, any available cloud service provider will suffice.

Once a CSP is selected, the next consideration is how to manage the process. We aim to avoid manual intervention every time the service crashes, especially without notification. The solution lies in orchestrating our process through containerization. This involves creating a container image for our process: essentially a filesystem containing all necessary dependencies at the application layer. A Dockerfile is used to specify the steps for including the process and its dependencies in the container image. Once the Dockerfile is written, the docker build CLI can be used to generate a tagged image. This image is then stored locally or pushed to a container registry, which serves as a repository of container images that can later be pulled onto a compute instance. With these steps outlined, the next question arises: How does containerization orchestrate our process? This will be addressed in the following section on executing a container.

Executing the Container

After building a container image, the subsequent step is its execution, which in turn initiates the service we've developed. Various container runtimes, such as containerd, podman, and others, are available to facilitate this process. In this context, we use the docker CLI to manage the container, which interacts with containerd in the background. Running a container is straightforward: "docker run" executes the container and, consequently, the developed process.
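As a sketch of the Dockerfile described above, here is a hypothetical example for a Python service (the base image, file names, and registry are assumptions, not prescriptions):

```dockerfile
# Base image providing the language runtime
FROM python:3.12-slim

WORKDIR /app

# Install application-layer dependencies first to leverage layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the service code itself
COPY . .

# Command executed when the container starts
CMD ["python", "main.py"]
```

The image would then be built and tagged with something like `docker build -t my-registry/my-service:v1 .` and published with `docker push my-registry/my-service:v1`.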
You may observe logs in the terminal (if the container is not run as a daemon) or use "docker logs" to inspect service logs as necessary. Additionally, options like "--restart" can be included in the command to automatically restart the container (i.e., the process) in the event of a crash, allowing for customization as required. At this stage, we have our process encapsulated within a container, ready for execution/orchestration as required. While this setup is suitable for local testing, our next step involves exploring how to deploy this on a basic compute instance within our chosen CSP.

Deploying the Container

Now that we have a container, it's advisable to publish it to a container registry. Numerous container registries are available, managed by CSPs or by Docker itself. Once the container is published, it becomes easily accessible from any CSP or platform. We can pull the image and run it on a compute instance, such as a virtual machine (VM), allocated within the CSP. Starting with this option is typically the most cost-effective and straightforward. While we briefly touch on other forms of compute infrastructure later in this article, deploying on a VM involves pulling a container image and running it, much like we did in our developer environment. Voila! Our service is deployed.

However, making the service accessible to the world requires careful consideration. While directly exposing the VM's IP to the external world may seem tempting, it poses security risks, so implementing TLS is crucial. A better approach involves using a reverse proxy to route requests to specific services; this improves security and facilitates the deployment of multiple services on the same VM. To enable internet access to our service, we require a method for inbound traffic to reach our VM. An effective solution involves installing a reverse proxy like Nginx directly on the VM. This can be achieved by pulling the Nginx container image, typically tagged "nginx:latest".
Before launching the container, it's necessary to configure Nginx settings such as servers, locations, and additional configurations. Security measures like TLS can also be implemented for enhanced protection. Once the Nginx configuration is established, it can be exposed to the container through volumes during container execution. This setup allows the reverse proxy to route incoming requests to the container running on the same VM, using a specified port. One notable advantage is the ability to host multiple services within the VM, with routing efficiently managed by the reverse proxy. To finalize the setup, we must expose the VM's IP address and proxy port to the internet, with TLS encryption supported by the reverse proxy. This adjustment can typically be made through the CSP's settings.

NOTE: The examples of solutions provided below may reference GCP as the CSP. This is solely for illustrative purposes and should not be interpreted as a recommendation; the intention is to convey concepts effectively.

Consider the scenario where managing a single VM manually becomes laborious and lacks scalability. To address this challenge, CSPs offer solutions akin to managed instance groups, comprising multiple VMs configured identically. These groups often come with features like startup scripts, which execute upon VM initialization. All the configurations discussed earlier can be scripted into these startup scripts, simplifying the process of VM launch and enhancing scalability. This setup proves beneficial when multiple VMs are required to handle requests efficiently. Now the question arises: When dealing with multiple VMs, how do we decide where to route requests? The solution is to employ a load balancer provided by the CSP, which selects one VM from the pool to handle each request. Additionally, we can streamline the process by implementing general load balancing.
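A minimal Nginx configuration for the setup described above might look like the following (the server name, ports, paths, and certificate locations are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # TLS termination at the reverse proxy
    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    # Route requests to the service container listening on a local port
    location /api/ {
        proxy_pass http://127.0.0.1:8080/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The file would be mounted into the Nginx container via a volume, along the lines of `docker run -d -p 443:443 -v /path/to/nginx.conf:/etc/nginx/conf.d/default.conf nginx:latest`; additional `location` blocks can route to other services on the same VM.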
To remove individual reverse proxies, we can use multiple instance groups, one for every service needed, each fronted by its own load balancer. A general load balancer can then expose its IP with TLS configuration and route setup, so that only service containers run on the VMs. It's essential to ensure that VM IPs and ports are accessible solely by the load balancer in the ingress path, a task achievable through configurations provided by the CSP. At this juncture, we have a load balancer securely managing requests, directing them to the specific container service within a VM from a pool of VMs. This setup itself contributes to scaling our service. To further enhance scalability and eliminate the need for continuous VM operation, we can opt for an autoscaler policy, which dynamically scales the VM group up or down based on parameters such as CPU, memory, or others provided by the CSP.

Now, let's delve into the concept of Infrastructure as Code (IaC), which holds significant importance in efficiently managing CSP components that promote scale. Essentially, IaC involves describing CSP infrastructure components in configuration files, which an IaC tool (like Terraform) interprets to create and manage the infrastructure accordingly. For more detailed information, refer to the wiki.

Datastore

We've previously discussed scaling our service, but it's crucial to remember that there's typically a requirement to maintain state somewhere. This is where databases or datastores play a pivotal role. From experience, handling this aspect can be quite tricky, and I would once again advise against developing a custom solution. CSP solutions are ideally suited for this purpose. CSPs generally handle the complexity associated with managing databases, addressing concepts such as master-slave architecture, replica management, synchronous/asynchronous replication, backups/restores, consistency, and other intricate aspects more effectively.
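To illustrate IaC, a Terraform sketch for a GCP managed instance group with an autoscaler might look like this (names, zone, machine type, and thresholds are hypothetical; consult your CSP's provider documentation for exact fields):

```hcl
resource "google_compute_instance_template" "service" {
  name_prefix  = "my-service-"
  machine_type = "e2-small"

  disk {
    source_image = "projects/cos-cloud/global/images/family/cos-stable"
  }

  network_interface {
    network = "default"
  }

  # Startup script pulls and runs the service container on boot
  metadata_startup_script = "docker run -d --restart unless-stopped my-registry/my-service:v1"
}

resource "google_compute_instance_group_manager" "service" {
  name               = "my-service-group"
  base_instance_name = "my-service"
  zone               = "us-central1-a"

  version {
    instance_template = google_compute_instance_template.service.id
  }
}

resource "google_compute_autoscaler" "service" {
  name   = "my-service-autoscaler"
  zone   = "us-central1-a"
  target = google_compute_instance_group_manager.service.id

  autoscaling_policy {
    min_replicas = 2
    max_replicas = 10
    cpu_utilization {
      target = 0.6 # scale out above 60% average CPU
    }
  }
}
```

Keeping the instance group, autoscaler, and load balancer in files like this means the whole environment can be reviewed in pull requests and recreated with a single `terraform apply`.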
Managing a database can be challenging due to concerns about data loss arising from improper configurations. Each CSP offers different database offerings, and it's essential to consider the specific use cases the service deals with to choose the appropriate offering. For instance, one may need to decide between a relational database offering and a NoSQL offering; this article does not delve into those differences. The database should be accessible from the VM group and serve as a central datastore for all instances where state is shared. It's worth noting that the database or datastore should only be accessible within the VPC, and ideally, only from the VM group. This is crucial to avoid exposing the database's ingress IP, ensuring security and data integrity.

Queues

In service design, we often encounter scenarios where certain tasks need to be performed asynchronously. This means that upon receiving a request, part of the processing can be deferred to a later time without blocking the response to the client. One common approach is to use databases as queues, where requests are ordered by some identifier. Alternatively, CSP services such as Amazon SQS or GCP Pub/Sub can be employed for this purpose. Messages published to the queue can then be retrieved for processing by a separate service that listens to the queue. However, we won't delve into the specifics here.

Monitoring

In addition to the VM-level monitoring typically provided by the CSP, there may be a need for more granular insights through service-level monitoring. For instance, one might require latency metrics for database requests, metrics based on queue interactions, or metrics for service CPU and memory utilization. These metrics should be collected and forwarded to a monitoring solution such as Datadog, Prometheus, or others.
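To make the database-as-queue idea concrete, here is a minimal sketch using SQLite (table and column names are illustrative; a production system would also need visibility timeouts, retries, and concurrency handling):

```python
import sqlite3

# In-memory database standing in for the service's shared datastore
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_queue (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- orders the queue
        payload TEXT NOT NULL,
        done    INTEGER NOT NULL DEFAULT 0
    )
""")

def enqueue(payload: str) -> None:
    conn.execute("INSERT INTO task_queue (payload) VALUES (?)", (payload,))
    conn.commit()

def dequeue():
    """Claim the oldest unprocessed task, or return None if the queue is empty."""
    row = conn.execute(
        "SELECT id, payload FROM task_queue WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE task_queue SET done = 1 WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]

enqueue("send-welcome-email")
enqueue("generate-invoice")
print(dequeue())  # prints "send-welcome-email": tasks come back in insertion order
```

A worker service would poll `dequeue` (or block on a managed queue such as SQS or Pub/Sub) and process tasks independently of the request path.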
These solutions are typically backed by a time-series database (TSDB), allowing users to gain insight into the system's state over a period of time. This monitoring setup also facilitates debugging certain types of issues and can trigger alerts or alarms if configured to do so. Alternatively, you can set up your own Prometheus deployment, as it is an open-source solution.

With the aforementioned concepts, it should be feasible to deploy a scalable service. This level of scalability has proven sufficient for numerous startups that I have consulted for. Moving forward, we'll explore the use of a "container orchestrator" instead of deploying containers in VMs as described earlier. In this article, we'll use Kubernetes (K8s) as an example to illustrate this transition.

Container Orchestration: Enter Kubernetes (K8s)

Having implemented the aforementioned design, we can effectively manage numerous requests to our service. Now, our objective is to achieve decoupling to further enhance scalability. This decoupling is crucial because a bug in any service within a VM could crash the VM, potentially causing the entire ecosystem to fail. Moreover, decoupled services can be scaled independently. For instance, one service may have sufficient scalability and effectively handle requests, while another may struggle with the load. Consider the example of a shopping website, where the catalog may receive significantly more visits than the checkout page; consequently, the scale of read requests may far exceed that of checkouts. In such cases, deploying the service containers into Kubernetes (K8s) as distinct services allows for independent scaling. Before delving into specifics, it's worth noting that CSPs offer Kubernetes as a compute platform option, which is essential for scaling to the next level.

Kubernetes (K8s)

We won't delve into the intricacies of Kubernetes controllers or other aspects in this article.
The information provided here will suffice to deploy a service on Kubernetes. Kubernetes (K8s) serves as an abstraction over a cluster of nodes with storage and compute resources. Depending on where a service is scheduled, a node provides the necessary compute and storage capabilities. Having container images is essential for deploying a service on K8s. Resources in K8s are represented by configurations, in YAML or JSON format, that define specific K8s objects. These objects belong to a particular "namespace" within the K8s cluster. The basic unit of compute within K8s is a "Pod," which can run one or more containers. A config for a pod can be created, and the service can then be deployed onto a namespace using the K8s CLI, kubectl. Once the pod is created, your service is essentially running, and you can monitor its state using kubectl with the namespace as a parameter.

To deploy multiple pods, a "deployment" is required. K8s offers various resources such as deployments, stateful sets, and daemon sets; the K8s documentation explains these abstractions well, so we won't discuss each of them here. A deployment is essentially a resource designed to deploy multiple pods of a similar kind. This is achieved through the "replicas" option in the configuration, and you can also choose an update strategy according to your requirements. Selecting the appropriate update strategy is crucial to ensure there is no downtime during updates. Therefore, in our scenario, we would use a deployment for our service that scales to multiple pods. When employing a deployment to oversee your application, pods can be dynamically created and terminated; consequently, the count and identities of running, healthy pods may vary unpredictably.
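A minimal Deployment manifest for the service described above might look like this (names, namespace, image, and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: my-namespace
spec:
  replicas: 3                 # number of pods to keep running
  selector:
    matchLabels:
      app: my-service
  strategy:
    type: RollingUpdate       # replaces pods gradually to avoid downtime
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: my-registry/my-service:v1
          ports:
            - containerPort: 8080
```

It would be applied with `kubectl apply -f deployment.yaml` and inspected with `kubectl get pods -n my-namespace`.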
Kubernetes manages the creation and removal of pods to sustain the desired state of your cluster, treating pods as transient resources with no assured reliability or durability. Each pod is assigned its own IP address, typically managed by network plugins in Kubernetes. As a result, the set of pods linked with a deployment can fluctuate over time, presenting a challenge for components within the cluster to consistently locate and communicate with specific pods. This challenge is mitigated by employing a Service resource.

After establishing a Service object, the subsequent topic of discussion is Ingress. Ingress is responsible for routing to multiple services within the cluster. It facilitates the exposure of HTTP, HTTPS, or even gRPC routes from outside the cluster to services within it. Traffic routing is managed by rules specified on the Ingress resource, which is backed by a load balancer operating in the background. With all these components deployed, our service has attained a commendable level of scalability. It's worth noting that the concepts discussed prior to entering the Kubernetes realm are mirrored here in a way: we have load balancers, containers, and routes, albeit implemented differently. Additionally, there are other objects, such as the Horizontal Pod Autoscaler (HPA) for scaling pods based on memory/CPU utilization, and storage constructs like Persistent Volumes (PV) and Persistent Volume Claims (PVC), which we won't delve into extensively. Feel free to explore these for a deeper understanding.

CI/CD

Lastly, I'd like to address an important aspect of enhancing developer efficiency: Continuous Integration/Continuous Deployment (CI/CD). Continuous Integration (CI) involves running automated tests (such as unit, end-to-end, or integration tests) on any developer pull request or check-in to the version control system, typically before merging. This helps identify regressions and bugs early in the development process.
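The Service and Ingress described above might be sketched as follows (hostnames, ports, and names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: my-namespace
spec:
  selector:
    app: my-service        # matches the pods created by the deployment
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-namespace
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```

The Service gives the deployment's pods a stable name and virtual IP inside the cluster, while the Ingress maps external HTTP(S) routes onto that Service.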
After merging, CI generates images and other artifacts required for service deployment. Tools like Jenkins (Jenkins X), Tekton, GitHub Actions, and others facilitate CI processes. Continuous Deployment (CD) automates the deployment process across different environments, such as development, staging, or production. Usually, the development environment is deployed first, followed by several end-to-end tests to identify any issues. If everything functions correctly, CD proceeds to deploy to the other environments. All the aforementioned tools also support CD functionality. CI/CD tools significantly improve developer efficiency by reducing manual work; they are essential to ensure developers don't spend hours on manual tasks. Additionally, during manual deployments, it's crucial to ensure no one else is deploying to the same environment simultaneously to avoid conflicts, a concern that can be addressed effectively by our CD framework. There are other aspects, like dynamic config management, securely storing secrets/passwords, and logging systems, that we won't cover in detail; I would encourage readers to look into the links provided. Thank you for reading!
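As one example of the CI half, a minimal GitHub Actions workflow might look like this (the repository layout, test command, and registry name are assumptions):

```yaml
# .github/workflows/ci.yaml
name: ci
on:
  pull_request:           # run tests before merging
  push:
    branches: [main]      # build and publish artifacts after merging

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: python -m unittest discover

  build-image:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push container image
        run: |
          docker build -t my-registry/my-service:${{ github.sha }} .
          docker push my-registry/my-service:${{ github.sha }}
```

A CD pipeline would then pick up the pushed image tag and roll it out to development first, promoting it to further environments once end-to-end tests pass.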
Filtering system calls is an essential component of many host-based runtime security products on Linux systems. There are many different techniques that can be used to monitor system calls, all of which have certain tradeoffs. Recently, kernel modules have become less popular in favor of user space runtime security agents due to portability and stability benefits. Unfortunately, it is possible to architect user space agents in such a way that they are susceptible to several attacks, such as time-of-check-time-of-use (TOCTOU), agent tampering, and resource exhaustion. This article explains attacks that often affect user space security products and how popular technologies such as Seccomp and eBPF can be used in a way that avoids these issues.

Attacks Against User Space Agents

User space agents are often susceptible to several attacks such as TOCTOU, tampering, and resource exhaustion. These attacks all take advantage of the fact that the user space agent must communicate with the kernel before it makes a decision about a system call or other action that occurs on the system. Generally, these attacks attempt to modify data passed in system calls in a way that prevents a user space agent from detecting an attack, or they take advantage of the fact that the agent does not protect itself from tampering.

TOCTOU vulnerabilities present a substantial risk to user space security agents running on the Linux kernel. These vulnerabilities arise when security decisions are based on data that can be altered by an attacker between the check and the subsequent use. For instance, a user space security agent might check the arguments of a system call before allowing a certain operation, but during the time gap before the operation is executed, an adversary could change the system call's arguments. This manipulation could lead to a divergence between the state perceived by the security agent and the actual state, potentially resulting in security breaches.
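The TOCTOU pattern can be sketched in a few lines of Python (purely illustrative; the file names are invented, and the "attacker" is simulated by a direct call rather than a real racing thread): a check on a path followed by a separate use leaves a window in which the target can be swapped.

```python
import os
import tempfile
from pathlib import Path

def check_then_use(path: str) -> bytes:
    # CHECK: validate the target of `path` at time T0
    if os.path.basename(os.path.realpath(path)) == "secret":
        raise PermissionError("access to 'secret' denied")
    # ...window between check and use: an attacker can re-point `path`...
    swap_target(path)  # simulates the attacker winning the race
    # USE: the file opened at time T1 is no longer the one that was checked
    with open(path, "rb") as f:
        return f.read()

tmp = tempfile.mkdtemp()
allowed, secret = os.path.join(tmp, "allowed"), os.path.join(tmp, "secret")
Path(allowed).write_bytes(b"harmless")
Path(secret).write_bytes(b"top secret")

link = os.path.join(tmp, "link")
os.symlink(allowed, link)  # initially points at the harmless file

def swap_target(path: str) -> None:
    os.remove(path)
    os.symlink(secret, path)

data = check_then_use(link)
print(data)  # the check saw "allowed", but the use read the secret file
```

In a real attack, the swap is performed by a concurrent thread or process racing the agent; the defenses discussed below close this window by making the check and the use atomic.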
Addressing TOCTOU challenges in user space security agents requires careful consideration of synchronization mechanisms, ensuring that checks and corresponding actions are executed atomically to prevent exploitation.

Resource exhaustion poses a notable threat to user space security agents operating on the Linux kernel, often through the execution of an excessive number of system calls. In this scenario, attackers exploit the agent's requirement to check system calls in a non-blocking manner. By initiating a barrage of system calls, such as file operations, network connections, or process creation, adversaries aim to overload the agent with benign events and exhaust the agent's resources, such as CPU, memory, or network bandwidth. User space security agents need to implement effective blocking mechanisms that enable them to perform a check on a system call before allowing the call to complete its execution.

Tampering attacks are another common issue user space security agents must address. In these attacks, adversaries aim to manipulate the behavior or compromise the integrity of the user space security agent itself, rendering it ineffective or allowing it to be bypassed. Typically, tampering with the agent requires root-level access to the system, as most security agents run as root. Tampering can take various forms, including altering the configuration of the security agent, deleting or modifying the agent's executable files on disk, injecting malicious code into its processes, and temporarily pausing or killing its processes with signals. By subverting the user space security agent, attackers can disable critical security features and evade detection. User space security agents must be aware of these attacks and have the appropriate detection mechanisms built in.

Seccomp for Kernel Filtering

Seccomp, short for "Secure Computing," is a Linux kernel feature designed to filter system calls made by a process thread.
It allows user space security agents to define a restricted set of allowed system calls, reducing the attack surface of an application. Options for handling system calls that violate the filter include killing the application and notifying another user space process, such as a user space security agent. Traditional Seccomp (strict mode) operates by preventing all system calls except read, write, exit, and sigreturn, which significantly restricts the system calls a thread may execute.

Seccomp-BPF (Berkeley Packet Filter) is an evolution that provides a more flexible filtering mechanism compared to traditional Seccomp. Unlike the original version, Seccomp-BPF allows for the dynamic loading of custom Berkeley Packet Filter programs, enabling more fine-grained control over filtering criteria. Seccomp-BPF enables the restriction of specific system calls and allows inspection of system call parameters to inform filtering decisions. Seccomp-BPF cannot dereference pointers, so its system call argument analysis is limited to the values of the arguments themselves. By enforcing policies that exclude potentially risky system calls and interactions, Seccomp-BPF contributes significantly to enhancing application security, offering a more versatile approach to system call filtering than strict mode.

Seccomp avoids the TOCTOU problem by evaluating system call arguments directly. Because Seccomp inspects arguments by value, it is not possible for an attacker to alter them after the initial system call; the attacker does not have an opportunity to modify the data inspected by Seccomp after the security check is performed. It is important to note that user space applications that need to dereference pointers to inspect data such as file paths must do so carefully, as this approach can be manipulated by TOCTOU attacks if appropriate precautions are not taken.
For example, a security agent could change the value of a pointer argument to a system call to a non-deterministic location and explicitly set the memory it points to. This approach makes TOCTOU attacks more challenging because it prevents another malicious thread in the monitored process from modifying the memory pointed to by the original system call arguments.

Seccomp is designed with tampering in mind. Both Seccomp and Seccomp-BPF are immutable: once a thread has Seccomp enabled, it cannot be disabled. Similarly, Seccomp-BPF filters are inherited by all child processes. If additional Seccomp programs are added, they are executed in LIFO order; all loaded Seccomp-BPF filters are executed, and the most restrictive result returned by the filters is enacted on the thread. Because Seccomp settings and filters are immutable and inherited by child processes, it is not possible for an attacker to bypass their defenses without a kernel exploit. It is important that Seccomp-BPF filters consider both 64-bit and 32-bit system calls, as one technique sometimes used to evade filtering is to switch the ABI to 32-bit on a 64-bit operating system.

Seccomp avoids resource exhaustion because all system call checks occur inline and before the system call is executed. Thus, the thread executing the system call is blocked while the filter is inspecting the system call arguments, which prevents the calling thread from executing additional system calls while the Seccomp filter is operating. Because Seccomp-BPF filters are pure functions, they cannot save data across executions, so it is not possible to cause them to run out of working memory by storing data about previously executed system calls. This ensures Seccomp itself cannot be driven into excessive memory consumption. By avoiding TOCTOU, tampering, and resource consumption issues, Seccomp provides a powerful mechanism for security teams and application developers to enhance their security posture.
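Seccomp's inline, irrevocable enforcement can be demonstrated on Linux with a small Python sketch that enables strict mode in a forked child via ctypes (the constants come from the Linux uapi headers; the child exits with 42 if strict mode is unavailable, e.g., when a Seccomp filter is already active, as in many container environments):

```python
import ctypes
import os
import signal

libc = ctypes.CDLL(None, use_errno=True)
PR_SET_SECCOMP = 22        # prctl option from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1    # only read, write, _exit, sigreturn are allowed

pid = os.fork()
if pid == 0:
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
        os._exit(42)       # strict mode unavailable in this environment
    os.write(1, b"write(2) is still permitted\n")
    os.open("/etc/hostname", os.O_RDONLY)  # any other syscall -> SIGKILL
    os._exit(0)            # never reached (exit_group is also disallowed)

_, status = os.waitpid(pid, 0)
killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
skipped = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 42
print("child killed by seccomp:", killed)
```

Note that once strict mode is on, there is no prctl call to turn it off again; the offending syscall is killed before it ever executes, which is exactly the inline, pre-execution check described above.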
Seccomp provides a flexible approach to runtime detection and protection against various threats, from malware to exploitable vulnerabilities, that works across Linux distributions. Thus, teams can use Seccomp to enhance the security posture of their entire Linux workloads in the cloud, in the data center, and at the edge.

eBPF for Kernel Filtering

eBPF can mitigate TOCTOU vulnerabilities by executing filtering logic directly within the kernel, eliminating the need for transitions between user space and kernel space. This inline execution ensures that security decisions are made atomically, leaving no opportunity for attackers to manipulate the system state between security checks and system call execution. However, this also depends on where exactly the program hooks into the kernel. When hooking into system calls, the memory location with the pathname to be accessed belongs to user space, and user space can change it after the hook runs but before the pathname is used to perform the actual in-kernel open operation. In that case, the BPF hook checks an "innocent" path, but the kernel operation actually happens with a "suspicious" path. Hooking into a kernel function that runs after the path is copied from user space to kernel space avoids this problem, because the hook operates on memory that the user space application cannot modify. For example, in file integrity monitoring, instead of a system call we could hook into the security_file_permission function, which is called on every file access, or security_file_open, which is executed whenever a file is opened. By accessing system call arguments within the kernel context, eBPF programs can ensure that security decisions are based on consistent and verifiable information, effectively neutralizing TOCTOU attack vectors.
Proper enforcement is impossible without in-kernel filtering: by the time an event has reached user space, it is too late, because the operation has already been executed.

eBPF also provides robust mechanisms for preventing tampering attacks by executing filtering logic within the kernel. Unlike user space agents, which may be susceptible to tampering attempts targeting their executable files, memory contents, or configuration settings, eBPF programs operate within the highly privileged kernel context, where access controls and integrity protections are strictly enforced. For instance, an eBPF program enforcing integrity checks on critical system files can maintain cryptographic hashes of file contents within kernel memory, ensuring that any unauthorized modifications are detected and prevented in real time. With eBPF, the state of what is watched can be updated in the kernel inline with the operations, whereas doing this in user space introduces race conditions.

Finally, eBPF addresses resource exhaustion attacks by implementing efficient event filtering and resource management strategies within the kernel. Unlike user space agents, which may be overwhelmed by excessive system call traffic, eBPF programs can leverage kernel-level optimizations to efficiently process and prioritize incoming events, ensuring optimal utilization of system resources. Deciding at the eBPF hook whether an event is of interest to the user means that no extraneous events will be generated and processed by the agent. The alternative, doing the filtering in user space, tends to induce significant overhead for events that happen very frequently in a system (such as file access or networking), which can lead to resource exhaustion. Low-overhead in-kernel filtering means security teams no longer have a resource concern driving decisions on how many files to monitor or whether to enable FIM on systems with extensive I/O operations, such as database servers.
eBPF can filter out events that are irrelevant to the policy, repetitive, or part of normal expected behavior, minimizing overhead. Thus, eBPF-based security agents can optimize resource utilization and ensure uninterrupted protection against resource exhaustion attacks. By leveraging eBPF's capabilities to mitigate TOCTOU vulnerabilities, prevent tampering, and reduce resource exhaustion risks, security teams can develop runtime security solutions that effectively protect Linux systems against a wide range of threats.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Modern API Management: Connecting Data-Driven Architectures Alongside AI, Automation, and Microservices.

Microservices-based applications are distributed in nature, and each service can run on a different machine or in a container. However, splitting business logic into smaller units and deploying them in a distributed manner is just the first step. We then must understand the best way to make them communicate with each other.

Microservices Communication Challenges

Communication between microservices should be robust and efficient, and when several small microservices interact to complete a single business scenario, that can be a challenge. Here are some of the main challenges arising from microservice-to-microservice communication.

Resiliency

There may be multiple instances of a microservice, and an instance may fail for several reasons: it may crash, or it may be overwhelmed with too many requests and thus be unable to process them. Two design patterns make communication between microservices more resilient: retry and circuit breaker.

Retry

In a microservices architecture, transient failures are unavoidable due to communication between multiple services within the application, especially on a cloud platform. These failures can occur in various scenarios, such as a momentary connection loss, a response time-out, service unavailability, or a slow network connection (Shrivastava, Shrivastav 2022). Normally, these errors resolve themselves when the request is retried, either immediately or after a delay, depending on the type of error that occurred. The retry is carried out a preconfigured number of times until it times out. Note, however, that the retried operation must remain logically consistent (in practice, idempotent) so that repeating it produces the same response and no unintended side effects.
Circuit Breaker

In a microservices architecture, as discussed in the previous section, failures can occur for several reasons and are typically self-resolving. However, this is not always the case: errors may take longer than estimated to resolve, or may not resolve at all. The circuit breaker pattern, as the name implies, breaks a function operation when errors reach a certain threshold. Usually, this break also triggers an alert that can be monitored. As opposed to the retry pattern, a circuit breaker prevents an operation that is likely to fail from being performed at all. This prevents congestion from failed requests and stops failures from escalating downstream. Other operations that use the same inherently limited resource can continue while the error persists, enabling efficient use of computing resources (Shrivastava, Shrivastav 2022).

Distributed Tracing

Modern applications built on a microservices architecture are distributed systems that are exceedingly complex to design, and monitoring and debugging them is even more complicated. Because an application involves a large number of microservices spanning multiple development teams, systems, and infrastructures, even a single request involves a complex network of communication. While such a distributed system enables scalability, efficiency, and reliability, it also makes observability harder to achieve, which in turn complicates troubleshooting. Distributed tracing helps us overcome this observability challenge by taking a request-centric view: as a request is processed by the components of a distributed system, distributed tracing captures the detailed execution of the request and its causally related actions across the system's components (Shkuro 2019).
Load Balancing

Load balancing is the method used to utilize resources optimally and ensure smooth operational performance. To be efficient and scalable, more than one instance of a service is run, and incoming requests are distributed across these instances for a smooth process flow. In Kubernetes, load balancing can be implemented more effectively using a service mesh, which can route based on recorded metrics such as latency. Service meshes primarily manage the traffic between services on the network, ensuring that inter-service communication is safe and reliable by enabling services to discover and communicate with each other. A service mesh also improves observability and aids in monitoring highly distributed systems.

Security

Each service must be secured individually, and the communication between services must be secure. In addition, there needs to be a centralized way to manage access controls and authentication across all services. One of the most popular ways to secure microservices is to use API gateways, which act as proxies between clients and microservices. API gateways can perform authentication and authorization checks, rate limiting, and traffic management.

Service Versioning

Deploying a microservice version update often leads to unexpected issues and breaking changes between the new version and other microservices in the system, or even external clients using that microservice. To mitigate and reduce these breaks, multiple versions of the same microservice can run simultaneously, with requests routed to the appropriate version. This is done using API versioning for API contracts.

Communication Patterns

Communication between microservices can be designed using two main patterns: synchronous and asynchronous.
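Before moving on to the communication patterns, the simplest form of the load balancing described earlier, round-robin distribution across service instances, can be sketched as follows. The instance addresses are hypothetical; real balancers and service meshes also weigh instances by health and observed latency.

```go
package main

import "fmt"

// roundRobin cycles through service instances so that requests are
// spread evenly across them.
type roundRobin struct {
	instances []string
	next      int
}

// pick returns the next instance in rotation.
func (rr *roundRobin) pick() string {
	inst := rr.instances[rr.next%len(rr.instances)]
	rr.next++
	return inst
}

func main() {
	rr := &roundRobin{instances: []string{"svc-a:8080", "svc-b:8080", "svc-c:8080"}}
	for i := 0; i < 5; i++ {
		fmt.Println(rr.pick()) // cycles a, b, c, a, b
	}
}
```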
In Figure 1, we see a basic overview of these communication patterns along with their respective implementation styles and choices. Figure 1. Synchronous and asynchronous communication with common implementation technologies

Synchronous Pattern

Synchronous communication between microservices is one-to-one communication. The microservice that generates the request is blocked until a response is received from the other service. This is done using HTTP requests or gRPC, a high-performance remote procedure call (RPC) framework. In synchronous communication, the microservices are tightly coupled, which is advantageous for less distributed architectures where communication happens in real time, thereby reducing the complexity of debugging (Newman 2021). Figure 2. Synchronous communication depicting the request-response model

The following table shows a comparison between technologies that are commonly used to implement the synchronous communication pattern.

Table 1. REST vs. gRPC vs. GraphQL

Architectural principles:
- REST: Uses a stateless client-server architecture; relies on URIs and HTTP methods for a layered system with a uniform interface
- gRPC: Uses the client-server method of remote procedure call; methods are directly called by the client and behave like local methods, although they are on the server side
- GraphQL: Uses client-driven architecture principles; relies on queries, mutations, and subscriptions via APIs to request, modify, and update data from/on the server

HTTP methods:
- REST: POST, GET, PUT, DELETE
- gRPC: Custom methods
- GraphQL: POST

Payload data structure to send/receive data:
- REST: JSON- and XML-based payloads
- gRPC: Protocol Buffers-based serialized payloads
- GraphQL: JSON-based payloads

Request/response caching:
- REST: Natively supported on client and server side
- gRPC: Unsupported by default
- GraphQL: Supported but complex, as all requests share a common endpoint

Code generation:
- REST: Natively unsupported; requires third-party tools like Swagger
- gRPC: Natively supported
- GraphQL: Natively unsupported; requires third-party tools like GraphQL code
generator

Asynchronous Pattern

In asynchronous communication, as opposed to synchronous, the microservice that initiates the request is not blocked until a response is received. It can proceed with other processes without waiting for a response from the microservice it sent the request to. In a more complex distributed microservices architecture, where services are not tightly coupled, asynchronous message-based communication is more advantageous, as it improves scalability and enables continued background operations without affecting critical processes (Newman 2021). Figure 3. Asynchronous communication

Event-Driven Communication

The event-driven communication pattern leverages events to facilitate communication between microservices. Rather than sending a request, microservices generate events without any knowledge of the other microservices' intents. These events can then be consumed by other microservices as required. The event-driven pattern is asynchronous because the microservices listening to these events execute their own processes. The principle behind events is entirely different from the request-response model: the microservice emitting the event leaves the recipient fully responsible for handling it, while the emitter has no knowledge of the consequences of the generated event. This approach enables loose coupling between microservices (Newman 2021). Figure 4. Producers emit events that some consumers subscribe to

Common Data

Communication through common data is asynchronous in nature and is achieved by having a microservice store data at a specific location where another microservice can then access it. The data's location must be persistent storage, such as a data lake or data warehouse.
Although common data is frequently used as a method of communication between microservices, it is often not considered a communication protocol because the coupling between microservices is not always observable when it is used. This communication style finds its best use case in situations that involve large volumes of data, as a common data location prevents redundancy, makes data processing more efficient, and is easily scalable (Newman 2021). Figure 5. An example of communication through common data

Request-Response Communication

The request-response communication model is similar to the synchronous communication previously discussed, where a microservice sends a request to another microservice and awaits a response. Along with the previously discussed protocols (HTTP, gRPC, etc.), message queues can be used as well. Request-response is implemented as one of the following two methods:

Blocking synchronous – Microservice A opens a network connection and sends a request to Microservice B along this connection. The established connection stays open while Microservice A waits for Microservice B to respond.

Non-blocking asynchronous – Microservice A sends a request to Microservice B, and Microservice B needs to know implicitly where to route the response. Message queues can also be used; they provide the added benefit of buffering multiple requests in the queue to await processing. This is helpful when the rate of incoming requests exceeds the rate at which they can be handled: rather than trying to handle more requests than its capacity allows, the microservice can take its time generating a response before moving on to the next request (Newman 2021). Figure 6.
An example of request-response non-blocking asynchronous communication

Conclusion

In recent years, we have observed a paradigm shift from designing large, clunky, monolithic applications that are complex to scale and maintain to using microservices-based architectures that enable the design of distributed applications, ones that can integrate multiple communication patterns and protocols across systems. These complex distributed systems can be developed, deployed, scaled, and maintained independently by different teams with fewer conflicts, resulting in a more robust, reliable, and resilient application. Choosing the optimal communication pattern and protocol for the exact operation a microservice must perform is a crucial task with a huge impact on the functionality and performance of an application. The aim is to make communication between microservices as seamless as possible in order to establish an efficient system. In-depth knowledge of the available communication patterns and protocols is an essential aspect of modern cloud-based application design, which is not only dynamic but also highly competitive, with multiple contenders providing identical applications and services. Speed, scalability, efficiency, security, and other features are often crucial in determining the overall quality of an application, and proper microservices communication is the backbone to achieving those capabilities.

References:
Shrivastava, Saurabh, and Neelanjali Shrivastav. 2022. Solutions Architect's Handbook, 2nd Edition. Packt.
Shkuro, Yuri. 2019. Mastering Distributed Tracing. Packt.
Newman, Sam. 2021. Building Microservices, 2nd Edition. O'Reilly.

This is an excerpt from DZone's 2024 Trend Report, Modern API Management: Connecting Data-Driven Architectures Alongside AI, Automation, and Microservices.
The slow Java startup problem is notorious in the Java community, but its meaning can confuse the observer. The slow startup problem relates to starting a set of interconnected applications built on complex Java frameworks. Such a deployment may include several Spring Boot applications, each taking around 10 seconds to start, so starting the production system as a whole takes a minute, even though starting a single JVM in that set takes about 50 milliseconds. The widespread habit of attributing this to "slow Java startup" is therefore not exactly accurate: technically, this is not a Java problem but a framework problem. The effect of slow startup and warmup is caused by the complex frameworks we use and by dynamic features in the runtime. Java is unique in its functionality, and thanks to its coding model and ecosystem power, Java is very popular among enterprises. The same complexity, though, can make it clumsy in the cloud. Java application startup and warmup technically include several consecutive processes: JVM startup, application startup, and JVM warmup. During these processes, the JVM takes extra time to reach the application's peak performance. The warmup phase is when the JVM interprets, compiles, and optimizes the code; for large, complex applications, this lasts substantially longer than the startup itself, taking up to several minutes. Every time you start your program, these processes begin from scratch. In practice, this means we spend time and significant CPU and memory resources preparing the application to run, on top of the resources required for its actual operation. Consequently, slow startup and warmup lead to increased cloud costs and resource over-utilization.

Search for the Solutions

There are several ways to deal with the issue.
Java Optimization

Migrating to a newer long-term support (LTS) version of Java can improve application performance slightly, bringing minor changes. Such optimization is a quick method, available immediately.

GraalVM

Using native images can be beneficial. However, GraalVM may bring problems such as compilation difficulties, strange errors, and different flags, making it unsuitable for some projects.

Project Leyden

Its primary goal is to "improve the startup time, time to peak performance, and footprint of Java programs." The project is not yet complete, so we cannot evaluate its effect or the possible difficulties of adoption. However, among all of these, Project Leyden is designed specifically to solve the slow startup problem, and we follow the news with great expectations.

Coordinated Restore at Checkpoint

This is an OpenJDK project entirely focused on Java startup enhancement. The project's primary aim is to develop a new standard mechanism-agnostic API to notify Java programs about checkpoint and restore events. Coordinated Restore at Checkpoint (CRaC) offers a Checkpoint/Restore API that allows creating an image of a running application at an arbitrary point in time (the "checkpoint") and then starting the image from the checkpoint file (snapshot). This restores the state of the application from the point when the checkpoint was made. Using the CRaC feature with a Java runtime enables you to pause the application and restart it from the moment it was paused and, in addition, gives the option to distribute numerous replicas of this file, which is especially relevant for deployment on multiple instances.

Amazon Lambda

Amazon Lambda is a standalone product based on CRaC technology. Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning, automatic scaling, and logging.
Lambdas can be very convenient for your development goals, but they are also more expensive and less effective compared to long-running JVMs.

The Effectiveness and Your Runtime Sustainability

The slow startup problem impacts the overall performance of your runtime, and to make your application sustainable and performant, you need to use one of these solutions. Among those stated above, CRaC is the most popular in the Java community today. CRaC, just like Project Leyden, targets the slow startup issue. We cannot fully evaluate and test Leyden's results yet. The project introduced Class Data Sharing plus AOT "on steroids," which looks very promising in synergy with Java and capable of delivering faster startup on the JVM. However, there are no ready-made solutions that can be deployed with Java yet. The advantage of the CRaC feature is that it is already available and spreading quickly. Today, you can get OpenJDK runtimes and even containers that support the CRaC API. These solutions are ready to install and allow immediate, significant improvements. OpenJDK runtimes and small containers with CRaC support will be especially relevant for Spring developers. Spring announced CRaC feature support in 2023, and their recommended runtime is Liberica JDK, which delivers a runtime version with CRaC. It should be noted that Native Image technology is also highly relevant for Spring users seeking faster application startup. Native images can run with a smaller memory footprint and do not require a Java Virtual Machine for deployment. However, GraalVM requires individual research given the specifics of your Java application, and it will not always be suitable for resolving the issue. In the case of Amazon Lambdas, you should consider the costs of this product and its effectiveness, as it might ultimately deliver an extra financial burden. Its main advantage is convenience.
The key CRaC advantage today is its availability and ease of use, combined with an instant effect on application performance and cloud costs. CRaC solves the problem immediately. An OpenJDK runtime with support for Coordinated Restore at Checkpoint lets your application quickly create and restore images of a running application, reducing startup and warmup times from minutes to milliseconds. Packaging your application in Linux-based containers with CRaC support strengthens its performance even further. CRaC lowers the load on the processor and memory at application startup, reducing cloud costs and improving application performance and sustainability.
What Is Multi-Tenancy?

Tenancy enables users to share cluster infrastructure among:

- Multiple teams within the organization
- Multiple customers of the organization
- Multiple environments of the application

Shared clusters save costs and simplify administration. Security and isolation are key factors to consider when cluster resources are shared. Two prominent isolation models for achieving multi-tenancy are the hard and soft tenancy models; the key difference between them lies in the level of isolation provided between tenants. Soft tenancy provides a lower level of isolation and uses mechanisms like namespaces, quotas, and limits to restrict tenant access to resources and prevent tenants from interfering with each other. Hard tenancy provides stronger isolation and often involves separate clusters or virtual machines for each tenant, with minimal shared resources.

Kubernetes Native Services in Multi-Tenant Implementations

Kubernetes has a built-in namespace model to create logical partitions of the cluster as isolated slices. Though basic levels of tenancy can be achieved, using namespaces has some limitations:

- Implementing advanced multi-tenancy scenarios, like Hierarchical Namespaces (HNS) or exposing Container as a Service (CaaS), becomes complicated because of the flat structure of Kubernetes namespaces.
- Namespaces have no common concept of ownership. Tracking and administration challenges persist if a team controls multiple namespaces.
- Enforcing resource quotas and limits fairly across all tenants requires additional effort.
- Only highly privileged users can create namespaces. This means that whenever a team wants a new namespace, it must raise a ticket to the cluster administrator. While this is probably acceptable for small organizations, it generates unnecessary toil as the organization grows.

To solve this problem, Kubernetes provides the Hierarchical Namespace Controller (HNC), which allows the user to organize namespaces into hierarchies.
Namespaces are organized in a tree structure, where child namespaces inherit resources and policies from parent namespaces. While HNC supports a soft-tenancy approach that leverages existing namespaces, it is, however, a newer project still under incubation in the Kubernetes community. Other widely used projects that provide similar capabilities are Capsule, Rafay, Kiosk, etc. In this article series, we will discuss implementing multi-tenant solutions using the Capsule framework. Capsule is a commercially supported open-source project that implements multi-tenancy on a shared cluster by aggregating multiple namespaces into a lightweight Tenant abstraction, giving each team a self-service, virtualized cluster experience without requiring a dedicated API server or etcd instance per tenant. Capsule is one of the platforms recommended by the Kubernetes community for multi-tenancy. Major components of the Capsule framework include:

- Capsule controller: Aggregates multiple namespaces in a lightweight abstraction called Tenant.
- Capsule policy engine: Achieves tenant isolation through the various network and security policies, resource quotas, limit ranges, RBAC, and other policies defined at the tenant level.

A user who owns a tenant is called a Tenant Owner. There is a slight difference between the roles of a tenant owner and a namespace administrator. Listed below are the roles and responsibilities of the cluster admin, the tenant owner, and the namespace administrator.

Install Capsule Framework

We will use an AWS EKS cluster to perform the exercise. This article assumes you have already created an EKS cluster "eks-cluster1" and that the following software is already installed on your local machine.
- AWS CLI (Version 2)
- kubectl (v1.21)
- curl (8.1.2)
- Helm (3.8.2)
- Go (v1.20.6)

Capsule can be installed in the two ways listed below.

Using YAML Installer

PowerShell
aws eks --region us-east-1 update-kubeconfig --name eks-cluster1
kubectl apply -f https://raw.githubusercontent.com/clastix/capsule/master/config/install.yaml

If you face any error in applying the YAML file, re-running the same command should fix the problem. If you see a pod status of "ImagePullBackOff" or "ErrImagePull," delete the pod of the deployment (not the deployment itself).

Using Helm

As a cluster admin or root user, run the following commands to install using Helm.

PowerShell
aws eks --region us-east-1 update-kubeconfig --name eks-cluster1
helm repo add clastix https://clastix.github.io/charts
helm install capsule clastix/capsule -n capsule-system --create-namespace

Verify Capsule Installation

What gets installed with the Capsule framework:

Namespace: capsule-system
Deployments in namespace: capsule-controller-manager
Services exposed: capsule-controller-manager-metrics-service, capsule-webhook-service
Secrets in namespace: capsule-ca, capsule-tls

Webhooks: In Kubernetes, webhooks are a mechanism for external services to interact with the Kubernetes API server during the lifecycle of API requests. They act like HTTP callbacks, triggered at specific points in the request flow. This allows external services to perform validations or modifications on resources before they are persisted in the cluster. There are two main types of webhooks used in Kubernetes for admission control: mutating admission webhooks and validating admission webhooks. The following webhooks are installed:

capsule-mutating-webhook-configuration
capsule-validating-webhook-configuration

Custom Resource Definitions (CRDs): CRDs allow the user to extend the API and introduce new types of resources beyond the built-in ones.
Imagine them as blueprints for creating your own custom resources that can be managed alongside familiar resources like Deployments and Pods. The CRDs below are installed:

capsuleconfigurations.capsule.clastix.io
globaltenantresources.capsule.clastix.io
tenantresources.capsule.clastix.io
tenants.capsule.clastix.io

Cluster roles:
capsule-namespace-deleter
capsule-namespace-provisioner

Cluster role bindings:
capsule-manager-rolebinding
capsule-proxy-rolebinding

Follow the steps below to check whether Capsule is installed properly. Log in as a root user or cluster administrator and run the following commands; this should list the ‘capsule-system’ namespace.

PowerShell
aws eks --region us-east-1 update-kubeconfig --name eks-cluster1
kubectl get ns

Run the commands below to see the Capsule-related components.

PowerShell
kubectl -n capsule-system get deployments
kubectl -n capsule-system get svc
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations

Get the Capsule CRDs installed.

PowerShell
kubectl get crds

If any of the CRDs are missing, apply the respective kubectl command mentioned below. Please note the Capsule version in the URL; adjust it according to the version you are installing.

PowerShell
kubectl apply -f https://raw.githubusercontent.com/clastix/capsule/v0.3.3/charts/capsule/crds/globaltenantresources-crd.yaml
kubectl apply -f https://raw.githubusercontent.com/clastix/capsule/v0.3.3/charts/capsule/crds/tenant-crd.yaml
kubectl apply -f https://raw.githubusercontent.com/clastix/capsule/v0.3.3/charts/capsule/crds/tenantresources-crd.yaml

View the cluster roles and role bindings by running the commands below.

kubectl get clusterrolebindings
kubectl get clusterroles

Verify the resource utilization of the framework.
PowerShell
kubectl -n capsule-system get pods
kubectl top pod <pod-name> -n capsule-system --containers

The Capsule framework creates one pod replica. The CPU usage (cores) should be around 3m and memory (bytes) around 26Mi. Verify the available tenants by running the command below as the cluster admin. The result should be "No resources found."

PowerShell
kubectl get tenants

Summary

In this part, we covered what multi-tenancy is, the different tenant isolation models, the challenges with Kubernetes native services, and how to install the Capsule framework on AWS EKS. In the next part, we will deep-dive into creating tenants and policy management.
Generative AI development has been democratized thanks to powerful machine learning models (specifically large language models such as Claude, Meta's Llama 2, etc.) being exposed by managed platforms and services as API calls. This frees developers from infrastructure concerns and lets them focus on the core business problems. It also means developers are free to use the programming language best suited to their solution. Python has typically been the go-to language for AI/ML solutions, but there is more flexibility in this area. In this post, you will see how to leverage the Go programming language to use vector databases and techniques such as Retrieval Augmented Generation (RAG) with langchaingo. If you are a Go developer who wants to learn how to build generative AI applications, you are in the right place! If you are looking for introductory content on using Go for AI/ML, feel free to check out my previous blogs and open-source projects in this space. First, let's take a step back and get some context before diving into the hands-on part of this post.

The Limitations of LLMs

Large language models (LLMs) and other foundation models have been trained on a large corpus of data, enabling them to perform well at many natural language processing (NLP) tasks. But one of their most important limitations is that most foundation models and LLMs use a static dataset with a specific knowledge cut-off (say, January 2022). If you were to ask about an event that took place after the cut-off date, the model would either fail to answer (which is fine) or, worse, confidently reply with an incorrect response; this is often referred to as hallucination. LLMs only respond based on the data they were trained on, which limits their ability to accurately answer questions on topics that are either specialized or proprietary.
For instance, if I were to ask a question about a specific AWS service, the LLM may (or may not) be able to come up with an accurate response. Wouldn't it be nice if the LLM could use the official AWS service documentation as a reference?

RAG (Retrieval Augmented Generation) Helps Alleviate These Issues

RAG enhances LLMs by dynamically retrieving external information during the response generation process, thereby expanding the model's knowledge base beyond its original training data. RAG-based solutions incorporate a vector store that can be indexed and queried to retrieve the most recent and relevant information, extending the LLM's knowledge beyond its training cut-off. When an LLM equipped with RAG needs to generate a response, it first queries a vector store to find relevant, up-to-date information related to the query. This ensures that the model's outputs are not based solely on its pre-existing knowledge but are augmented with the latest information, improving the accuracy and relevance of its responses.

But RAG Is Not the Only Way

Although this post focuses solely on RAG, there are other ways to work around this problem, each with its pros and cons:

- Task-specific tuning: Fine-tuning large language models on specific tasks or datasets to improve their performance in those domains.
- Prompt engineering: Carefully designing input prompts to guide language models towards desired outputs, without requiring significant architectural changes.
- Few-shot and zero-shot learning: Techniques that enable language models to adapt to new tasks with limited or no additional training data.

Vector Store and Embeddings

I mentioned the vector store a few times in the last paragraph. Vector stores are databases that store and index vector embeddings, which are numerical representations of data such as text, images, or entities.
Embeddings help us go beyond basic search since they represent the semantic meaning of the source data — hence the term semantic search, a technique that understands the meaning and context of words to improve search accuracy and relevance. Vector databases can also store metadata, including references to the original data source (for example, the URL of a web document) of the embedding. Thanks to generative AI technologies, there has also been an explosion in Vector Databases. These include established SQL and NoSQL databases that you may already be using in other parts of your architecture — such as PostgreSQL, Redis, MongoDB, and OpenSearch. But there are also databases that are custom-built for vector storage. Some of these include Pinecone, Milvus, Weaviate, etc. Alright, let's go back to RAG... What Does a Typical RAG Workflow Look Like? At a high level, RAG-based solutions have the following workflow. These are often executed as a cohesive pipeline: Retrieve data from a variety of external sources like documents, images, web URLs, databases, proprietary data sources, etc. This consists of sub-steps such as chunking, which involves splitting up large datasets (e.g., a 100 MB PDF file) into smaller parts (for indexing). Create embeddings: This involves using an embedding model to convert data into numerical representations. Store/index embeddings in a vector store. Ultimately, this is integrated as part of a larger application where the contextual data (semantic search result) is provided to LLMs (along with the prompts). End-To-End RAG Workflow in Action Each of the workflow steps can be executed with different components. The ones used in this blog include: PostgreSQL: It will be used as a Vector Database, thanks to the pgvector extension. To keep things simple, we will run it in Docker. langchaingo: It is a Go port of the langchain framework. It provides plugins for various components, including vector stores.
We will use it for loading data from web URLs and indexing it in PostgreSQL. Text and embedding models: We will use Amazon Bedrock Claude and Titan models (for text and embeddings, respectively) with langchaingo. Retrieval and app integration: langchaingo vector store (for semantic search) and chain (for RAG). You will get a sense of how these individual pieces work. We will cover other variants of this architecture in subsequent blogs. Before You Begin Make sure you have: Go, Docker, and psql installed (e.g., using Homebrew if you're on a Mac). Amazon Bedrock access configured from your local machine - refer to this blog post for details. Start PostgreSQL on Docker There is a Docker image we can use! docker run --name pgvector --rm -it -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres ankane/pgvector Activate the pgvector extension by logging into PostgreSQL (using psql) from a different terminal: # enter postgres when prompted for password psql -h localhost -U postgres -W CREATE EXTENSION IF NOT EXISTS vector; Load Data Into PostgreSQL (Vector Store) Clone the project repository: git clone https://github.com/build-on-aws/rag-golang-postgresql-langchain cd rag-golang-postgresql-langchain At this point, I am assuming that your local machine is configured to work with Amazon Bedrock. The first thing we will do is load data into PostgreSQL. In this case, we will use an existing web page as the source of information. I have used this developer guide — but feel free to use your own! Make sure to change the search query accordingly in the subsequent steps.
export PG_HOST=localhost export PG_USER=postgres export PG_PASSWORD=postgres export PG_DB=postgres go run *.go -action=load -source=https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html You should get the following output: loading data from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html vector store ready - postgres://postgres:postgres@localhost:5432/postgres?sslmode=disable no. of documents to be loaded 23 Give it a few seconds. Finally, you should see this output if all goes well: data successfully loaded into vector store To verify, go back to the psql terminal and check the tables: \d You should see a couple of tables — langchain_pg_collection and langchain_pg_embedding. These are created by langchaingo since we did not specify them explicitly (that's ok, it's convenient for getting started!). langchain_pg_collection contains the collection name, while langchain_pg_embedding stores the actual embeddings.
| Schema | Name | Type | Owner |
|--------|-------------------------|-------|----------|
| public | langchain_pg_collection | table | postgres |
| public | langchain_pg_embedding | table | postgres |
You can introspect the tables: select * from langchain_pg_collection; select count(*) from langchain_pg_embedding; select collection_id, document, uuid from langchain_pg_embedding LIMIT 1; You will see 23 rows in the langchain_pg_embedding table, since that was the number of langchain documents that our web page source was split into (refer to the application logs above from when you loaded the data). A quick detour into how this works... The data loading implementation is in load.go, but let's look at how we access the vector store instance (in common.go): brc := bedrockruntime.NewFromConfig(cfg) embeddingModel, err := bedrock.NewBedrock(bedrock.WithClient(brc), bedrock.WithModel(bedrock.ModelTitanEmbedG1)) //...
store, err = pgvector.New( context.Background(), pgvector.WithConnectionURL(pgConnURL), pgvector.WithEmbedder(embeddingModel), ) pgvector.WithConnectionURL is where the connection information for the PostgreSQL instance is provided. pgvector.WithEmbedder is the interesting part, since this is where we can plug in the embedding model of our choice. langchaingo supports Amazon Bedrock embeddings. In this case, I have used the Amazon Bedrock Titan embedding model. Back to the loading process in load.go. We first get the data in the form of a slice of schema.Document (see the getDocs function), using the langchaingo built-in HTML loader. docs, err := documentloaders.NewHTML(resp.Body).LoadAndSplit(context.Background(), textsplitter.NewRecursiveCharacter()) Then, we load it into PostgreSQL. Instead of writing everything ourselves, we can use the langchaingo vector store abstraction and its high-level AddDocuments function: _, err = store.AddDocuments(context.Background(), docs) Great. We have set up a simple pipeline to fetch and ingest data into PostgreSQL. Let's make use of it! Execute Semantic Search Let's ask a question. I am going with "What tools can I use to design dynamodb data models?", which is relevant to the document I used as the data source — feel free to tune it to your scenario. export PG_HOST=localhost export PG_USER=postgres export PG_PASSWORD=postgres export PG_DB=postgres go run *.go -action=semantic_search -query="what tools can I use to design dynamodb data models?" -maxResults=3 You should see a similar output — note that we opted to output a maximum of three results (you can change it): vector store ready ============== similarity search results ============== similarity search info - can build new data models from, or design models based on, existing data models that satisfy your application's data access patterns. You can also import and export the designed data model at the end of the process.
For more information, see Building data models with NoSQL Workbench similarity search score - 0.3141409 ============================ similarity search info - NoSQL Workbench for DynamoDB is a cross-platform, client-side GUI application that you can use for modern database development and operations. It's available for Windows, macOS, and Linux. NoSQL Workbench is a visual development tool that provides data modeling, data visualization, sample data generation, and query development features to help you design, create, query, and manage DynamoDB tables. With NoSQL Workbench for DynamoDB, you similarity search score - 0.3186116 ============================ similarity search info - key-value pairs or document storage. When you switch from a relational database management system to a NoSQL database system like DynamoDB, it's important to understand the key differences and specific design approaches.TopicsDifferences between relational data design and NoSQLTwo key concepts for NoSQL designApproaching NoSQL designNoSQL Workbench for DynamoDB Differences between relational data design and NoSQL similarity search score - 0.3275382 ============================ Now, what you see here are the top three results (thanks to -maxResults=3). Note that this is not an answer to our question. These are the results from our vector store that are semantically close to the query — the keyword here is semantic. Thanks to the vector store abstraction in langchaingo, we were able to easily ingest our source data into PostgreSQL and use the SimilaritySearch function to get the top N results corresponding to our query (see the semanticSearch function in query.go). Note that (at the time of writing) the pgvector implementation in langchaingo uses the cosine distance operation, but pgvector also supports L2 distance and inner product - for details, refer to the pgvector documentation.
Ok, so far we have: Loaded vector data Executed semantic search This is the stepping stone to RAG (Retrieval Augmented Generation) - let's see it in action! Intelligent Search With RAG To execute a RAG-based search, we run the same command as above (almost), only with a slight change in the action (rag_search): export PG_HOST=localhost export PG_USER=postgres export PG_PASSWORD=postgres export PG_DB=postgres go run *.go -action=rag_search -query="what tools can I use to design dynamodb data models?" -maxResults=3 Here is the output I got (might be slightly different in your case): Based on the context provided, the NoSQL Workbench for DynamoDB is a tool that can be used to design DynamoDB data models. Some key points about NoSQL Workbench for DynamoDB: - It is a cross-platform GUI application available for Windows, macOS, and Linux. - It provides data modeling capabilities to help design and create DynamoDB tables. - It allows you to build new data models or design models based on existing data models. - It provides features like data visualization, sample data generation, and query development to manage DynamoDB tables. - It helps in understanding the key differences and design approaches when moving from a relational database to a NoSQL database like DynamoDB. So in summary, NoSQL Workbench for DynamoDB seems to be a useful tool specifically designed for modeling and working with DynamoDB data models. As you can see, the result is not just about "Here are the top X responses for your query." Instead, it's a well-formulated response to the question. Let's peek behind the scenes again to see how it works. Unlike ingestion and semantic search, RAG-based search is not directly exposed by the langchaingo vector store implementation.
For this, we use a langchaingo chain which takes care of the following: Invokes semantic search Combines the semantic search with a prompt Sends it to a Large Language Model (LLM), which in this case happens to be Claude on Amazon Bedrock. Here is what the chain looks like (refer to the function ragSearch in query.go): result, err := chains.Run( context.Background(), chains.NewRetrievalQAFromLLM( llm, vectorstores.ToRetriever(store, numOfResults), ), question, chains.WithMaxTokens(2048), ) Let’s Try Another One This was just one example. I tried a different question and increased maxResults to 10, which means that the top 10 results from the vector database will be used to formulate the answer. go run *.go -action=rag_search -query="how is NoSQL different from SQL?" -maxResults=10 The result (again, it might be different for you): Based on the provided context, there are a few key differences between NoSQL databases like DynamoDB and relational database management systems (RDBMS): 1. Data Modeling: - In RDBMS, data modeling is focused on flexibility and normalization without worrying much about performance implications. Query optimization doesn't significantly affect schema design. - In NoSQL, data modeling is driven by the specific queries and access patterns required by the application. The data schema is designed to optimize the most common and important queries for speed and scalability. 2. Data Organization: - RDBMS organizes data into tables with rows and columns, allowing flexible querying. - NoSQL databases like DynamoDB use key-value pairs or document storage, where data is organized in a way that matches the queried data shape, improving query performance. 3. Query Patterns: - In RDBMS, data can be queried flexibly, but queries can be relatively expensive and don't scale well for high-traffic situations. - In NoSQL, data can be queried efficiently in a limited number of ways defined by the data model, while other queries may be expensive and slow. 4. 
Data Distribution: - NoSQL databases like DynamoDB distribute data across partitions to scale horizontally, and the data keys are designed to evenly distribute the traffic across partitions, avoiding hot spots. - The concept of "locality of reference," keeping related data together, is crucial for improving performance and reducing costs in NoSQL databases. In summary, NoSQL databases prioritize specific query patterns and scalability over flexible querying, and the data modeling is tailored to these requirements, in contrast with RDBMS where data modeling focuses on normalization and flexibility. Where to “Go” From Here? Learning by doing is a good approach. If you've followed along and executed the application thus far, great! I recommend you try out the following: langchaingo has support for lots of different models, including ones in Amazon Bedrock (e.g. Meta Llama 2, Cohere, etc.) — try tweaking the model and see if it makes a difference. Is the output better? What about the Vector Database? I demonstrated PostgreSQL, but langchaingo supports others as well (including OpenSearch, Chroma, etc.) - Try swapping out the Vector store and see how/if the search results differ. You probably get the gist, but you can also try out different embedding models. We used Amazon Titan, but langchaingo also supports many others, including Cohere embed models in Amazon Bedrock. Wrap Up This was a simple example for you to better understand the individual steps in building RAG-based solutions. These might change a bit depending on the implementation, but the high-level ideas remain the same. I used langchaingo as the framework. But this doesn't always mean you have to use one. You could also remove the abstractions and call the LLM platforms' APIs directly if you need granular control in your applications or the framework does not meet your requirements.
Like most of generative AI, this area is rapidly evolving, and I am optimistic that Go developers will have more options to build generative AI solutions. If you have feedback or questions, or would like me to cover something else on this topic, feel free to comment below! Happy building!
Whether you are a Linux administrator or a newbie who has just started using Linux, a good understanding of the commands useful for troubleshooting network issues is paramount. We'll explore the top 10 essential Linux commands for diagnosing and resolving common network problems. Each command will be accompanied by real-world examples to illustrate its usage and effectiveness. 1. ping Example: ping google.com -c 5 Shell test@ubuntu-server ~ % ping google.com -c 5 PING google.com (142.250.189.206): 56 data bytes 64 bytes from 142.250.189.206: icmp_seq=0 ttl=58 time=14.610 ms 64 bytes from 142.250.189.206: icmp_seq=1 ttl=58 time=18.005 ms 64 bytes from 142.250.189.206: icmp_seq=2 ttl=58 time=19.402 ms 64 bytes from 142.250.189.206: icmp_seq=3 ttl=58 time=22.450 ms 64 bytes from 142.250.189.206: icmp_seq=4 ttl=58 time=15.870 ms --- google.com ping statistics --- 5 packets transmitted, 5 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 14.610/18.067/22.450/2.749 ms test@ubuntu-server ~ % Explanation ping uses ICMP (Internet Control Message Protocol), a network layer protocol used by network devices to communicate. ping helps test the reachability of a host and also helps measure the latency between the source and destination. 2.
traceroute Example: traceroute google.com Shell test@ubuntu-server ~ % traceroute google.com traceroute to google.com (142.250.189.238), 64 hops max, 52 byte packets 1 10.0.0.1 (10.0.0.1) 6.482 ms 3.309 ms 3.685 ms 2 96.120.90.197 (96.120.90.197) 13.094 ms 10.617 ms 11.351 ms 3 po-301-1221-rur01.fremont.ca.sfba.comcast.net (68.86.248.153) 12.627 ms 11.240 ms 12.020 ms 4 ae-236-rar01.santaclara.ca.sfba.comcast.net (162.151.87.245) 18.902 ms 44.432 ms 18.269 ms 5 be-299-ar01.santaclara.ca.sfba.comcast.net (68.86.143.93) 14.826 ms 13.161 ms 12.814 ms 6 69.241.75.42 (69.241.75.42) 12.236 ms 12.302 ms 69.241.75.46 (69.241.75.46) 15.215 ms 7 * * * 8 142.251.65.166 (142.251.65.166) 21.878 ms 14.087 ms 209.85.243.112 (209.85.243.112) 14.252 ms 9 nuq04s39-in-f14.1e100.net (142.250.189.238) 13.666 ms 192.178.87.152 (192.178.87.152) 12.657 ms 13.170 ms test@ubuntu-server ~ % Explanation Traceroute shows the route packets take to reach a destination host. It displays the IP addresses of routers along the path and calculates the round-trip time (RTT) for each hop. Traceroute helps identify network congestion or routing issues. 3. netstat Example: netstat -tuln Shell test@ubuntu-server ~ % netstat -tuln Active LOCAL (UNIX) domain sockets Address Type Recv-Q Send-Q Inode Conn Refs Nextref Addr aaf06ba76e4d0469 stream 0 0 0 aaf06ba76e4d03a1 0 0 /var/run/mDNSResponder aaf06ba76e4d03a1 stream 0 0 0 aaf06ba76e4d0469 0 0 aaf06ba76e4cd4c1 stream 0 0 0 aaf06ba76e4ccdb9 0 0 /var/run/mDNSResponder aaf06ba76e4cace9 stream 0 0 0 aaf06ba76e4c9e11 0 0 /var/run/mDNSResponder aaf06ba76e4d0b71 stream 0 0 0 aaf06ba76e4d0aa9 0 0 /var/run/mDNSResponder test@ubuntu-server ~ % Explanation Netstat displays network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. It's useful for troubleshooting network connectivity, identifying open ports, and monitoring network performance. 4.
ifconfig/ip Example: ifconfig or ifconfig <interface name> Shell test@ubuntu-server ~ % ifconfig en0 en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500 options=6460<TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM> ether 10:9f:41:ad:91:60 inet 10.0.0.24 netmask 0xffffff00 broadcast 10.0.0.255 inet6 fe80::870:c909:df17:7ed1%en0 prefixlen 64 secured scopeid 0xc inet6 2601:641:300:e710:14ef:e605:4c8d:7e09 prefixlen 64 autoconf secured inet6 2601:641:300:e710:d5ec:a0a0:cdbb:79a7 prefixlen 64 autoconf temporary inet6 2601:641:300:e710::6cfc prefixlen 64 dynamic nd6 options=201<PERFORMNUD,DAD> media: autoselect status: active test@ubuntu-server ~ % Explanation ifconfig and ip commands are used to view and configure network parameters. They provide information about the IP address, subnet mask, MAC address, and network status of each interface. 5. tcpdump Example: tcpdump -i en0 tcp port 80 Shell test@ubuntu-server ~ % tcpdump -i en0 tcp port 80 tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on en0, link-type EN10MB (Ethernet), snapshot length 524288 bytes 0 packets captured 55 packets received by filter 0 packets dropped by kernel test@ubuntu-server ~ % Explanation Tcpdump is a packet analyzer that captures and displays network traffic in real-time. It's invaluable for troubleshooting network issues, analyzing packet contents, and identifying abnormal network behavior. Use tcpdump to inspect packets on specific interfaces or ports. 6.
nslookup/dig Example: nslookup google.com or dig google.com Shell test@ubuntu-server ~ % nslookup google.com Server: 2001:558:feed::1 Address: 2001:558:feed::1#53 Non-authoritative answer: Name: google.com Address: 172.217.12.110 test@ubuntu-server ~ % test@ubuntu-server ~ % dig google.com ; <<>> DiG 9.10.6 <<>> google.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46600 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;google.com. IN A ;; ANSWER SECTION: google.com. 164 IN A 142.250.189.206 ;; Query time: 20 msec ;; SERVER: 2001:558:feed::1#53(2001:558:feed::1) ;; WHEN: Mon Apr 15 22:55:35 PDT 2024 ;; MSG SIZE rcvd: 55 test@ubuntu-server ~ % Explanation nslookup and dig are DNS lookup tools used to query DNS servers for domain name resolution. They provide information about the IP address associated with a domain name and help diagnose DNS-related problems such as incorrect DNS configuration or server unavailability. 7. iptables/firewalld Example: iptables -L or firewall-cmd --list-all Shell test@ubuntu-server ~# iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy DROP) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination test@ubuntu-server ~# Explanation iptables and firewalld are firewall management tools used to configure packet filtering and network address translation (NAT) rules. They control incoming and outgoing traffic and protect the system from unauthorized access. Use them to diagnose firewall-related issues and ensure proper traffic flow. 8. ss Example: ss -tulpn Shell test@ubuntu-server ~# Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port udp UNCONN 0 0 *:161 *:* udp UNCONN 0 0 *:161 *:* test@ubuntu-server ~# Explanation ss is a utility to investigate sockets.
It displays information about TCP, UDP, and UNIX domain sockets, including listening and established connections, connection state, and process IDs. ss is useful for troubleshooting socket-related problems and monitoring network activity. 9. arp Example: arp -a Shell test@ubuntu-server ~ % arp -a ? (10.0.0.1) at 80:da:c2:95:aa:f7 on en0 ifscope [ethernet] ? (10.0.0.57) at 1c:4d:66:bb:49:a on en0 ifscope [ethernet] ? (10.0.0.83) at 3a:4a:df:fe:66:58 on en0 ifscope [ethernet] ? (10.0.0.117) at 70:2a:d5:5a:cc:14 on en0 ifscope [ethernet] ? (10.0.0.127) at fe:e2:1c:4d:b3:f7 on en0 ifscope [ethernet] ? (10.0.0.132) at bc:d0:74:9a:51:85 on en0 ifscope [ethernet] ? (10.0.0.255) at ff:ff:ff:ff:ff:ff on en0 ifscope [ethernet] mdns.mcast.net (224.0.0.251) at 1:0:5e:0:0:fb on en0 ifscope permanent [ethernet] ? (239.255.255.250) at 1:0:5e:7f:ff:fa on en0 ifscope permanent [ethernet] test@ubuntu-server ~ % Explanation arp (Address Resolution Protocol) displays and modifies the IP-to-MAC address translation tables used by the kernel. It resolves IP addresses to MAC addresses and vice versa. arp is helpful for troubleshooting issues related to network device discovery and address resolution. 10. mtr Example: mtr Shell test.ubuntu.com (0.0.0.0) Tue Apr 16 14:46:40 2024 Keys: Help Display mode Restart statistics Order of fields quit Packets Ping Host Loss% Snt Last Avg Best Wrst StDev 1. 10.0.0.10 0.0% 143 0.8 9.4 0.7 58.6 15.2 2. 10.0.2.10 0.0% 143 0.8 9.4 0.7 58.6 15.2 3. 192.168.0.233 0.0% 143 0.8 9.4 0.7 58.6 15.2 4. 142.251.225.178 0.0% 143 0.8 9.4 0.7 58.6 15.2 5. 142.251.225.177 0.0% 143 0.8 9.4 0.7 58.6 15.2 Explanation mtr (My traceroute) combines the functionality of ping and traceroute into a single diagnostic tool. It continuously probes network paths between the host and a destination, displaying detailed statistics about packet loss, latency, and route changes. Mtr is ideal for diagnosing intermittent network problems and monitoring network performance over time. 
Mastering these commands comes in handy for troubleshooting network issues on Linux hosts.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Modern API Management: Connecting Data-Driven Architectures Alongside AI, Automation, and Microservices. A recent conversation with a fellow staff engineer at a Top 20 technology company revealed that their underlying infrastructure is self-managed and does not leverage cloud-native infrastructure offered by major providers like Amazon, Google, or Microsoft. This information took me a minute to comprehend, given how it conflicts with my core focus on leveraging frameworks, products, and services for everything that doesn't impact intellectual property value. While I understand the pride of a Top 20 technology company not wanting to contribute to the success of another leading technology company, I began to wonder just how successful they could be if they utilized a cloud-native approach. That also made me wonder how many other companies have yet to adopt a cloud-native approach… and the impact it is having on their APIs. Why Cloud? Why Now? For the last 10 years, I have been focused on delivering cloud-native API services for my projects. While cloud adoption continues to gain momentum, a decent percentage of corporations and technology providers still utilize traditional on-premises designs. According to The Cloud in 2021: Adoption Continues report by O'Reilly Media, Figure 1 provides a summary of the state of cloud adoption in December 2021. Figure 1. Cloud technology usage Image adapted from The Cloud in 2021: Adoption Continues, O'Reilly Media Since the total percentages noted in Figure 1 exceed 100%, the underlying assumption is that it is common for respondents to maintain both a cloud and on-premises design.
However, for those who are late to enter the cloud-native game, I wanted to touch on some common benefits that are recognized with cloud adoption: Focus on delivering or enhancing laser-focused APIs — stop worrying about and managing on-premises infrastructure. Scale your APIs up (and down) as needed to match demand — this is a primary use case for cloud adoption. Reduce risk by expanding your API presence — leverage availability zones, regions, and countries. Describe the supporting API infrastructure as code (IaC) — faster recovery and expandability into new target locations. Making the transition toward cloud native has become easier than ever, with the major providers offering free or discounted trial periods. Additionally, smaller platform-as-a-service (PaaS) providers like Heroku and Render provide solutions that allow teams to focus on their products and services and not worry about the underlying infrastructure design. The Cloud Native Impact on Your API Since this Trend Report is focused on modern API management, I wanted to focus on a few of the benefits that cloud native can have on APIs. Availability and Latency Objectives When providing APIs to consumers, the concept of service-level agreements (SLAs) is a common onboarding discussion topic. This is basically where expectations are put into easy-to-understand wording that becomes a binding contract between the API provider and the consumer. Failure to meet these expectations can result in fees and, in some cases, legal action. API service providers often take things a step further by establishing service-level objectives (SLOs) that are even more stringent. The goal here is to establish monitors and alerts to remediate issues before they breach contractual SLAs. But what happens when the SLOs and SLAs struggle to be met? This is where the primary cloud-native use case can assist.
If the increase in latency is due to hardware limitations, the service can be scaled up vertically (by increasing the hardware) or horizontally (by adding more instances). If the increase in latency is driven by geographical location, introducing service instances in closer regions is something cloud-native providers can offer to remedy this scenario. API Management As your API infrastructure expands, a cloud-native design provides the necessary tooling to ease supportability and manageability efforts. From an infrastructure perspective, the underlying definition of the service is defined using an IaC approach, allowing the service itself to become defined in a single location. As updates are made to that base design, those changes can be rolled out to each target service instance, avoiding any drift between service instances. From an API management perspective, cloud-native providers include the necessary tooling to manage the APIs from a usage perspective. Here, API keys can be established, which offer the ability to impose thresholds on requests that can be made or features that align with service subscription levels. Cloud Native !== Utopia While APIs flourish in cloud-native implementations, it is important to recognize that a cloud-native approach is not without its own set of challenges. Cloud Cost Management CloudZero's The State Of Cloud Cost Intelligence 2022 report concluded that only 40% of respondents indicated that their cloud costs were at an expected level, as noted in Figure 2. Figure 2. Cloud native cost realities Image adapted from The State Of Cloud Cost Intelligence, CloudZero This means that 60% of respondents are dealing with higher-than-expected cloud costs, which ultimately impact an organization's ability to meet planned objectives. Cloud native spending can often be remediated by adopting the following strategies: Require team-based tags or cloud accounts to help understand levels of spending at a finer grain.
Focus on storage buckets and database backups to understand if the cost is in line with the value. Engage a cloud business partner that specializes in cloud spending analysis. Account Takeover The concept of accounts becoming "hacked" is prevalent in social media. At times, I feel like my social media feed contains more "my account was hacked" messages than the casual updates I was tuning in to read. Believe it or not, the concept of account takeover is becoming a common fear for cloud native adopters. Imagine starting your day only to realize you no longer have access to any of your cloud-native services. Soon thereafter, your customers begin to flood your support lines to ask what is going on… and where the data they were expecting to see with each API call is. Another potential consequence is that the APIs are shut down completely, forcing customers to seek out competing APIs. Remember, your account protection is only as strong as your weakest link. Make sure to employ everything possible to protect your account and move away from simple username + password account protection. Disaster Recovery It is also important to recognize that cloud native is not a replacement for maintaining a strong disaster recovery posture. Understand the impact of availability zone and region-wide outages — both are expected to happen. Plan to implement immutable backups — avoid relying on traditional backups and snapshots. Leverage IaC to establish all aspects of cloud native — and test it often. Alternative Flows Exist While a cloud-native approach provides an excellent landscape to help your business and partnerships be successful, there are likely use cases that present themselves as alternative flows for cloud native adoption: Regulatory requirements for a given service can often present themselves as an alternative flow and not be a candidate for cloud native adoption. 
Point of presence requirements can also become a blocker for cloud native adoption when the closest cloud-native location is not close enough to meet the established SLAs and SLOs. On the Other Side of API Cloud Adoption By adopting a cloud-native approach, it is possible to extend an API across multiple availability zones and geographical regions within a given point of presence. Figure 3. Multi-region cloud native adoption In Figure 3, the API service is deployed in three different geographical regions. Additionally, each region contains an API service instance running in three different availability zones — each with its own network and power source. In this example, there are nine distinct instances running across the United States. By introducing a global common name, consumers always receive a service response from the least-latent and available service instance. This approach easily allows for entire regions to be taken offline for disaster recovery validation without any interruptions of service at the consumer level. Conclusion Readers familiar with my work may recall that I have been focused on the following mission statement, which I feel can apply to any IT professional: Focus your time on delivering features/functionality that extend the value of your intellectual property. Leverage frameworks, products, and services for everything else. —John Vester When I think about my conversation with the staff engineer at the Top 20 tech company, I wonder how much more successful his team would be without having to worry about the underlying infrastructure being managed with their on-premises approach. While the other side of cloud native is not without challenges, it does adhere to my mission statement. As a result, projects that I have worked on for the last 10 years have been able to remain focused on meeting the needs of API consumers while staying in line with corporate objectives.
From an API perspective, cloud native offers additional ways to adhere to my personal mission statement by describing everything related to the service using IaC and leveraging built-in tooling to manage the APIs across different availability zones and regions.

Have a really great day!

This is an excerpt from DZone's 2024 Trend Report, Modern API Management: Connecting Data-Driven Architectures Alongside AI, Automation, and Microservices. Read the Free Report
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Modern API Management: Connecting Data-Driven Architectures Alongside AI, Automation, and Microservices.

In the dynamic and rapidly evolving landscape of cloud-native and SaaS-driven software development, the process of API generation plays a pivotal role in accelerating time to market (TTM) and competitive advantage by rapidly creating APIs. This crucial function now converges with low- and no-code platforms, ushering in newly streamlined development processes. At the forefront of this evolution stands generative AI (GenAI), which offers unprecedented speed and flexibility in API creation. In this article, we embark on a journey to explore the transformative potential of generative AI and low- and no-code platforms in API generation by highlighting their roles in fostering innovation and expediting time to market. Additionally, we delve into strategies for effective implementation, compliance considerations, enterprise patterns, and the future trajectory of no-code API generation.

Refer to Figure 1 for an illustration of the key components and architecture involved in this synergy:

- Generative AI – These algorithms autonomously generate code snippets by analyzing large datasets.
- Low-code/no-code platform – This encompasses the visual interface and pre-built components enabling API design without manual coding.
- API generation components – These leverage GenAI capabilities to automate API code generation based on user-defined specifications and prompts.
- API output – The final generated API code that is ready for deployment and integration into software applications.

Generative vs. Conventional APIs in the Modern Software Landscape

The debate between generative and conventional APIs has intensified, particularly concerning their impact on time to market and skill requirements.
Generative APIs — powered by advanced technologies like artificial intelligence (AI) on top of low- and no-code platforms — have gained traction for their ability to automate the API generation process, promising faster TTM and reducing the skill threshold required for development. Conversely, conventional APIs — built through manual coding processes — typically demand a higher level of technical expertise and entail longer development cycles, but they offer flexibility, tailored customization, and, hence, better capabilities for handling complexity.

Table 1. Generative vs. conventional APIs

Aspect | Generative APIs | Conventional APIs
Time to market | Rapid development cycles due to automation | Lengthy development timelines
Skill requirements | Lower skill threshold; accessible to non-coders | Require proficient coding skills
Handling complex logic | May struggle with replicating complex logic accurately | Excel in handling complex logic and intricate business rules
Integration complexity | Simplified integration with existing systems | May be challenging and time consuming
Maintenance effort | Reduced due to automation | Require ongoing manual updates and maintenance
Innovation potential | Enable rapid experimentation and innovation | May be limited due to manual coding and testing processes
Error handling | Automated but may be less customizable | Fully customizable
Legacy system integration | May require additional effort | Better compatibility and integration capabilities
Creative freedom | Limited due to automated processes | Flexibility for developers

Table 1 highlights the trade-offs between the two approaches, emphasizing the potential to revolutionize API development by expediting TTM and democratizing access to software development. Moreover, by lowering the skill threshold required for API development, these tools empower a broader range of individuals to contribute to the creation of APIs while developers can focus on higher-level tasks.
The Synergy of Generative AI and Low and No Code for API Generation

The integration of GenAI with low and no code for API generation signifies a significant advancement in API development practices, particularly in terms of efficiency and accessibility. GenAI employs sophisticated large language model (LLM) algorithms that learn to autonomously analyze large datasets of enterprise (an organization's internal context) and open-source patterns. When combined with low- and no-code platforms, GenAI enhances these tools by providing developers AI-generated code components that can be easily incorporated into their applications. This integration streamlines the development process, allowing developers to leverage pre-built AI-generated modules to accelerate prototyping and customization.

Overall, the integration of GenAI with low- and no-code platforms revolutionizes API development, empowering organizations to innovate rapidly and deliver high-quality solutions to market with unprecedented speed and efficiency. Additionally, this amalgamation profoundly impacts the daily operations and responsibilities of engineers and developers, empowering them to innovate, streamline workflows, and adapt to the evolving demands of modern development practices.

Figure 1. Architectural overview of GenAI with low- and no-code platforms for automated API generation

Charting the Future: The Trajectory of No-Code API Generation

As businesses prioritize agility and efficiency, the rising adoption of automated no-code API generation is set to transform development processes, streamlining workflows and expediting TTM. With no-code platforms now able to accurately generate complex APIs by leveraging GenAI, this trajectory foresees a transition to more intuitive and potent development tools, which, in turn, will empower organizations to innovate swiftly and deliver top-notch solutions with unparalleled speed and efficiency.
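To make the first step of this architecture more concrete, here is a deliberately simplified, hypothetical sketch of what an API generation component might do: turn user-defined specifications captured by a no-code UI into a prompt for a GenAI backend. The class names, fields, and prompt wording are illustrative assumptions, not any particular platform's design.

```python
from dataclasses import dataclass, field

@dataclass
class ApiSpec:
    """Hypothetical user-defined specification collected by a no-code platform's UI."""
    resource: str
    operations: list = field(default_factory=lambda: ["create", "read", "update", "delete"])
    language: str = "Python"

def build_generation_prompt(spec: ApiSpec) -> str:
    """Assemble the prompt an API generation component could send to a GenAI backend."""
    ops = ", ".join(spec.operations)
    return (
        f"Generate a {spec.language} REST API for the resource '{spec.resource}'.\n"
        f"Expose these operations: {ops}.\n"
        "Follow the organization's style book: plural resource URLs, JSON payloads."
    )

# The no-code UI would populate the spec; the platform sends the prompt to the LLM.
prompt = build_generation_prompt(ApiSpec(resource="invoice"))
print(prompt)
```

In a real platform, the returned code would then flow through the "API output" stage of Figure 1 for review and deployment rather than being trusted blindly.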
Democratizing Development

As organizations adopt these innovative solutions, the democratization of development gains momentum. No-code API generation platforms empower individuals with diverse technical backgrounds to engage in API creation, promoting collaboration and inclusivity. By lowering entry barriers and reducing reliance on traditional coding skills, these platforms foster a future where API development is accessible and collaborative. This democratization accelerates innovation and ensures that a broader range of voices and perspectives contribute to the development process, resulting in more inclusive and impactful solutions.

Compliance and Security

In the domain of automated no-code API generation, organizations must prioritize compliance and security to uphold the integrity and reliability of their APIs. Compliance entails adhering to regulatory requirements such as GDPR, HIPAA, and PCI-DSS, as well as industry-specific benchmarks like ISO 27001 or SOC 2. Additionally, robust security measures — including authentication, encryption, and secret rotation — are essential for safeguarding against unauthorized access and cyber threats. Prioritizing compliance and security enables organizations to mitigate risks, protect data, and uphold stakeholder trust.

Business Transformation

No-code API generation empowers businesses to swiftly adapt to market changes by facilitating rapid API creation and deployment without extensive coding. These tools and platforms enable developers to iterate quickly on ideas, prototypes, and solutions, expediting the development cycle and enabling organizations to respond promptly to market demands. By automating manual coding tasks and streamlining development processes, these tools free up developers' time and resources, allowing them to focus on more strategic tasks such as innovation, problem solving, and optimization.
Additionally, these platforms democratize the development process by facilitating collaboration among cross-functional teams with varying levels of technical expertise, fostering scalability and competitiveness. Embracing no-code API generation is essential for driving meaningful transformation and maintaining a competitive edge in today's dynamic digital landscape.

Challenges of Automated API Generation

While these tools offer streamlined development processes, they come with hurdles that must be addressed. The following table highlights key technical challenges, providing insights for effective implementation.

Table 2. Challenges of automated API generation

Challenge | Description
Handling complex data structures and business logic | Handling complex data structures and business logic may pose challenges in integrating diverse data, validating data, and handling versioning effectively.
Integration dependencies | Automated tools may struggle to integrate seamlessly with existing systems and external APIs due to data format disparities, API versioning conflicts, and limited customization options.
Security vulnerabilities | Automated low- and no-code output may be prone to potential vulnerabilities such as inadequate access controls, insecure configurations, and absence of data encryption, as well as vulnerabilities within third-party integrations.
Limited flexibility | Limited flexibility arises from predefined templates or patterns, constraining customization and adaptability to specific project requirements, potentially impacting functionality and scalability.
Future proofing | Low- and no-code API platforms are abstract in nature and can result in vendor lock-in; adaptability to evolving technologies as well as ensuring long-term support and compatibility can be challenging.

Strategies and Guidelines for Generative No-Code API Development

Strategies and guidelines provide a roadmap for leveraging AI-driven tools and low- and no-code platforms effectively.
These strategies and guidelines encompass comprehensive planning, iterative development, and collaborative approaches, thus ensuring streamlined workflows and accelerated TTM while prioritizing automation, scalability, and security.

Table 3. Developing generative no-code APIs

Key Aspects | Strategies and Guidelines
Comprehensive planning | Plan thoroughly by defining clear objectives and requirements up front to ensure alignment with business goals and user needs.
Iterative development | Adopt an iterative development approach, allowing for continuous feedback, testing, and refinement throughout the development process.
Collaborative development | Foster collaboration between technical and non-technical stakeholders, encouraging cross-functional teams to contribute to API design and development.
Embrace automation | Leverage automation tools and features provided by no- and low-code platforms to streamline development tasks and increase productivity.
Ensure scalability | Design APIs with scalability in mind, anticipating future growth and ensuring that the architecture can support increased demand and usage over time.
Prioritize security | Implement robust security measures to protect APIs from potential threats, including data breaches, unauthorized access, and injection attacks.
Testing and validation | Implement rigorous testing and validation processes to ensure the reliability, functionality, and interoperability of APIs across different platforms and environments.

Conclusion

As we cast our gaze toward the future shaped by cloud-native and SaaS-driven development, the integration of generative AI with low- and no-code platforms emerges as a catalyst for innovation. This symbiotic relationship not only revolutionizes API generation but also grants developers unprecedented flexibility and efficiency. Embracing automation and innovation will be pivotal in meeting evolving market demands and expediting TTM.
This trend represents more than just a leap in technological prowess; it signals a paradigm shift in the ethos of API development, where creativity and efficiency converge harmoniously. Ultimately, developers and engineers are empowered by automated API generation tools, enabling them to rapidly translate ideas into prototypes and solutions, thus expediting the development cycle. This capability positions engineering and development teams to respond promptly to market demands and feature requirements, fostering experimentation and innovation. By automating manual coding tasks and streamlining development processes, these tools unlock opportunities for organizations to gain a competitive edge by delivering value to customers with unparalleled speed.

Despite inevitable challenges, such as compliance and security considerations, the trajectory of automated API generation remains on a path of progress. By embracing strategic guidelines and proactively addressing challenges, businesses can harness the transformative potential of automated API generation to shape the future of software development and technology trends.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Modern API Management: Connecting Data-Driven Architectures Alongside AI, Automation, and Microservices.

REST APIs have become the standard for communication between applications on the web. Based on simple yet powerful principles, REST APIs offer a standardized yet flexible approach to the design, development, and consumption of programming interfaces. By adopting a client-server architecture and making appropriate use of HTTP methods, REST APIs enable smooth, efficient integration of distributed systems. As REST has become a standard, the API ecosystem has grown much richer in recent years and is increasingly integrated into the DevOps ecosystem. It has been infused with agility, CI/CD, and FinOps, and continues to develop by itself. In this article, we're going to compile these new practices and tools to give you a broad overview of what an "API approach" can do.

API Design and Documentation

The API design and documentation stage is crucial, as it defines the basis for all subsequent development. For this reason, it is essential to use methodologies such as domain-driven design (DDD), event storming, and the API goals canvas — which represents what, who, how, inputs, outputs, and goals — to understand business needs and identify the relevant domains and objectives of the APIs to be developed. These workshops enable business and dev teams to work together and define API objectives and interactions between different business domains.

Figure 1. Designing APIs

When designing and documenting APIs, it's essential to take into account the fundamental principles of REST APIs. This includes identifying resources and representing them using meaningful URLs, making appropriate use of HTTP methods for CRUD (Create, Read, Update, Delete) operations, and managing resource states in a stateless way.
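These resource-oriented principles can be sketched without any framework. The following minimal, illustrative example (an in-memory store invented for this sketch, not tied to any tool mentioned here) maps HTTP methods and meaningful resource URLs onto CRUD operations:

```python
import re

class BookStore:
    """Toy dispatcher mapping (HTTP method, URL) pairs to CRUD operations."""

    def __init__(self):
        self._books, self._next_id = {}, 1

    def handle(self, method, path, body=None):
        item = re.fullmatch(r"/books/(\d+)", path)
        if path == "/books" and method == "GET":          # Read (collection)
            return 200, list(self._books.values())
        if path == "/books" and method == "POST":         # Create
            book = {"id": self._next_id, **body}
            self._books[self._next_id] = book
            self._next_id += 1
            return 201, book
        if item:
            book_id = int(item.group(1))
            if book_id not in self._books:
                return 404, {"error": "not found"}
            if method == "GET":                           # Read (item)
                return 200, self._books[book_id]
            if method == "PUT":                           # Update
                self._books[book_id] = {"id": book_id, **body}
                return 200, self._books[book_id]
            if method == "DELETE":                        # Delete
                del self._books[book_id]
                return 204, None
        return 405, {"error": "method not allowed"}

store = BookStore()
status, created = store.handle("POST", "/books", {"title": "REST in Practice"})
print(status, created)  # 201 {'id': 1, 'title': 'REST in Practice'}
```

Note that each call carries everything the server needs, so the handler stays stateless with respect to the client session; a real service would delegate this routing to a framework.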
By adopting a resource-oriented approach, development teams can design REST APIs that are intuitive and easy to use for client developers. REST API documentation should highlight available endpoints, supported methods, accepted and returned data formats, and any security or pagination constraints. The REST principles still leave the freedom to make a number of choices, such as naming conventions. For a compilation of these best practices, you can read my last Refcard, API Integration Patterns.

In this stage, API style books play a crucial role. They give design guidelines and standards to ensure the consistency and quality of the APIs developed. These style books define rules on aspects such as URI structure, HTTP methods to be used, data formats, error handling, and so on. They serve as a common reference for all teams working on API projects within the organization. Stoplight and SwaggerHub are commonly used, but a simple wiki tool could be enough.

Data model libraries complete this phase by providing reusable data models, which define the standard data structures used in APIs. Data model libraries include JSON schemas, database definitions, object models, and more. They facilitate development by providing ready-to-use assets, reducing errors, and speeding up development. Commonly used tools include Apicurio and Stoplight.

A description of an API's workflow is often missing from the APIs we discover on API portals. Questions arise such as: How do I chain API calls? How do I describe the sequence of calls? With a drawing? With text in the API description? How do I make it readable and regularly updated by the person who knows the API best (i.e., the developer)? It can still be a pain to understand the sequence of API calls. This is often covered by additional documentation provided on an API portal, yet that documentation is decoupled from the code supplied by the developers.
The OpenAPI Specification allows you to define links and callbacks, but it quickly reaches its limits for explaining things properly. This is why the OpenAPI Workflows Specification has recently appeared, allowing API workflows to be defined. In this specification, the steps are always described in JSON, which, in turn, allows a schema to be generated to explain the sequence of calls. If you want to describe your workflows from OpenAPI specifications, you can use Swagger Editor or SwaggerHub, and you can use Swagger to UML or openapi-to-plantuml. If you want to begin by designing sequence diagrams, you can use PlantUML or LucidChart, for instance. There is no unique toolchain that fits all needs; you have to first know if you prefer a top-down or bottom-up approach. Tools such as Stoplight Studio combined with Redocly are commonly known for handling these topics — Apicurio as well. They can be used for API design, enabling teams to easily create and visualize OpenAPI specifications using a user-friendly interface. These specifications can then be used to automatically generate interactive documentation, ensuring that documentation is always up to date and consistent with API specifications.

API Development

Once the API specifications have been defined, the next step is to develop the APIs following the guidelines and models established during the design phase. Agile software development methods, effective collaboration, and version management are must-have practices to ensure good and fast development.

Figure 2. Building APIs

For versioning, teams use version control systems such as Git or GitHub to manage API source code. Version control enables seamless collaboration between developers and ensures full traceability of API changes over time. During development, the quality of the API specification can be checked using linting tools.
These tools can check:

- Syntax and structure of the code
- Compliance with coding standards and naming conventions
- Correct use of libraries and frameworks
- Presence of dead or redundant code
- Potential security problems

Swagger-Lint and Apicurio Studio or Stoplight can be used to carry out these and other linting checks, and these checks can be built into a CI/CD toolchain (more on this in the API Lifecycle Management section). Automation plays a crucial role in this stage, enabling unit, security, and load tests to run seamlessly throughout the development process. Postman and Newman are often used to automate API testing to ensure quality and security requirements, but other solutions exist like REST Assured, Karate Labs, and K6.

Development frameworks supporting REST API development are very common; the most popular ones include Express.js with Node.js, Spring Boot, and Meteor. Most of the popular frameworks support HTTP, so choosing should not be a complicated task. API capabilities are a must when you choose a framework, but they are not the only criterion. Developers will build your stack, so you'll need frameworks that are both appreciated by your devs and relevant to the other technical challenges you'll have to tackle.

And we have to mention mock prototyping! Offering a mock API can remove dependencies between developers, whether you target internal or external developers. This is generally based on the OpenAPI description of your API and is often taken into account by API management portals. There are also dedicated OSS projects such as MockServer or WireMock.

API Security

API security is a major concern in API development and management. It is essential to implement authentication, authorization, and data encryption mechanisms to protect APIs against attacks and privacy violations.
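Even the simplest of these mechanisms deserves care in implementation. As a hedged, framework-free sketch (the key store layout and names are illustrative assumptions, not a production design), server-side API key validation should store hashed keys and compare them in constant time:

```python
import hashlib
import hmac

def hash_key(api_key: str) -> str:
    # Keys are persisted hashed, like passwords (a real system would also salt them)
    return hashlib.sha256(api_key.encode()).hexdigest()

# Server-side store: client identifier -> hash of that client's secret key
KEY_STORE = {"client-42": hash_key("s3cr3t-key")}

def authenticate(client_id: str, api_key: str) -> bool:
    """Validate an API key, comparing hashes in constant time to avoid timing leaks."""
    stored = KEY_STORE.get(client_id)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_key(api_key))

print(authenticate("client-42", "s3cr3t-key"))  # True
print(authenticate("client-42", "wrong-key"))   # False
```

The constant-time comparison matters because a naive `==` on secrets can leak information about how many leading characters match through response timing.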
API keys, OAuth 2.0, and OpenID Connect are the three protocols to know:

- API keys are still widely used for API access due to their ease and low overhead. They are a unique set of characters sent as a pair — a user and a secret — and should be stored securely like passwords.
- OAuth 2.0 is a token-based authentication method involving three actors: the user, the integrating application (typically your API gateway), and the target application. The user grants the application access to the service provider through an exchange of tokens via the OAuth endpoint. OAuth is preferred for its granular access control and time-based limits.
- OpenID Connect is an identity layer standardized on top of OAuth 2.0 that adds normalized third-party identification and user identity. It is recommended for fine-grained authorization controls and managing multiple identity providers, though not all API providers require it.

In addition to that, solutions such as Keycloak can be deployed to provide centralized management of identity and API access. Alternatives to Keycloak include OAuth2 Proxy, Gluu Server, WSO2 Identity Server, and Apache Syncope.

But just talking about tools and protocols would not be enough to cover API security. Contrary to what we sometimes read, a front-end web application firewall (WAF) implementing the OWASP rules will prevent many problems. And — a topic that certainly deserves a dedicated DZone Refcard, like Getting Started With DevSecOps — a comprehensive DevSecOps approach will greatly reduce the risks. However, automated security testing is also essential to guarantee API robustness against attacks. OSS tools such as ZAP can be used to run automated security tests, identifying potential vulnerabilities in APIs and enabling them to be corrected before they can be exploited by attackers.

API Lifecycle Management

Once APIs have been developed, they need to be deployed and managed efficiently throughout their lifecycle.
This involves version management, deployment management, performance monitoring, and ensuring API availability and reliability. API management platforms — including, but not limited to, Gravitee, Tyk, WSO2 API Manager, Google Cloud Apigee, and Amazon API Gateway — are used for API deployment, version management, and monitoring. These platforms offer advanced features such as caching, rate limiting, API security, and quota management. Clearly, these are must-haves if you want to be able to scale.

Figure 3. Running APIs

To ensure compliance with standards and guidelines established during the design phase, tools such as Stoplight's Spectral are used to perform a linting analysis of OpenAPI specifications, identifying potential problems and ensuring API consistency with design standards. And of course, at the end of the chain, you need to document your API. Tools exist to automate many tasks, such as Redocly, which generates interactive documentation from the OpenAPI Specification. The added benefit is that you ensure your documentation is constantly up to date and always simple and readable for everyone, developers and business analysts alike. API management also involves continuous monitoring of API performance, availability, and security, as well as the timely implementation of patches and updates to ensure smooth operation.

API Analysis and Monitoring

Analysis and monitoring of APIs are essential to ensure API performance, reliability, and availability. It is important to monitor API performance in real time, collect data on API usage, and detect potential problems early. The ELK Stack (Elasticsearch, Logstash, Kibana) is often used to collect, store, and analyze API access logs for monitoring performance and detecting errors. OpenTelemetry is also used in many use cases and is a must-have if you want to monitor end-to-end processes, especially ones that include an API.
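As a toy illustration of the kind of aggregation these monitoring stacks perform, the sketch below computes latency percentiles from collected request durations. The sample data and the nearest-rank method are assumptions for demonstration only, not tied to any particular tool:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples (milliseconds)."""
    ranked = sorted(samples)
    index = max(0, math.ceil(pct / 100 * len(ranked)) - 1)
    return ranked[index]

# Hypothetical request durations scraped from access logs
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]
for pct in (50, 95, 99):
    print(f"p{pct}: {percentile(latencies_ms, pct)} ms")
```

Tail percentiles like p95 and p99 are what surface the slow outliers that an average would hide, which is why dashboards emphasize them over the mean.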
Prometheus and Grafana are commonly used for real-time API performance metrics, giving much information on usage trends, bottlenecks, and performance problems.

FinOps and Run Management

Finally, once APIs are deployed and running, it's important to optimize operating costs and monitor cloud infrastructure expenses. FinOps aims to optimize infrastructure costs by adopting practices such as resource optimization, cost forecasting, and budget management. Cloud cost monitoring tools such as AWS Cost Explorer, Google Cloud Billing, and Azure Cost Management are used to track and manage cloud infrastructure spend, keeping operating costs under control and optimizing API spend. In a hybrid cloud world, we could also consider open-source solutions like Cloud Custodian, OpenCost, and CloudCheckr.

Conclusion

Obviously, you don't need to put all this in place right away to start your API journey. You have to first think about how you want to work and what your priorities are. Maybe you should prioritize design tools, like linting tools, or define your API style book and API design tool. Of course, prioritize tools that are commonly used — there's no wheel to reinvent! In fact, I'd say implement everything that is at the beginning of this toolchain, because it will be easier to change what comes after. I hope keeping all these points in mind will enable you to get started serenely while prioritizing your own API needs.