The role of AI/ML in design and testing is expanding

How effective AI and ML will be, and how much they will shorten the time from design to test, remains unclear.

The role of artificial intelligence and machine learning in test keeps growing, saving significant time and money beyond initial expectations. But it is not effective in every situation, it sometimes can even disrupt well-proven processes, and the return on investment is not always clear.

One of the big attractions of AI is its ability to analyze datasets far larger than humans can handle. In the critical design-to-test space, some of the most time-consuming and expensive work stems from incompatibilities between tools for design setup, simulation, and ATE test programs, which slow down debug and development efforts. AI can help close those gaps.

"During device bring-up and debug, complex software/hardware interactions can expose the need for domain knowledge from multiple teams or stakeholders who may not be familiar with each other's tools," said Richard Fanning, Chief Software Engineer at Teradyne. "Any time spent context-switching between these setups, or debugging differences between them, is a drain on energy. Our toolset addresses this by letting all setups use the same set of source files, ensuring that everyone is running the same thing."


Machine learning and AI analytics also can take over some of engineering's monotonous tasks. There is recurring concern that AI will replace workers, but that concern is overstated. Most workers simply shift to higher-level responsibilities, adding AI as another tool in their arsenal. That tool will be applied where it can have the greatest impact, which may span multiple stages of the design-to-manufacturing flow. But how and where data from one part of the process interacts with data from other parts can vary greatly, which is why the industry is proceeding with caution.

"Generative AI opens up many new opportunities, but you need to be clear about what you want to do," said Shankar Krishnamoorthy, General Manager of the Synopsys EDA Group. "Unless you can guide the LLM with the right prompts, you end up with gibberish. And unless you are a good engineer who knows how to interpret an LLM's output, you are likely to accept flawed or suboptimal solutions, and then the chips you deliver will be poor. So AI will not become a super-assistant for every engineer, letting them do three to five times the work they did a few years ago. But the technology is evolving rapidly."

This will not diminish the role of engineering teams, who remain crucial for accelerating the design-to-test flow, guiding and validating ML models, and verifying that systems operate as expected. "AI has some fantastic capabilities, but it is essentially just a tool. We still need engineering innovation," said Ron Press, Senior Director of Technical Support at Siemens Digital Industries Software, in a recent MEPTEC presentation. "Sometimes people write about how AI will take away everyone's jobs, and I don't believe that at all. Our designs are more complex, and the scale of design is larger. We need to leverage AI as a tool to do the same work at a faster pace." Still, AI gives engineers a potentially powerful new instrument for identifying potential issues and managing runaway complexity.

"As we continue along this technology curve, the analytics and compute infrastructure we have to deploy becomes increasingly complex, and you want to be able to make the right decisions without over-investing," said Ken Butler, Senior Director of Business Development for the Advantest ACS Data Analytics Platform Group. "In some cases, we customize the test solution based on the type of chip."

Faster design-to-characterization-to-first-silicon

Facing ever-shrinking process windows and ever-lower allowable defect rates, chipmakers are continuously refining the design-to-test flow to maximize efficiency during device bring-up and volume production. "Analytics in test operations is nothing new," said Advantest's Butler. "The industry has been analyzing test data and making product decisions with it for more than 30 years. What's different now is that we are moving to ever-smaller geometries, advanced packaging technologies, and chiplet-based designs. That is changing the nature of the analytics we do, in both the software and the hardware infrastructure. But from a production-test perspective, our AI journey in test is still in its early days."

Nevertheless, early adopters are building the infrastructure needed for inline compute and AI/ML modeling to support real-time inference in the test cell. And because no single company has all the required expertise, tool-to-tool compatibility is being designed in through partnerships and application libraries.

"Our protocol library provides ready-made solutions for communicating over common protocols," said Teradyne's Fanning. "This reduces the development and debug workload for device communication. We have seen cases where test engineers responsible for bringing up a new protocol interface saved a lot of time using this feature."

In fact, data compatibility is a constant theme, from design through the latest developments in ATE hardware and software. "As device complexity grows exponentially, using the same test sequences between characterization and production has become crucial," Fanning explained. "Collaboration with EDA tool and IP suppliers is also key. We have worked extensively with industry leaders to ensure that the libraries and test files they output are in formats our systems can use directly. These tools also carry device knowledge that our toolset does not have. That's why the remote-connection feature is so important: our partners can provide context-specific tools that are powerful during production debug. Being able to use those tools in real time, without having to reproduce the setup or use case in a different environment, is a game changer."

Streaming Scan Test

While it may look as though all the configuration changes are happening on the test side, significant changes in test methods for multi-core designs also deserve a closer look.

For multi-core products, trade-offs in the design-for-test (DFT) iteration process become very important, necessitating a new approach. "If we look at a typical mix of designs today, multiple cores come together at different times," said Siemens' Press. "You need to know how many I/O pins are available to get to the scan channels, and the tester's deep serial memory will deliver data to the cores through those pins. So I have many variables to weigh: the number of pins to the core, the pattern size, and the complexity of the core. Then I try to find the best combination of cores to test together, in what is called hierarchical DFT. But as these designs become more and more complex, with core counts exceeding 2,500, there are many factors to weigh."

Press noted that applying AI within the same DFT architecture can deliver an efficiency gain of 20% to 30%, but that a packetized approach to scan test brings a more meaningful improvement.

"Instead of the test channels feeding the scan channels of each core directly, data is fed to all cores through a bus as packets. You then instruct each core when it can consume its packet. By doing this, you no longer have to weigh so many variables," he said. At the core level, each core can be optimized for any number of scan channels and patterns, and the number of I/O pins drops out of the calculation. "Then, when you put it into the final chip, the packets deliver the data each core needs, and this works with any size of serial bus. It's called a Streaming Scan Network (SSN)."
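The bandwidth arithmetic behind packetized delivery can be sketched in a few lines. This is a deliberately simplified model, not Siemens' actual scheduler: it only shows why one shared packet stream tends to beat splitting a fixed pin budget evenly across cores of very different sizes.

```python
def hierarchical_cycles(core_bits, total_pins):
    """Pin-multiplexed scheme (simplified): chip-level pins are split
    evenly across the cores tested together, and the group runs as
    long as its slowest core."""
    chans = max(1, total_pins // len(core_bits))
    return max(-(-bits // chans) for bits in core_bits)  # ceil division

def ssn_cycles(core_bits, bus_width):
    """Packetized delivery (simplified): every bus cycle moves
    bus_width bits that cores peel off their packet slots, so total
    cycles scale with total bits over bus width."""
    return -(-sum(core_bits) // bus_width)

# Illustrative pattern sizes (bits) for four unbalanced cores.
cores = [8000, 2000, 1000, 1000]
print(hierarchical_cycles(cores, 16))  # dominated by the largest core
print(ssn_cycles(cores, 16))           # bus stays fully utilized
```

The more unbalanced the cores, the bigger the gap, which matches the point above that per-core variables no longer need to be traded off by hand.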

Results reported by Siemens EDA customers highlight the use of supervised and unsupervised machine learning to improve diagnostic resolution and failure analysis. Using the Streaming Scan Network approach, DFT productivity improved by 5X to 10X.

What slows down the implementation of AI in HVM?

Applying machine-learning algorithms across the device design-to-test flow promises many benefits, from better binning of chip performance for advanced packages to shorter test times. For example, only a small fraction of high-performance devices may need to undergo burn-in.

"You can identify scratches on wafers, and then automatically bin out the dies around the scratch at wafer sort," said Michael Schuldenfrei, researcher at NI/Emerson Test & Measurement. "So AI and ML sound like great ideas, and there are many applications where using them makes sense. The biggest question is why it isn't happening frequently and at scale. The answer lies in the complexity of building and deploying these solutions."

Schuldenfrei summarized four key steps in the machine-learning lifecycle, each with its own challenges. In the first phase, training, the engineering team uses data to understand a specific problem and then builds a model that can predict outcomes related to that problem. Once the model is validated and the team wants to deploy it in a production environment, it must be integrated with existing equipment, such as testers or a manufacturing execution system (MES). The model also matures and evolves over time, which requires frequently verifying the data fed into it and checking that it is behaving as expected. The model must adapt as well, so deploy, learn, act, verify, and adapt form a continuous cycle. "This consumes a lot of the time of the data scientists who are deploying all these new AI-based solutions within their organizations," said Schuldenfrei. "Time is also wasted trying to access the right data, organize it, connect it all together, understand it, and extract meaningful features from it."
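The train/deploy/monitor/adapt cycle described above can be sketched as a loop. The class and method names below are illustrative, not any vendor's API; the "model" is just a mean/sigma pass band, and drift is detected as a shift of live data away from the training centerline.

```python
import statistics

class DeployedModel:
    """Minimal sketch of the ML lifecycle: train, deploy (inference),
    monitor (drift check), adapt (retrain). Names are hypothetical."""

    def __init__(self, train_data):
        self.retrain(train_data)            # phase 1: training

    def retrain(self, data):
        self.center = statistics.fmean(data)
        self.sigma = statistics.pstdev(data)

    def predict_pass(self, x):              # phase 2: deployed inference
        return abs(x - self.center) <= 3 * self.sigma

    def drifted(self, recent, tol=1.0):     # phase 3: monitoring
        shift = abs(statistics.fmean(recent) - self.center)
        return shift > tol * self.sigma     # phase 4: caller retrains

model = DeployedModel([0.9, 1.0, 1.1, 1.0])
live = [1.3, 1.35, 1.3]
if model.drifted(live):
    model.retrain(live)  # the adapt step closes the loop
```

In production, each of these four steps is where the integration effort Schuldenfrei describes actually lands: wiring the inference call into the tester, and the drift check into the data pipeline.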

In a distributed semiconductor manufacturing environment, with many different test operations spread around the world, the difficulties multiply. "By the time you finish implementing your ML solution, your model is already outdated and your product may no longer be cutting edge. The decision the model needs to make, one that will actually affect how a specific device is packaged or processed, is no longer actionable," said Schuldenfrei. "So deploying ML-based solutions in a high-volume semiconductor test environment is by no means an easy task."

He cited a 2014 Google paper which pointed out that the machine-learning code itself is the smallest and simplest part of the whole effort, while building the infrastructure, data collection, feature extraction, data verification, and managing model deployment are the most challenging parts.

Changes from design to test will affect the entire ecosystem. "We in EDA have put a lot of effort into design rule checking (DRC), which means we are checking that the work we have done and the design structures are safe," said Siemens' Press. "This matters for AI, too; we call it verifiability. If we run some kind of AI and it gives us a result, we must ensure that result is safe. This will indeed affect people in design, DFT groups, and test engineers, who must adopt these patterns and apply them."

Many ML-based applications are already available to improve test operations. Advantest's Butler highlighted the ones customers most often pursue, including search-time reduction, shift-left testing, test-time reduction, and chiplet pairing.

"For Vmin, Fmax, or trim tests, you tend to set lower and upper limits for the search, and then search within them to find, say, the minimum voltage for that specific device," he said. "Those limits are set according to the process split, and they can be quite wide. But if you have analytics you can apply, AI- or ML-type techniques can basically tell you where the chip sits in the process spectrum. Maybe it's fed forward from an earlier insertion, or maybe you combine it with operations at the current insertion. That inference can help you narrow the search range and speed up testing. A lot of people are very interested in this application, and some are using it in production to reduce search times for time-consuming tests."
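The search-narrowing idea can be illustrated with a binary search for Vmin over a voltage grid. The function and limits below are hypothetical; the point is only that a tighter, inference-derived window reaches the same Vmin in fewer test executions.

```python
def vmin_search(passes_at, lo_mv, hi_mv, step_mv=10):
    """Binary-search the lowest passing voltage on a step_mv grid.
    Assumes monotonic pass/fail: fails at lo_mv, passes at hi_mv.
    Returns (vmin_mv, number_of_test_executions)."""
    lo, hi, tests = lo_mv, hi_mv, 0
    while hi - lo > step_mv:
        mid = ((lo + hi) // 2 // step_mv) * step_mv  # grid-aligned midpoint
        tests += 1
        if passes_at(mid):
            hi = mid
        else:
            lo = mid
    return hi, tests

# Hypothetical device that passes at 540 mV and above.
device_passes = lambda mv: mv >= 540
print(vmin_search(device_passes, 400, 900))  # broad process-split limits
print(vmin_search(device_passes, 500, 600))  # ML-narrowed window
```

Both searches land on the same Vmin, but the narrowed window needs fewer applications of a potentially long test pattern, which is where the production test-time savings come from.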

"The idea behind shift left is that my downstream test insertion is very expensive, or the packaging cost is high," Butler said. "If my yield is not at the level I want, I can use analytics at an earlier insertion to try to predict which devices are likely to fail at a later insertion, and then downgrade or scrap those chips to optimize the downstream test insertions, improve yield, and reduce total cost. Test-time reduction is very straightforward: add or remove test content, or skip tests, to cut cost. Or you may want to add test content to improve yield."
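A minimal sketch of that shift-left culling decision, assuming a hypothetical leakage-based risk model (real deployments would train on historical correlations between early and late insertions):

```python
def shift_left_cull(devices, risk_of_later_fail, threshold=0.8):
    """Cull devices at an early insertion when the model predicts a
    high probability of failing an expensive later insertion."""
    keep, cull = [], []
    for dev in devices:
        (cull if risk_of_later_fail(dev) > threshold else keep).append(dev["id"])
    return keep, cull

# Hypothetical risk model: scale wafer-sort leakage (IDDQ, in uA)
# into a failure probability. Purely illustrative.
def leakage_risk(dev, limit_ua=100.0):
    return min(1.0, dev["iddq_ua"] / limit_ua)

lot = [{"id": "d1", "iddq_ua": 12.0},
       {"id": "d2", "iddq_ua": 95.0},
       {"id": "d3", "iddq_ua": 40.0}]
print(shift_left_cull(lot, leakage_risk))
```

Every culled device avoids packaging and downstream test cost; the threshold trades overkill (scrapping good parts) against escapes, which is exactly the yield-versus-cost optimization Butler describes.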

"If I have a multi-tiered device that won't pass the bin 1 criteria, but with some additional test content it might pass as bin 2, people may look at analytics to help make those decisions. Finally, in my view, two ideas come together here: chiplet designs and smart pairing. The classic example is stacking high-bandwidth memory on a processor chip. Maybe I'm interested in high performance and low power for certain applications, and I want to match and classify parts during chip test operations, then pick and place them downstream to maximize yield across multiple streams of material. The same applies to optimizing for things like a low-power or low-carbon footprint."

Generative AI

When discussing the role of artificial intelligence in the semiconductor field, an inevitable question arises: can large language models like ChatGPT be useful to engineers working in wafer fabs? Early research has shown some promise.

"For example, you can ask the system to build an outlier-detection model for you that looks for parts 5 sigma away from the centerline, say 'please create a script for me,' and the system will create the script. These are the automated, generative AI-based solutions we are already trying," said Schuldenfrei. "But from everything I've seen so far, there is still quite a bit of work to be done to get these systems to produce output of sufficiently high quality. At present, the amount of human interaction needed to fix issues in the algorithms or models that generative AI produces is still quite substantial."
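A sketch of the kind of outlier screen described above, using a robust median/MAD centerline rather than the plain mean, since a gross outlier would otherwise inflate its own sigma estimate and escape a 5-sigma limit (the function name and data are illustrative, not generated output):

```python
import statistics

def flag_outliers(values, n_sigma=5.0):
    """Return indices of parts more than n_sigma robust standard
    deviations from the population centerline."""
    center = statistics.median(values)
    # Median absolute deviation resists contamination by the outlier itself.
    mad = statistics.median(abs(v - center) for v in values)
    sigma = 1.4826 * mad  # MAD-to-sigma factor for a normal distribution
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - center) > n_sigma * sigma]

readings = [1.00, 1.02, 0.99, 1.01, 0.98, 1.00, 1.03, 5.00]
print(flag_outliers(readings))  # flags only the last part
```

This is the sort of boilerplate a generative assistant can draft quickly, yet the robustness subtlety above is exactly the kind of issue that still needs an engineer's review.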

A lingering question is how to gain access to the test programs needed to train a system that generates new test programs, when everyone is protecting their test IP. "Most people value their test IP and won't necessarily want it used for training without safeguards in place," Butler said. "So finding ways to accelerate the overall test-program development process while protecting IP is a challenge. But it's clear this technology will be applied, just as we have already seen in software development."

Failure Analysis

Failure analysis is typically a costly and time-consuming task for fabs, because it requires tracing back through the wafer processing, assembly, and packaging data for a specific failing device returned through a return material authorization (RMA). Physical failure analysis is carried out in the FA lab, using a variety of tools to track down the root cause of the failure.

While scan diagnosis data has been used for decades, a newer approach pairs a digital twin with that data to identify the root cause of failures.

"In test, we have a digital twin that performs root-cause deconvolution based on scan failure diagnosis. So we don't have to look at the physical device and spend time trying to figure out the root cause, because we have the scan data and millions of virtual observation points," said Siemens' Press. "We can reverse-map the patterns we created and find out where, deep in the scan cells of the design, the error occurred. Using YieldInsight and unsupervised machine learning trained on a large amount of data, we can quickly identify the likely failure locations. This lets us run thousands or tens of thousands of failure diagnoses in a short period, giving us the opportunity to identify systematic yield limiters."

Another increasingly popular approach is to use on-chip monitors to access specific performance information in place of physical failure analysis. "What is needed is deep data from inside the package to continuously monitor performance and reliability, and that's what we provide," said Alex Burlak, Vice President of Test and Analytics at proteanTecs. "For example, if the failure is suspected to come from the chiplet interconnect, we can use deep data from the on-chip agents to help the analysis, instead of removing the device from its environment and bringing it to the lab, where you may not even be able to reproduce the problem. More importantly, in many cases the ability to send back data instead of the device can pinpoint the problem, saving costly RMA and failure analysis procedures."

Conclusion

The ATE community's enthusiasm for AI and machine learning is being matched by substantial infrastructure changes to support real-time inference on test data, as well as the demands for higher yield, higher throughput, and optimized die pairing for multi-chiplet packages. For multi-core designs, commercialization of packetized scan delivery in the Streaming Scan Network (SSN) offers a more flexible way to optimize each core for the number of scan chains, patterns, and bus width it requires. The number of test applications that can benefit from AI keeps growing, including test-time reduction, Vmin/Fmax search reduction, shift left, smart chiplet pairing, and lower overall power consumption. New developments, such as using the same source files across design, characterization, and test setups, help accelerate the critical debug and development stages for new products.

