2023-09-26

AI + Law + Blockchain: Making Legal Services Accessible to Everyone

Web3 is booming, and Arweave is becoming a popular infrastructure choice for developers. PermaDAO is a community where everyone can contribute to the Arweave ecosystem. It's a place to propose and tackle tasks related to Arweave, with the support and feedback of the entire community. Join PermaDAO and help shape Web3!

Author: Spike @ Contributor of PermaDAO

Translator: Kyle @ Contributor of PermaDAO

Reviewer: John Khor @ Contributor of PermaDAO



Unlike previous technological innovations, AI is rapidly changing human daily life in both the virtual and physical realms, from industrial robots to all manner of chatbots.

The most significant features of the current era of AI are Large Language Models (LLMs) and Generative Adversarial Networks (GANs). LLMs, as the name suggests, stand out for their sheer number of parameters and the vast amounts of training data they can absorb. These characteristics enable LLMs to tackle complex problems effectively: if we cannot cultivate experts in every industry, we can at least have the models memorize the data of every industry.

GANs continuously fine-tune model performance through an adversarial reward-and-penalty mechanism to improve accuracy. While this approach may not solve every problem, it is effective for most of them.
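
To make that adversarial reward-and-penalty dynamic concrete, here is a minimal PyTorch sketch of a GAN training loop on toy one-dimensional data; the network sizes, learning rates, and data are illustrative assumptions rather than any production setup.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; sizes are illustrative assumptions.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 2.0   # "real" samples from a target distribution
    noise = torch.randn(32, 8)
    fake = generator(noise)

    # Discriminator: rewarded for judging real vs. generated data correctly.
    d_opt.zero_grad()
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    d_opt.step()

    # Generator: penalised whenever the discriminator spots its output as fake.
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_loss.backward()
    g_opt.step()
```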

Of course, the development of AI is not static. There are currently two main approaches:

  1. Continuing to Invest in LLMs: The goal is to train a knowledge base that encompasses all human knowledge, with the aim of nurturing a silicon-based "intelligence" that can replace human thinking.

  2. Cultivating Specialized Small Models: Focusing on optimizing methods in various domains to reduce the need for extensive parameters and training resources while achieving stronger economic benefits in specific fields.

From current practice, both approaches have their strengths, but LLMs dominate. The main reason is that LLMs deal primarily with text as input and output, which is easier to train on and apply than complex data formats such as images, audio, and video.

1.1 Understanding AI Large Models (LLMs)

AI Large Models (LLMs) refer to artificial intelligence models with a large number of parameters and complex structures. They typically consist of billions of parameters, allowing them to handle large amounts of data and complex tasks. These models are trained using deep learning algorithms and can automatically learn from data to make predictions, classify information, generate content, and perform various tasks.

Here are key aspects to understand about LLMs:

  1. Parameter Scale: LLMs typically have billions of parameters, enabling them to model complex data more accurately (see the sketch after this list). These parameters are learned from training data and can be adjusted and optimized for specific tasks.

  2. Complex Structure: LLMs have complex structures with multiple layers and modules, allowing them to handle different types of input data and perform feature extraction and representation learning at different levels. These structures are trained using deep learning algorithms to improve model performance and generalization.

  3. Application Areas: LLMs find applications in various domains, including natural language processing, computer vision, speech recognition, and more. They can be used for tasks like machine translation, image classification, and speech generation, providing intelligent and efficient solutions.
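
As a rough illustration of what parameter scale means in practice, the snippet below counts the trainable parameters of a deliberately tiny PyTorch model; real LLMs apply the same counting to networks thousands of times larger. The layer sizes here are assumptions chosen only for illustration.

```python
import torch.nn as nn

# A deliberately tiny model; layer sizes are illustrative assumptions.
model = nn.Sequential(
    nn.Embedding(10_000, 256),   # vocabulary of 10k tokens, 256-dim embeddings
    nn.Linear(256, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10_000),
)

# Every weight and bias is one trainable parameter; LLMs have billions of them.
total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total:,} trainable parameters")  # roughly 13 million here
```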

For example, some of the popular large models include GPT-3 with 175 billion parameters, GPT-4 reportedly with over a trillion parameters, Baidu's Ernie Bot with 260 billion parameters, and Alibaba's "Tongyi Qianwen", which claims to have 100 trillion parameters.

While the number of parameters does not directly equate to computational capability, there is a generally accepted positive correlation between the two. The arms race for larger model parameters is expected to continue for some time.

In general, large AI models face several issues that can even hinder the further expansion of LLM (Large Language Model) applications. Chief among them is diminishing marginal utility: beyond a certain point, adding parameters no longer brings proportional gains, and GPT-4, with its reported trillion-plus parameters, is arguably near that limit. Other recurring issues include:

  • Overcomplexity: AI large models typically consist of millions or even billions of parameters with highly intricate structures and functionalities. While these models can handle vast amounts of input data, quantifying how specific parameters should be modified for better optimization remains challenging and often relies on the practice of "tweaking parameters," colloquially known as alchemy.

  • Enormous Data Requirements: Although training on extensive datasets can continuously improve LLM performance and accuracy, humanity's textual data is already insufficient to meet LLM demand. Broader non-textual data also needs to be considered, but training on such data is difficult.

  • Long Road to Generalization: While LLMs can be applied across various domains, including natural language processing, image recognition, speech recognition, and more, achieving a universally applicable large model remains elusive. Even GPT-4 cannot excel in all domains.

  • Resource Intensiveness: Due to the complexity and scale of AI large models, they typically demand substantial computational resources for training and inference. This includes high-performance computing devices, extensive storage space, and high-speed network connectivity.

  • Privacy and Security Challenges: AI large models often deal with a plethora of sensitive data, such as personal information and business secrets. Consequently, safeguarding data privacy and ensuring model security pose significant challenges.

  • Interpretability Issues: The decision-making processes of AI large models are often exceedingly complex and difficult for humans to comprehend and explain. This presents challenges to the credibility and reliability of the models, necessitating further research and exploration.

Given these challenges, the era of AI LLMs is not the final destination but a new beginning. Leveraging the advantages of LLMs while addressing or mitigating the mentioned issues is crucial.

AI History: Leading to Large Models

The development of AI has gone through multiple transformations and progressions. For instance, the Transformer architecture is a prerequisite for all GPT applications. Therefore, let's briefly review the history of AI's development:

2.1 Classification Based on Model Structure

  1. Convolutional Neural Networks (CNNs): CNNs are a commonly used neural network structure, primarily for processing image and video data. They extract features from images through multiple layers of convolution and pooling operations and classify them through fully connected layers.

  2. Recurrent Neural Networks (RNNs): RNNs are designed to handle sequential data. They maintain memory of past information through recurrent connections, making them suitable for tasks like natural language processing and speech recognition.

  3. Generative Adversarial Networks (GANs): GANs consist of a generator and a discriminator network. They generate realistic data through adversarial training. The generator generates data, and the discriminator judges its authenticity. GANs are used for generating high-quality data.

  4. Transformer: Transformers are designed for sequential data processing and excel in natural language processing tasks. They use self-attention mechanisms to capture dependencies in sequences and feed-forward neural networks for feature extraction and classification (a minimal self-attention sketch follows this list).

  5. Graph Neural Networks (GNNs): GNNs are specialized for graph data. They aggregate and update features of nodes and edges in a graph, making them suitable for tasks such as social network analysis and recommendation systems.
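
To illustrate the self-attention mechanism mentioned in point 4, below is a minimal sketch of single-head scaled dot-product self-attention; the dimensions are illustrative assumptions, and real Transformers stack many such layers with multiple heads, residual connections, and feed-forward blocks.

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (illustrative sizes)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Attention scores capture pairwise dependencies between positions.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.dim)
        weights = scores.softmax(dim=-1)
        return weights @ v

tokens = torch.randn(1, 10, 64)       # a batch of one 10-token sequence
print(SelfAttention()(tokens).shape)  # torch.Size([1, 10, 64])
```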

2.2 Classification Based on Training Methods

  1. Supervised Learning: Supervised learning trains a model on given inputs and their corresponding outputs. The model learns the relationship between inputs and outputs and uses it to make predictions. Common supervised learning algorithms include linear regression, logistic regression, decision trees, and neural networks, among others. For example, in the financial industry, supervised learning can be used to predict stock prices or to determine whether a customer is likely to default (see the sketch after this list).

  2. Unsupervised Learning: Unsupervised learning is a method of training models without labeled data. In unsupervised learning, the model learns by discovering patterns and structures within the data. Common unsupervised learning algorithms include clustering, dimensionality reduction, and association rules, among others. For example, in the financial industry, unsupervised learning can be used for customer segmentation analysis or detecting anomalous transactions.

  3. Semi-Supervised Learning: Semi-supervised learning is a training method that falls between supervised and unsupervised learning. In semi-supervised learning, models are trained using both labeled and unlabeled data. Labeled data guides the model's learning, while unlabeled data helps discover data structures and patterns. Applications of semi-supervised learning in the financial industry include credit scoring and fraud detection.

  4. Reinforcement Learning: Reinforcement learning is a method of training models through trial and error. In reinforcement learning, models interact with the environment and adjust their behavior based on feedback signals. Applications of reinforcement learning in the financial industry include stock trading and risk management.
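
To make the first two training methods concrete with the financial examples above, the sketch below fits a supervised classifier on synthetic, labeled "default" data and then clusters the same features without labels; the data and model choices are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy customer features: [income, outstanding_debt]; purely synthetic.
X = rng.normal(loc=[[50, 10]], scale=[[15, 8]], size=(200, 2))
y = (X[:, 1] > X[:, 0] * 0.4).astype(int)   # synthetic "default" label

# Supervised: learn the mapping from features to the known label.
clf = LogisticRegression().fit(X, y)
print("P(default) for a new customer:", clf.predict_proba([[40, 25]])[0, 1])

# Unsupervised: group customers without using any label at all.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("Customer segment sizes:", np.bincount(segments))
```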

If we briefly divide AI development into generations and schools, we can roughly identify the following four eras, with LLMs being the latest development within deep learning, itself a subset of machine learning.

  1. Symbolic Era: From the 1950s to the 1970s, symbolism dominated machine learning. During this period, machine learning algorithms relied mainly on human expert knowledge and rules, translating this knowledge into a form that computers could understand through programming. However, this approach had limitations in handling complex and ambiguous problems.

  2. Connectionist Era: From the 1980s to the 1990s, connectionism emerged as a new approach to machine learning. Connectionism simulates neural networks and transforms machine learning problems into weight adjustment problems. This approach can handle more complex problems and can learn patterns and regularities from data.

  3. Statistical Learning Era: From the 1990s to the 2000s, statistical learning became the mainstream of machine learning. Statistical learning builds probability models and uses statistical methods for model parameter estimation and prediction. This approach has significant advantages in handling large-scale data and complex problems, such as support vector machines and random forests.

  4. Deep Learning Era: Since the 21st century, deep learning has become a hot field in machine learning. Deep learning constructs deep neural networks and trains them using large amounts of data. Deep learning has made significant breakthroughs in areas like image recognition, natural language processing, and speech recognition, making machine learning more widely applicable in practical applications.

Large Models 2.0: Fine-Tuning vs. Slimming

Taking specific domains as an example, their integration with AI has a long history. In the 1970s, expert systems began to rise and achieved significant results in certain fields. One famous example is MYCIN, an expert system for diagnosing bacterial infections. By converting the knowledge of medical experts into rules and reasoning mechanisms, MYCIN could accurately diagnose bacterial infections and provide corresponding treatment recommendations. This achievement attracted widespread attention and research, promoting the application of expert systems in fields such as medicine, engineering, and finance. However, due to the limitations of their knowledge representation and reasoning mechanisms and their reliance on domain expert knowledge, expert systems also revealed problems in practical applications. Nevertheless, their rise laid the foundation for weak artificial intelligence and provided valuable experience and insights for subsequent AI technologies.

The AI technology most directly relevant to the legal field is natural language processing (NLP): encoding, decoding, and transforming human language into machine-readable representations, followed by the corresponding analysis. Legal texts differ significantly from general texts; they are precisely written, use exact terminology, and follow highly patterned, fixed styles, which makes them relatively easy for machines to understand and analyze.

NLP applications are extensive and encompass many fields. Here are some common examples of natural language processing applications:

  1. Information Extraction: Extracting useful information from large volumes of text, such as extracting names, places, and dates from news articles (see the sketch after this list).

  2. Text Classification: Categorizing text into predefined categories, such as classifying emails as spam or non-spam.

  3. Machine Translation: Automatically translating text from one language to another, such as translating English to Chinese.

  4. Sentiment Analysis: Analyzing the emotional tone of text to determine whether it is positive, negative, or neutral.

  5. Question Answering: Finding the most relevant answers from a large volume of text based on user questions.

  6. Text Generation: Creating new text based on given context, such as generating news reports from a piece of text.
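
As a small illustration of information extraction and text classification, the snippet below uses the Hugging Face transformers `pipeline` API with its default English models (downloaded on first use); the example clause and expected output are assumptions for demonstration only.

```python
from transformers import pipeline

# Information extraction: pull named entities such as people, organizations,
# and places out of free text using the default English NER model.
extractor = pipeline("ner", aggregation_strategy="simple")
clause = "This agreement is made between Alice Chen and Acme Ltd in Singapore."
for entity in extractor(clause):
    print(entity["entity_group"], "->", entity["word"])

# Text classification: the same pipeline API covers sentiment analysis.
classifier = pipeline("sentiment-analysis")
print(classifier("The counterparty failed to deliver on time.")[0])
```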

According to OpenAI's research, much of legal professionals' work can be performed by AI: whether it is standardized bar exams, case retrieval, or document drafting, AI has already demonstrated a comprehensive advantage over humans.

However, this does not mean that legal professionals will resist the integration of AI. According to a survey report by LexisNexis, 81% of respondents (lawyers) are already aware of the existence of generative AI, and over 40% of lawyers have used similar products.

In addition, the combination of AI and law has long been a track pursued by legal startups. Starting with DoNotPay in 2015, using AI to draft legal documents has become a mature business model. Overall, however, most AI legal products still operate as black boxes, hoping to outcompete rivals by collecting ever more data, rather than taking the more viable path of genuinely protecting user privacy and developing technical solutions for compliance and rights protection.

In this regard, Arweave's privacy protection and data permanence features take on special significance. On the one hand, data permanence does not have to conflict with privacy protection; it is entirely possible to achieve both privacy protection and commercial operation. On the other hand, a user's business value depends not on the amount of money involved but on the number of service calls, ensuring that anyone can access professional legal services and avoiding the traditional legal system's exclusion of vulnerable groups.

LEGALNOW: Making Professional Legal Services Affordable for Everyone

As mentioned earlier, the combination of AI and law began in the shadow of expert systems. With the emergence of GPT, everyone can now access professional legal services. Still, such services have limitations:

  1. General models cannot focus. General-purpose large models are jacks-of-all-trades, but for very specific details users often need to provide more contextual prompts. Users usually do not possess this specialized knowledge, making it difficult to fully leverage the power of large models.

  2. Data security concerns. Legal information typically involves personal privacy; professional contracts, documents, and memoranda in particular may directly expose personal or corporate secrets. Centralized servers cannot technically guarantee the security of this information.

  3. Customized services for specific needs. In some business scenarios, such as legal contracts involving multiple parties, professional lawyers often need to personally witness the process to ensure the texts are legally valid and free of defects. The technology companies behind large models usually cannot provide such services.

In view of this, LegalNOW has proposed its own solution:

  1. The backend is based on the most powerful model available, GPT-4 32k, trained and fine-tuned with feedback from top-tier senior lawyers. In traditional large-model training, human feedback is usually provided by low-wage annotators in developing countries, who may lack the professionalism required for specialized needs. LegalNOW, incubated by LegalDAO, has thousands of professional lawyers responsible for feedback on legal terminology, striving for out-of-the-box quality.

  2. 24/7 lawyer-level legal support. Beyond simple large model conversations, LegalNOW is also building a support and service system, recognizing the specificity and professionalism of the law. LegalNOW will not hand everything over to machines but hopes to use the dual drive of human and machine to fill the global gaps in legal services.

  3. User data is encrypted and stored permanently on the blockchain. Mind Network will be responsible for putting user data on-chain privately and for its circulation, dispelling concerns about personal data being abused, while Arweave permanently stores all data, avoiding disputes.
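
To illustrate the encrypt-before-permanent-storage pattern described in point 3 (a generic sketch, not LegalNOW's actual implementation, which has not been published), the snippet below encrypts a document client-side with a symmetric key; `upload_to_arweave` is a hypothetical placeholder rather than a real SDK call.

```python
from cryptography.fernet import Fernet

def upload_to_arweave(payload: bytes) -> str:
    """Hypothetical placeholder for an Arweave upload; returns a transaction id."""
    raise NotImplementedError("wire up an Arweave SDK or gateway of your choice")

# The key never leaves the user's device; only ciphertext would be stored permanently.
key = Fernet.generate_key()
cipher = Fernet(key)

contract_text = b"Confidential service agreement between the parties..."
ciphertext = cipher.encrypt(contract_text)

# tx_id = upload_to_arweave(ciphertext)   # permanent, but unreadable without the key
plaintext = cipher.decrypt(ciphertext)    # only the key holder can recover the document
assert plaintext == contract_text
```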

In terms of business direction, LegalNOW currently focuses on contract drafting in the blockchain field. In the past, drafting such contracts was time-consuming, labor-intensive, and costly; under LegalNOW's service model, users simply pay per call.

Summary

The development of AI large models has spanned several years, but it still primarily focuses on creative fields such as writing and art rather than more standardized scenarios. This can be considered a "small shock" that AI has given to humanity. At the same time, people are eagerly hoping that AI can play a more significant role in productivity scenarios, such as in the fields of healthcare, law, or science.

LegalNOW's current attempt can be seen as a beginning. It is dedicated to creating genuinely universal legal service products. When combined with blockchain's privacy components, AI can become more secure and trustworthy. We certainly do not want to escape the privacy abuses of Web2 giants only to freely give our data to AI companies. Blockchain is almost the only solution to this problem.

In any case, the era of AI has arrived, and how we coexist with AI is a question everyone in the next era must consider. So why not start now, by drafting a legal document?


🔗 About PermaDAO: Website | Twitter | Telegram | Discord | Medium | Youtube

Tags: AI, Law, Blockchain
