QEUR23_CHRLTM2: 文章の類似度を計測する
# ----
from sentence_transformers import CrossEncoder
# -----
# CrossEncoderを読み込む
ce = CrossEncoder('BAAI/bge-reranker-base')
# 中略
############################
# Step1回答のパーセンタイル類似度分析の結果
############################
# ------
# 文単位に分割する
def split_text_to_sentences(text):
return sent_tokenize(text)
# ---
# 分割したいパラグラフ
# CONCEPTパラグラフを文単位のオブジェクトに変換する
arr_concept_sentences = split_text_to_sentences(str_concept)
# ---
# 文単位に分割する
#arr_concept_sentences = [sent.text for sent in concept_doc.sents]
print("--- arr_concept_sentences ---")
print(arr_concept_sentences)
# ---
# 文の数(CONCEPT)
num_concept = len(arr_concept_sentences)
#print(num_concept)
# ------
# パーセンタイル達成度分析: Answer Step1とCONCEPTを比較する
# ------
# パーセンタイルマトリックスの生成
mx_percentile = np.zeros([num_df,5])
# ------
for k in range(num_df):
# ---
answer_paragraph = arr_AStep1[k]
#print(answer_paragraph)
# 文単位に分割する
arr_answer_sentences = split_text_to_sentences(answer_paragraph)
# ---
# 文の数(ANSWER)
num_answer = len(arr_answer_sentences)
print(num_answer)
# ---
# 類似度マトリックス
mx_similar_concept = np.zeros([num_concept, num_answer])
#print(mx_similar_concept)
# ---
# マトリックス計算(CONCEPT x ANSWER)
for i in range(num_concept):
str_concept_sentence = arr_concept_sentences[i]
# ---
for j in range(num_answer):
str_answer = arr_answer_sentences[j]
# -----
# 類似度の出力
similarity = ce.predict([str_concept_sentence, str_answer])
mx_similar_concept[i,j] = round(similarity,5)
print(f"i:{i}, j:{j}, string:{str_answer}, similarity:{similarity}")
# ---
arr_similar_concept = mx_similar_concept.flatten()
if len(arr_similar_concept) > 10:
val_p100 = np.percentile(arr_similar_concept, 100)
val_p98 = np.percentile(arr_similar_concept, 98)
val_p96 = np.percentile(arr_similar_concept, 96)
val_p94 = np.percentile(arr_similar_concept, 94)
val_p92 = np.percentile(arr_similar_concept, 92)
else:
val_p100 = np.max(arr_similar_concept)
val_p98 = 0
val_p96 = 0
val_p94 = 0
val_p92 = 0
# ---
# パーセンタイルマトリックスの生成
mx_percentile[k,:] = [val_p100, val_p98, val_p96, val_p94, val_p92]
print(k, val_p100, val_p98, val_p96, val_p94, val_p92)
# -----
#print("--- mx_percentile ---")
#print(mx_percentile)
# ----
# パーセンタイル達成度のデータフレームを生成する
arr_columns = ['PCT1','PCT2','PCT3','PCT4','PCT5']
df_percentile = pd.DataFrame(mx_percentile, columns=arr_columns)
df_percentile
QEU:FOUNDER : “以前は、「BAAI/bge-reranker-base」のモデルを全面的に使用していたんだが、今後は適材適所かな?”
D先生: “CohereのEmbedding は、何に使うんですか?”
QEU:FOUNDER : “FACT_BASED_QUESTIONに使用します。以前のモデルは、文単位では精度が出てきますが、段落ではあやしいです。しかし、CohereのEmbedding は、すごいよ!”
# -----
import numpy as np
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
# -----
import cohere
#co = cohere.Client("COHERE_API_KEY") # Your Cohere API key
co = cohere.Client() # Your Cohere API key
texts = ["I like to be in my house",
"I enjoy staying home",
"the isotope 238u decays to 206pb"]
response = co.embed(
texts=texts,
model='embed-english-v3.0',
input_type='search_document'
)
# -------
# テキストの分割
import re
# -----
# EXAMPLE OF ANSWER A
str_answer_A = """
The influence of Orthodox Christianity and the Russian Orthodox Church has profoundly shaped the social and political landscape of Russia. In 988, Prince Vladimir the Great of Kiev adopted Orthodox Christianity as the official religion of the state, marking the beginning of a long and complex rela-tionship between the Orthodox Church and the Russian state. The Russian Orthodox Church played a significant role in shaping Russian nationalism and Orthodox identity, which became a defining feature of Russian politics and society.
The Church also played a crucial role in the rise of the Romanov dynasty, which ruled Russia from 1613 to 1917. The Church supported the Romanovs, and in return, the dynasty protected and pro-moted the Church's interests. This close relationship between the Church and the state helped to con-solidate power and legitimize the Romanov rule.
In contrast, Ukraine's Orthodox Church has a more complex and contested history. The Ukrainian Orthodox Church was established in 1990, after Ukraine gained independence from the Soviet Union. The Ukrainian Orthodox Church has struggled to assert its independence from the Russian Ortho-dox Church, which has historically dominated Orthodox Christianity in Ukraine. This has contribut-ed to tensions between Ukraine and Russia, with the Ukrainian Orthodox Church seeking greater au-tonomy and recognition.
In recent years, the Russian Orthodox Church has continued to play a significant role in Russian pol-itics. The Church has become a key ally of the Putin regime, with the Church supporting the gov-ernment's conservative social policies and nationalist agenda. In return, the government has provid-ed financial and political support to the Church, which has helped to consolidate the Church's influ-ence in Russian society.
The Russian Orthodox Church has also played a crucial role in justifying imperial expansion and consolidating power under the Tsar, particularly during the 18th and 19th centuries. The Church's close relationship with the Russian state has contributed to its influence in Russian politics and socie-ty, while Ukraine's Orthodox Church has struggled to assert its independence from Russian domi-nance.
The Ukrainian Orthodox Church has sought independence (autocephaly) from the Moscow Patriar-chate, which has been a contentious issue and a point of tension with the Russian Orthodox Church. The Russian Orthodox Church has remained influential in contemporary Russian politics, with Putin utilizing the Church to consolidate power and promote a conservative social agenda.
"""
# 文字列を段落に分割する(for answer A)
paragraphs = re.split(r'\n\s*\n', str_answer_A.strip())
array_answer_A = []
for paragraph in paragraphs:
array_answer_A.append(paragraph.strip())
#print(paragraph.strip())
# ---
print("----")
print(array_answer_A)
# -----
# SUMMARY文(A)
str_summary_A = "Orthodox Christianity has profoundly influenced Russian society and politics since its adoption in 988. The Russian Orthodox Church has been closely tied to the state, supporting rulers and shaping national identity. In contrast, Ukraine's Orthodox Church has sought independ-ence from Russian dominance, contributing to tensions between the two countries."
print("----")
print(str_summary_A)
# -----
# EXAMPLE OF ANSWER B
str_answer_B = """
Putin's authoritarianism and nationalism have driven his foreign policy decisions, including the in-vasion of Ukraine. According to a research paper by the Brookings Institution, "Putin's authoritari-anism has been a key driver of his foreign policy decisions, including the annexation of Crimea and the ongoing conflict in Eastern Ukraine." This is supported by a quote from Putin himself, who stat-ed in a 2014 speech, "We will do everything to support the people of Crimea, who have turned to us in their hour of need."
The invasion of Ukraine has had significant implications for global security architecture. A report by the International Institute for Strategic Studies notes that the invasion of Ukraine has "heightened tensions between Russia and the West, leading to a breakdown of diplomatic relations and wide-spread economic sanctions on Russia." This is supported by a statement from NATO Secretary-General Jens Stoltenberg, who stated, "Russia's actions in Ukraine have fundamentally changed the security landscape in Europe."
The conflict has accelerated a geopolitical realignment, with a growing divide between democracies and autocracies. An article in Foreign Affairs notes that the conflict in Ukraine has "accelerated a geopolitical realignment, with a growing divide between democracies and autocracies." This is sup-ported by a quote from German Chancellor Olaf Scholz, who stated, "The war in Ukraine has shown us that the world is no longer bipolar, but multipolar."
The economic sanctions imposed on Russia have had a significant impact, but Russia's economy re-mains resilient. A report by the International Monetary Fund notes that the economic sanctions im-posed on Russia have had a significant impact, with Russia's economy contracting by 2.8% in 2022. However, Russia's economy remains resilient due to its energy exports, particularly to Europe, and the rise in oil and gas prices has offset some of the sanctions' effects.
"""
# 文字列を段落に分割する(for answer B)
paragraphs = re.split(r'\n\s*\n', str_answer_B.strip())
array_answer_B = []
for paragraph in paragraphs:
array_answer_B.append(paragraph.strip())
#print(paragraph.strip())
# ---
print("----")
print(array_answer_B)
# -----
# SUMMARY文(B)
str_summary_B = "Putin's authoritarian nationalism has driven his foreign policy, including Ukraine's invasion. This has significantly impacted global security, worsening Russia-West relations and leading to sanctions. The conflict has accelerated geopolitical realignment between democracies and autocracies. Despite sanctions, Russia's economy remains resilient due to energy exports."
print("----")
print(str_summary_B)
QEU:FOUNDER : “ここまでは、データを準備するプログラムです。文章A(array_A)と文章B(array_B)があります。この2つの文章を段落ごとに分割してリストAとリストBを生成します。一方、プロンプト・エンジニアリングでこの2つの文章のまとめ(summary_A, summary_B)を作りました。”
(類似度、比較方案)
- Summary_Aとarray_Aを比較する
- Summary_Bとarray_Bを比較する
- Summary_Aとarray_Bを比較する
- Summary_Bとarray_Aを比較する
D先生: “`そうすると、類似度評価には4種類考えられますよね。。”
QEU:FOUNDER : “じゃあ、エンベッディングによる類似度分析をやってみましょう。”
# -----
import numpy as np
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
# -----
import cohere
#co = cohere.Client("COHERE_API_KEY") # Your Cohere API key
co = cohere.Client() # Your Cohere API key
# -----
#array_answer_A
num_answer_A = len(array_answer_A)
response_answer_A = co.embed(
texts=array_answer_A,
model='embed-english-v3.0',
input_type='search_document'
)
embeddings_answer_A = response_answer_A.embeddings
print(num_answer_A)
print(embeddings_answer_A[0])
# -----
#array_answer_B
num_answer_B = len(array_answer_B)
response_answer_B = co.embed(
texts=array_answer_B,
model='embed-english-v3.0',
input_type='search_document'
)
embeddings_answer_B = response_answer_B.embeddings
print(num_answer_B)
print(embeddings_answer_B[0][0:20])
# -----
#print(str_summary_A)
response_summary_A = co.embed(
texts=[str_summary_A],
model='embed-english-v3.0',
input_type='search_document'
)
embeddings_summary_A = response_summary_A.embeddings
print(embeddings_summary_A[0][0:20])
# -----
#print(str_summary_B)
response_summary_B = co.embed(
texts=[str_summary_B],
model='embed-english-v3.0',
input_type='search_document'
)
embeddings_summary_B = response_summary_B.embeddings
print(embeddings_summary_B[0][0:20])
# -----
# Cosine Similarity(A-A)
for i in range(len(array_answer_A)):
print(f"Cosine similarity between sentences A and A{i}:", co-sine_similarity([embeddings_summary_A[0]], [embeddings_answer_A[i]])[0][0])
QEU:FOUNDER : “結果はこうなります。ドン!!”
- Cosine similarity between sentences A and A0: 0.877182150568382
- Cosine similarity between sentences A and A1: 0.6410007438521332
- Cosine similarity between sentences A and A2: 0.7495718720352285
- Cosine similarity between sentences A and A3: 0.7497282746697849
- Cosine similarity between sentences A and A4: 0.8250077291455422
- Cosine similarity between sentences A and A5: 0.773840196766115
D先生: “あれだけ長い段落でも、その意味を読み取って評価しれくれるのでありがたいモノです。それでも、他の比較条件を確認しなければ・・・。”
QEU:FOUNDER : “じゃあ、他の比較条件を見てみましょう。”
D先生: “確認しました。CohereのEmbeddingは、ちゃんと意味を読み取って比較をしています。これは、とても便利だわ・・・。”
QEU:FOUNDER : “FACT_BASED_OPINIONは、意外と内容が複雑になるので、これを使わないと比較はできなくなりますね。”
D先生: “同意します。いよいよ、テストをやりましょう。”
~ まとめ ~
・・・ 前回のつづきです ・・・
QEU:FOUNDER : “もちろん、生命体ではない時代感覚のないAIごときにそこまでの深堀はできないとは思います。ただし、今回はシステムに入力するGOALとして2つ(AとB)を用意します。これで、前回よりは深堀がしやすくなるでしょう。”
C部長 : “ディベートするグループAとBに文書化すると、2グループの意見の違いが分かりやすくなりますからね。”
QEU:FOUNDER : “おもむろに、このおっさん(↑)がでてきたら、笑うわ。”
C部長 : “GOALを、どんな風に設定するんですか?”
日本の長期の経済停滞の理由は、若者の怠慢である。おそらくバブル化以降の文化的な堕落の結果ではないだろうか。若者は、いかなる手段を以てしても就職して、家を買い、結婚をして子供を作るものである。仕事を選ばなければいくらでもある。もし給料が半分であったら、他人の2倍働けば良いのではないだろうか。規制緩和で競争が増えるのは良いことであり、ものが安くなり、結果としてイノベーションが生まれる。生産性を高めるには、人件費を安くさえすればよい。人件費が高いから生産性が低く、その結果として経済がわるいのだ。現状の自民党と大企業の癒着は、単に「よい協力関係」であるにすぎない。彼らが、世界の大企業と戦っているので、日本の競争力を保つことができる。逆に言うと、振興企業にはリスクが高く、あまり期待すべきものではない。社会のダイバーシティは現在のレベルで十分である。優秀な海外学生を招聘し、移民を推進すべきでない。自国の若者の機会を減らすだけである。
QEU:FOUNDER : “GROUP_BのGOALの定義文は、こんな感じ(↑)でどう・・・?”
C部長 : “渋い・・・。・・・とすると、その反対がGROUP_Aですよね。”
QEU:FOUNDER : “まあ、ここで設定したGOALなんぞは、「暫定的」なものです。どうせ、この先、コロコロかわるものですから。”
QEU:FOUNDER : “立派なJ国人は、そうあるべきなんです。蓮舫流行ってる!”