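HanLP is a Java toolkit made up of a collection of models and algorithms, with the goal of popularizing natural language processing in production environments. It goes beyond word segmentation: it provides a complete set of capabilities, including lexical analysis, syntactic parsing, and semantic understanding. HanLP is full-featured, efficient, cleanly structured, built on up-to-date corpora, and customizable.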
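To make "word segmentation" concrete: Chinese text has no spaces, so a segmenter must decide where each word begins and ends. The sketch below uses forward maximum matching, a classic dictionary-only technique that is not HanLP's default algorithm; the class `MaxMatchSegmenter` and its toy dictionary are hypothetical and exist only for illustration. At each position it greedily takes the longest dictionary word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of forward maximum matching (not HanLP's API or algorithm)
public class MaxMatchSegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    public MaxMatchSegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String w : dictionary) {
            max = Math.max(max, w.length());
        }
        this.maxWordLength = max;
    }

    /** Greedily takes the longest dictionary word at each position; unknown characters become single-character tokens. */
    public List<String> segment(String text) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLength);
            // Shrink the window until it matches a dictionary word or is one character long
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            result.add(text.substring(i, end));
            i = end;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("商品", "和", "服务", "和服");
        MaxMatchSegmenter seg = new MaxMatchSegmenter(dict);
        System.out.println(seg.segment("商品和服务"));
    }
}
```

With this toy dictionary, 商品和服务 ("goods and services") comes out as [商品, 和服, 务]: the greedy match grabs 和服 ("kimono") and strands 务. Ambiguities like this are why longest match alone is rarely enough in practice.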
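HanLP is fully open source, dictionaries included. It does not depend on any other JARs, and under the hood it relies on a set of high-speed data structures such as the double-array trie, the DAWG (directed acyclic word graph), and AhoCorasickDoubleArrayTrie; these foundational components are all open source as well.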
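AhoCorasickDoubleArrayTrie packs an Aho-Corasick automaton into double arrays so that every dictionary word occurring in a text can be found in a single left-to-right pass. The sketch below shows only the Aho-Corasick matching logic, using ordinary hash-map nodes instead of double arrays; the class `AhoCorasickSketch` and its methods are hypothetical and are not HanLP's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical Aho-Corasick sketch with hash-map nodes (not HanLP's double-array implementation)
public class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>(); // dictionary words that end at this node
    }

    private final Node root = new Node();

    public AhoCorasickSketch(List<String> words) {
        // 1. Build a plain trie containing all dictionary words
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(w);
        }
        // 2. Compute failure links breadth-first, so shallower nodes are finished first
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.children.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.children.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = (f == null) ? root : f.children.get(c);
                // Inherit words that end at the failure target (shorter suffix matches)
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Returns every dictionary word found in text, formatted as "word@startOffset". */
    public List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Follow failure links until some node can consume c (or we are back at the root)
            while (node != root && !node.children.containsKey(c)) {
                node = node.fail;
            }
            node = node.children.getOrDefault(c, root);
            for (String w : node.outputs) {
                hits.add(w + "@" + (i - w.length() + 1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac = new AhoCorasickSketch(List.of("商品", "和服", "服务", "和"));
        System.out.println(ac.findAll("商品和服务"));
    }
}
```

With the dictionary 商品, 和服, 服务, 和, scanning 商品和服务 reports 商品@0, 和@2, 和服@2 and 服务@3: every dictionary word is found in one pass, including the overlapping 和服 and 服务 that a segmenter then has to choose between. A double-array layout stores the same transitions in two flat integer arrays (base and check) for speed and compactness, which this sketch does not attempt.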
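The demo below is a Maven project; its pom.xml declares HanLP (Maven coordinates com.hankcs:hanlp) as a dependency:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <!-- ... modelVersion, project coordinates, and the com.hankcs:hanlp dependency ... -->
</project>
```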
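A simple segmentation demo passes each test sentence to HanLP.segment and prints the resulting term list:

```java
package com.iqilu;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;

import java.util.List;

public class DemoSegment {
    public static void main(String[] args) {
        // Test sentences, several of which contain classic segmentation ambiguities
        String[] testCase = new String[]{
                "商品和服务", // Goods and services
                "结婚的和尚未结婚的确实在干扰分词啊", // The married and the not-yet-married really are interfering with segmentation
                "买水果然后来世博园最后去世博会", // Buy fruit, then come to the Expo Park, and finally go to the Expo
                "中国的首都是北京", // China's capital is Beijing
                "欢迎新老师生前来就餐", // Welcome, new and old teachers and students, to come and dine
                "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作", // Each month the female clerk of the Industry and Information Office personally explains the installation of 24-port switches and other technical devices to the subordinate departments
                "随着页游兴起到现在的页游繁盛，依赖于存档进行逻辑判断的设计减少了，但这块也不能完全忽略掉。" // As web games went from their rise to today's boom, designs relying on save files for logic checks decreased, but this part still cannot be ignored entirely.
        };
        for (String sentence : testCase) {
            // Segment the sentence into terms (word plus part-of-speech tag)
            List<Term> termList = HanLP.segment(sentence);
            System.out.println(termList);
        }
    }
}
```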
Output (each token is shown as an English gloss of the segmented Chinese word, followed by its part-of-speech tag):

```
[goods/n, and/c, services/vn]
[married/v, of/uj, and/c, not yet/d, married/v, of/uj, indeed/ad, at/p, interfere/v, segmentation/n, ah/y]
[buy/v, fruit/n, then/c, come/v, Expo Park/j, finally/f, go/v, Expo/j]
[China/ns, of/uj, capital/n, is/v, Beijing/ns]
[welcome/v, new/a, teacher/n, before death/t, come/v, dine/v]
[Industry and Information Office/n, female/b, secretary/n, monthly/r, passing/p, subordinate/v, department/n, all/nr, personally/d, explain/v, 24/m, port/q, switch/n, etc/u, technical/n, device/n, of/uj, installation/v, work/vn]
[with/p, page/q, youxing/n, rise/v, to/v, now/t, of/uj, web game/nz, flourishing/an, ,/w, depend on/v, archive/vn, proceed/v, logic/n, judge/v, of/uj, design/vn, reduce/v, up/ul, ,/w, but/c, this part/r, also/d, cannot/v, completely/ad, ignore/v, drop/v, ./w]
```
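Each printed term has the form word/tag, where the tag is a part-of-speech label such as n (noun), v (verb), or w (punctuation). In code, a Term already exposes these directly (the public fields word and nature in HanLP 1.x), so parsing text only helps when all you have is logged output. For that case, a minimal sketch that splits a printed term list back into word-tag pairs might look like this; `TermListParser` is a hypothetical helper, and it assumes no word itself contains ", ".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for printed term lists; not part of HanLP
public class TermListParser {
    /** Parses a printed term list such as "[商品/n, 和/c, 服务/vn]" into (word, tag) pairs. */
    public static List<Map.Entry<String, String>> parse(String printed) {
        String body = printed.trim();
        // Strip the surrounding brackets produced by List.toString()
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        if (body.isEmpty()) {
            return pairs;
        }
        for (String item : body.split(", ")) {
            // Split on the last '/', so a word that itself contains '/' stays intact
            int slash = item.lastIndexOf('/');
            if (slash < 0) {
                pairs.add(new SimpleEntry<>(item, "")); // no tag present
            } else {
                pairs.add(new SimpleEntry<>(item.substring(0, slash), item.substring(slash + 1)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> p : parse("[商品/n, 和/c, 服务/vn]")) {
            System.out.println(p.getKey() + "\t" + p.getValue());
        }
    }
}
```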