Comparing File Type Detection Libraries :: Apache Tika, JMimeMagic, SimpleMagic
When programming, you sometimes end up working with files.
There are several libraries that tell you what type a file is, so I compared them myself.
1. Test environment:: IntelliJ, Gradle, Kotlin, project SDK 15.0.2
The libraries and how they work are as follows.
1) Apache Tika (tika.apache.org/)
Method:: parses the file metadata and the file content
Features:: It used to be inconvenient because it pulled in many dependencies, but thanks to steady improvements you can now use it by adding a single dependency.
Dependency used: org.apache.tika:tika-parsers:1.18
Caution:: If you just drop it into your dependencies, it is likely to conflict with the dependencies you already have, so be sure to check the versions!!
(In my case lombok was at 1.18.16, and aligning the Apache Tika version with it resolved the conflict. FYI.)
2) JMimeMagic (github.com/arimus/jmimemagic)
Method:: checks the file extension & header information
Dependency used: net.sf.jmimemagic:jmimemagic:0.1.5
3) SimpleMagic (github.com/j256/simplemagic)
Method:: checks the file extension & header information
Dependency used: com.j256.simplemagic:simplemagic:1.16
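JMimeMagic and SimpleMagic both rely on header matching: read the first bytes of a file and compare them with known signatures ("magic numbers"). Below is a minimal Kotlin sketch of that idea. The function `sniffMagic` and its four-entry table are hypothetical and much smaller than either library's real rule set; the PNG, PDF, ZIP, and OLE2 byte signatures themselves are the standard ones.

```kotlin
// Minimal sketch of header ("magic number") matching.
// Hypothetical illustration only, not JMimeMagic's or SimpleMagic's real rules.
private fun ByteArray.hasPrefix(prefix: IntArray): Boolean =
    size >= prefix.size && prefix.indices.all { (this[it].toInt() and 0xFF) == prefix[it] }

// Hypothetical signature table: leading bytes -> MIME type (or container label).
private val MAGIC_TABLE: List<Pair<IntArray, String>> = listOf(
    intArrayOf(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A) to "image/png",
    intArrayOf(0x25, 0x50, 0x44, 0x46, 0x2D) to "application/pdf", // "%PDF-"
    intArrayOf(0x50, 0x4B, 0x03, 0x04) to "application/zip",       // ZIP container (docx/pptx)
    // OLE2 compound file container (legacy .doc, HWP 5.0); generic label
    intArrayOf(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1) to "application/x-ole-storage"
)

// Returns the first matching type, or null when no signature matches.
fun sniffMagic(header: ByteArray): String? =
    MAGIC_TABLE.firstOrNull { (magic, _) -> header.hasPrefix(magic) }?.second
```

A header check alone has clear limits. .docx and .pptx are both ZIP containers, so this sketch returns `application/zip` for either; telling them apart requires looking inside the archive. HWP 5.0 files are OLE2 compound documents, the same container as legacy .doc files, which is a plausible reason a header-based match could report `application/msword` for an .hwp file. Plain-text formats such as .css, .js, and .cpp have no signature at all, so the sketch returns `null` for them.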
2. Test setup
1) Add the dependencies to Gradle
(Adapt this to your own build.)
dependencies {
    implementation("org.apache.tika:tika-parsers:1.18")
    implementation("net.sf.jmimemagic:jmimemagic:0.1.5")
    implementation("com.j256.simplemagic:simplemagic:1.16")
}
2) Write the test code
(Folder containing the test files: E:\tmp)
2-1) Apache Tika test code
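import java.io.File
import org.apache.tika.Tika

fun fileType1() {
    // Apache Tika: detection uses the file content and its name/extension
    val rootPath = "E:\\tmp"
    val file = File(rootPath)
    if (!file.exists()) throw IllegalArgumentException("no ${file.absolutePath}")
    val tika = Tika()
    file.walk()
        .filter { it.isFile } // skip directories; only regular files are detected
        .forEach {
            println(it.absolutePath)
            println(tika.detect(it) + "\n")
        }
}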
2-2) JMimeMagic test code
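import java.io.File
import net.sf.jmimemagic.Magic
import net.sf.jmimemagic.MagicMatch
import net.sf.jmimemagic.MagicMatchNotFoundException

fun fileType2() {
    // JMimeMagic: matches the file header against its magic rules
    val rootPath = "E:\\tmp"
    val file = File(rootPath)
    if (!file.exists()) throw IllegalArgumentException("no ${file.absolutePath}")
    file.walk().filter { it.isFile }.forEach {
        println(it.absolutePath)
        try {
            // arguments: file, extensionHints = true, onlyMimeMatch = false
            val match: MagicMatch? = Magic.getMagicMatch(it, true, false)
            if (match == null) println("file match x\n")
            else println("extension: ${match.extension} / mimeType: ${match.mimeType}\n")
        } catch (e: MagicMatchNotFoundException) {
            // getMagicMatch declares this exception for the "no match" case
            println("file match x\n")
        }
    }
}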
2-3) SimpleMagic test code
import java.io.File
import com.j256.simplemagic.ContentInfo
import com.j256.simplemagic.ContentInfoUtil

fun fileType3() {
    // SimpleMagic: matches the file header against its built-in magic entries
    val util = ContentInfoUtil()
    val rootPath = "E:\\tmp"
    val file = File(rootPath)
    if (!file.exists()) throw IllegalArgumentException("no ${file.absolutePath}")
    file.walk().filter { it.isFile }.forEach {
        println(it.absolutePath)
        // findMatch() returns null when no entry matches
        val info: ContentInfo? = util.findMatch(it)
        if (info == null) println("file match x\n")
        else println("contentType: ${info.contentType} / mimeType: ${info.mimeType}\n")
    }
}
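The three test functions differ only in the detector call, so the shared walk-and-collect logic could live in one higher-order helper. The sketch below is hypothetical (`detectAll` is not from any of the libraries) and uses only the JDK and the Kotlin standard library; each library would plug in as a lambda.

```kotlin
import java.io.File

// Hypothetical helper: runs any detector over the regular files under rootPath
// and returns absolute path -> detected type (null when nothing matched).
fun detectAll(rootPath: String, detector: (File) -> String?): Map<String, String?> {
    val root = File(rootPath)
    require(root.exists()) { "no ${root.absolutePath}" } // throws IllegalArgumentException
    return root.walk()
        .filter { it.isFile } // directories are skipped
        .associate { it.absolutePath to detector(it) }
}
```

For example, the Tika test would become `detectAll("E:\\tmp") { tika.detect(it) }` followed by printing each entry, and the SimpleMagic one `detectAll("E:\\tmp") { util.findMatch(it)?.mimeType }`.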
3. Results
3-1 Apache Tika
3-2 JMimeMagic
3-3 SimpleMagic
4. Analysis of the results
(Note that these results apply only to the test code, sample files, and library versions above; they may differ with other files, code, or versions.)
(Blue bold: what I judged to be correctly identified)
File type / Library | Apache Tika | JMimeMagic | SimpleMagic |
.sin | text/plain | text/plain | x |
.html | text/html | text/html | text/html |
.png | image/png | image/png | image/png |
.cpp | text/x-csrc | text/plain | x |
.cpp (empty content, only the extension changed) | text/x-c++src | text/plain | x |
.pdf | application/pdf | application/pdf | application/pdf |
.css | text/css | text/plain | x |
.js | application/javascript | text/plain | x |
.pptx | [warning] application/vnd.openxmlformats-officedocument.presentationml.presentation | application/vnd.openxmlformats-officedocument.presentationml.presentation | application/vnd.openxmlformats-officedocument.presentationml.presentation |
.txt | text/plain | text/plain | x |
.docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | application/vnd.openxmlformats-officedocument.wordprocessingml.document | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
.hwp | application/x-hwp-v5 | application/msword | null |
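Tika's specific answers for .css, .js, and .cpp, and the fact that an empty file named .cpp still came back as `text/x-c++src`, suggest that it also consults the file name when the content alone is inconclusive. The sketch below illustrates that kind of name-hint fallback in plain Kotlin. `detectWithNameHint`, its five-entry extension map, and the "no NUL bytes means text" heuristic are hypothetical simplifications, not Tika's actual logic.

```kotlin
// Hypothetical extension -> MIME map, consulted only for text-like content.
private val EXTENSION_HINTS = mapOf(
    "html" to "text/html",
    "css" to "text/css",
    "js" to "application/javascript",
    "cpp" to "text/x-c++src",
    "txt" to "text/plain"
)

// Crude heuristic: treat the sample as text if it contains no NUL bytes.
// An empty header therefore counts as text.
private fun looksLikeText(header: ByteArray): Boolean = header.none { it == 0.toByte() }

// Returns a type from the name hint for text-like content, falling back to
// text/plain for unknown extensions; returns null for binary-looking content,
// which would be left to header (magic number) matching instead.
fun detectWithNameHint(fileName: String, header: ByteArray): String? {
    if (!looksLikeText(header)) return null
    val ext = fileName.substringAfterLast('.', "").lowercase()
    return EXTENSION_HINTS[ext] ?: "text/plain"
}
```

Without the name hint, the only safe answer for an arbitrary text file is `text/plain`, which matches the pattern in JMimeMagic's column above.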
For most file types, Apache Tika was the most accurate.
That said, if you only deal with files like png, pptx, and docx, which all three libraries identified correctly, there is no particular need to use Apache Tika.
In short, pick whichever one suits your needs~