Comparing File Type Detection Libraries :: Apache Tika, JMimeMagic, SimpleMagic
PROGRAMMING/Etc 2021. 2. 1. 22:46
When programming, you sometimes have to do work that involves files.
Several libraries exist that tell you what type a file is, so I compared them myself.
1. Test environment:: IntelliJ, Gradle, Kotlin, project SDK 15.0.2
The libraries and how each one works are as follows.
1) Apache Tika (tika.apache.org/)
Approach:: Parses the file's metadata and content to determine the type
Notes:: It used to be inconvenient because it pulled in many dependencies, but after steady improvements it can now be used by adding a single dependency.
Dependency used: org.apache.tika:tika-parsers:1.18
Caution:: Simply adding it to your dependencies is likely to cause conflicts with existing ones, so be sure to check the versions!!
(In my case lombok was at 1.18.16, and the conflict went away once I aligned Apache Tika with it.. for reference)
2) JMimeMagic (github.com/arimus/jmimemagic)
Approach:: Determines the type from the file extension & header information
Dependency used: net.sf.jmimemagic:jmimemagic:0.1.5
3) SimpleMagic (github.com/j256/simplemagic)
Approach:: Determines the type from the file extension & header information
Dependency used: com.j256.simplemagic:simplemagic:1.16
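To make "extension & header information" concrete, here is a minimal sketch of the general idea using only the Kotlin standard library. It is not how JMimeMagic or SimpleMagic are actually implemented: the `signatures` table, the `extensionHints` map, and the function names `sniffHeader` and `guessType` are all hypothetical, and only three well-known signatures are listed.

```kotlin
import java.io.File

// Hypothetical signature table: leading bytes -> MIME type.
// Real libraries ship far larger tables than these three entries.
val signatures: List<Pair<ByteArray, String>> = listOf(
    byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A) to "image/png",
    "%PDF-".toByteArray(Charsets.US_ASCII) to "application/pdf",
    // ZIP container; .docx and .pptx also start with these bytes
    byteArrayOf(0x50, 0x4B, 0x03, 0x04) to "application/zip",
)

// Hypothetical extension hints, consulted only when no signature matches.
val extensionHints: Map<String, String> = mapOf(
    "txt" to "text/plain",
    "css" to "text/css",
)

/** Returns the MIME type whose signature is a prefix of [header], or null. */
fun sniffHeader(header: ByteArray): String? =
    signatures.firstOrNull { (magic, _) ->
        header.size >= magic.size && magic.indices.all { header[it] == magic[it] }
    }?.second

/** Tries the header first, then the extension hint; null means "no match". */
fun guessType(file: File): String? {
    val buffer = ByteArray(8)
    // read() returns -1 for an empty file
    val read = file.inputStream().use { it.read(buffer) }
    val header = if (read > 0) buffer.copyOf(read) else ByteArray(0)
    return sniffHeader(header) ?: extensionHints[file.extension.lowercase()]
}
```

Calling `guessType(File("E:\\tmp\\sample.png"))` (a hypothetical file name) would check the first eight bytes against the table before falling back to the extension. Because .docx and .pptx are ZIP containers, a header check alone cannot tell them apart, which is one reason a library has to look further into the content.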
2. Test setup
1) Add the dependencies to gradle
(Write them to suit your own build.)
implementation("org.apache.tika:tika-parsers:1.18")
implementation("net.sf.jmimemagic:jmimemagic:0.1.5")
implementation("com.j256.simplemagic:simplemagic:1.16")
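The version-conflict caution mentioned earlier can be made visible at build time. This is a minimal sketch, assuming the Gradle Kotlin DSL (`build.gradle.kts`), which the `implementation("...")` syntax suggests. By default Gradle resolves a conflict by picking the highest requested version; `failOnVersionConflict()` makes the build fail instead, so the clash can be inspected. It is one possible diagnostic, not something the original setup used.

```kotlin
// build.gradle.kts (fragment, assumed build file name)
configurations.all {
    resolutionStrategy {
        // Fail the build when two dependencies request
        // different versions of the same module
        failOnVersionConflict()
    }
}
```

With a large dependency tree such as tika-parsers this may report many conflicts, so it is probably best switched on temporarily while investigating.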
2) Write the test code
(Path of the folder holding the files to test: E:\\tmp)
2-1) Apache Tika test code
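import java.io.File
import org.apache.tika.Tika

fun fileType1() { // Apache Tika
    val rootPath = "E:\\tmp"
    val file = File(rootPath)
    if (!file.exists()) throw IllegalArgumentException("no ${file.absolutePath}")
    val tika = Tika()
    // walk() also yields directories, so keep only regular files
    file.walk().filter { it.isFile }.forEach {
        println(it.absolutePath)
        println(tika.detect(it) + "\n")
    }
}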
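2-2) JMimeMagic test code
import java.io.File
import net.sf.jmimemagic.Magic
import net.sf.jmimemagic.MagicMatch

fun fileType2() { // JMimeMagic
    val rootPath = "E:\\tmp"
    val file = File(rootPath)
    if (!file.exists()) throw IllegalArgumentException("no ${file.absolutePath}")
    // walk() also yields directories, so keep only regular files
    file.walk().filter { it.isFile }.forEach {
        println(it.absolutePath)
        // arguments: file, extensionHints = true, onlyMimeMatch = false
        val match: MagicMatch? = Magic.getMagicMatch(it, true, false)
        if (match == null) {
            println("file match x\n")
        } else {
            println("extension: ${match.extension} / mimeType: ${match.mimeType}\n")
        }
    }
}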
2-3) SimpleMagic test code
import com.j256.simplemagic.ContentInfo
import com.j256.simplemagic.ContentInfoUtil
import java.io.File

fun fileType3() { // SimpleMagic
    val util = ContentInfoUtil()
    val rootPath = "E:\\tmp"
    val file = File(rootPath)
    if (!file.exists()) throw IllegalArgumentException("no ${file.absolutePath}")
    // walk() also yields directories, so keep only regular files
    file.walk().filter { it.isFile }.forEach {
        println(it.absolutePath)
        val info: ContentInfo? = util.findMatch(it)
        if (info == null) {
            println("file match x \n")
        } else {
            println("contentType: ${info.contentType} / mimeType: ${info.mimeType} \n")
        }
    }
}
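The three test functions differ only in the detection call, so the shared directory walk could be pulled into one helper. The following is a rough sketch under that assumption; `detectAll` and its `detect` parameter are hypothetical names, and swallowing detector exceptions with `runCatching` is a design choice made here, not something the original tests do.

```kotlin
import java.io.File

/**
 * Walks [root] and pairs every regular file with whatever [detect] reports.
 * [detect] is a hypothetical adapter around a single library call; it returns
 * the detected type as a string, or null when there is no match.
 */
fun detectAll(root: File, detect: (File) -> String?): List<Pair<File, String?>> {
    // Same precondition as the original tests: the folder must exist
    require(root.exists()) { "no ${root.absolutePath}" }
    return root.walk()
        .filter { it.isFile }            // skip directories
        .sortedBy { it.absolutePath }    // stable order makes runs comparable
        .map { file ->
            // A detector that throws is recorded as "no match" (null)
            file to runCatching { detect(file) }.getOrNull()
        }
        .toList()
}
```

With the libraries on the classpath, something like `detectAll(File("E:\\tmp")) { tika.detect(it) }` would then produce the Apache Tika column, and the same call with a different lambda would produce the others, which makes a side-by-side comparison easier to build.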
3. Results
3-1 Apache Tika
3-2 JMimeMagic
3-3 SimpleMagic
4. Analysis of the results
(Please note that these results apply only to the test code, sample files, and library versions described above; they may change with different files, code, or versions.)
(Blue bold: identified correctly [by my own criteria])
File type | Apache Tika | JMimeMagic | SimpleMagic
.sin | text/plain | text/plain | x
.html | text/html | text/html | text/html
.png | image/png | image/png | image/png
.cpp | text/x-csrc | text/plain | x
.cpp (empty content, only the extension changed) | text/x-c++src | text/plain | x
.pdf | application/pdf | application/pdf | application/pdf
.css | text/css | text/plain | x
.js | application/javascript | text/plain | x
.pptx | [warning] application/vnd.openxmlformats-officedocument.presentationml.presentation | application/vnd.openxmlformats-officedocument.presentationml.presentation | application/vnd.openxmlformats-officedocument.presentationml.presentation
.txt | text/plain | text/plain | x
.docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | application/vnd.openxmlformats-officedocument.wordprocessingml.document | application/vnd.openxmlformats-officedocument.wordprocessingml.document
.hwp | application/x-hwp-v5 | application/msword | null

For most of the files, Apache Tika turned out to be the most accurate.
That said, if you only ever handle files such as png, ppt, and docx, there is no real need to use Apache Tika.
In short, pick whichever fits your needs~