Reading the contents of TXT and Word files in Java and displaying them on a page
Note: Word 2003 (.doc) and Word 2007 or later (.docx) files must be read with different jar packages!
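The two Word formats differ at the byte level, which is why they need different libraries. As a rough sketch (the class and method names here are hypothetical, not part of any library), the container type can be guessed from the first bytes of a file: binary .doc files are OLE2 compound files, and .docx files are ZIP packages. This only identifies the container, so other Office or ZIP files will match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: guesses the container format from a file's leading bytes. */
final class WordFormatSniffer {
    // OLE2 compound file signature, used by binary .doc files
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header signature, used by .docx packages
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Returns "doc", "docx", or "unknown" for the given leading bytes. */
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "doc";
        if (startsWith(head, ZIP)) return "docx";
        return "unknown";
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    static String sniff(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = new byte[8];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // end of file reached early
                n += r;
            }
            return sniff(Arrays.copyOf(head, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHead = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(sniff(zipHead)); // docx
    }
}
```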
1. TXT files:
import java.io.BufferedReader;
import java.io.FileReader;

StringBuilder texts = new StringBuilder();
// FileReader decodes the file with the default charset
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        // readLine() strips the line terminator; append one here if line breaks matter
        texts.append(line);
    }
}
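Note: reading the file this way can garble Chinese text, because FileReader decodes with the default charset, which may not match the file's actual encoding.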
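To see why a charset mismatch garbles text, the following small sketch encodes two Chinese characters as UTF-8 and decodes the bytes twice, once with the wrong charset and once with the right one. The class name and the choice of ISO-8859-1 as the "wrong" charset are assumptions made purely for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Decodes bytes with the given charset, as a Reader does internally. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two characters of "Chinese text"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each of the 6 UTF-8 bytes becomes its own character
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        // Matching charset: the original string is recovered
        String right = decode(utf8, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(wrong.length());         // 6
        System.out.println(right.equals(original)); // true
    }
}
```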
Solution: specify the character encoding when reading:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

StringBuilder texts = new StringBuilder();
// Decode explicitly as UTF-8 (use whatever encoding the file was saved in)
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
    String line;
    while ((line = br.readLine()) != null) {
        texts.append(line);
    }
}
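The same idea can be wrapped in a reusable method. The sketch below is one possible variant, not the only way to do it: it uses java.nio.file.Files.newBufferedReader with an explicit charset and, unlike the snippet above, re-appends a newline after each line so that line breaks survive. The class and method names are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TextFileReader {
    /** Reads the whole file with the given charset, ending every line with '\n'. */
    static String readText(Path path, Charset charset) throws IOException {
        StringBuilder texts = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = br.readLine()) != null) {
                texts.append(line).append('\n'); // restore the stripped line break
            }
        }
        return texts.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file, then read it back
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "\u7b2c\u4e00\u884c\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readText(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

One difference worth knowing: a reader obtained from Files.newBufferedReader throws an exception on bytes that are invalid in the chosen charset, whereas InputStreamReader created with a charset name silently substitutes replacement characters.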
2. Word 2003 (.doc format):
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import org.apache.poi.hwpf.extractor.WordExtractor;

String text = null;
try (FileInputStream inputStream = new FileInputStream(file)) {
    // POI's WordExtractor takes the stream as a constructor argument
    WordExtractor extractor = new WordExtractor(inputStream);
    text = extractor.getText();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}
Or, using the WordExtractor from the org.textmining package:
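import java.io.FileInputStream;
import java.io.FileNotFoundException;
import org.textmining.text.extraction.WordExtractor; // a different package is imported

String text = null;
try (FileInputStream inputStream = new FileInputStream(file)) {
    WordExtractor extractor = new WordExtractor(); // no argument here
    // The stream is passed here instead. If getText(InputStream) does not compile
    // against your jar, the method may be named extractText(InputStream).
    text = extractor.getText(inputStream);
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}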
Note the differences marked in the comments!
3. Word 2007 and later (.docx format):
Required jar packages:
* poi-3.9-20121203.jar
* poi-ooxml-3.9-20121203.jar
* poi-ooxml-schemas-3.9-20121203.jar
* poi-scratchpad-3.9-20121203.jar
* xmlbeans-2.3.0.jar
* dom4j-1.6.1.jar
import java.io.IOException;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;

String text = null;
try {
    // filePath is the path of the .docx file, as a String
    OPCPackage opcPackage = POIXMLDocument.openPackage(filePath);
    POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
    text = extractor.getText();
} catch (IOException e) {
    e.printStackTrace();
} catch (XmlException e) {
    e.printStackTrace();
} catch (OpenXML4JException e) {
    e.printStackTrace();
}
4. Worked example:
long id = Long.parseLong(request.getParameter("id"));
PolicyDao policyDao = new PolicyDao();
Policy policy = policyDao.getPolicy(id);
// Read the contents of the file
StringBuffer fileContent = new StringBuffer();
String fileName = policy.getFilePath();
String uploadPath = Configuration.getConfig().getString("policyFilesPath");
File file = new File(uploadPath + fileName);
if (file.exists()) {
    String suffix = file.getName().substring(file.getName().lastIndexOf(".") + 1);
    // Word 2003
    if (suffix.equals("doc")) {
        try (FileInputStream fis = new FileInputStream(file)) {
            WordExtractor wordExtractor = new WordExtractor(fis);
            String text = wordExtractor.getText();
            fileContent.append(text);
        }
    }
    // Word 2007
    else if (suffix.equals("docx")) {
        OPCPackage opcPackage = POIXMLDocument.openPackage(uploadPath + fileName);
        POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
        String text = extractor.getText();
        fileContent.append(text);
    }
    // TXT
    else if (suffix.equals("txt")) {
        try (BufferedReader bufferReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "utf-8"))) {
            // Read one line at a time from the BufferedReader
            String line;
            while ((line = bufferReader.readLine()) != null) {
                fileContent.append(line);
            }
        }
    }
} else {
    System.out.println("File does not exist!");
}
// Output: pass the content to the JSP page
request.setAttribute("content", fileContent);
request.setAttribute("name", policy.getTitle());
request.setAttribute("id", policy.getId());
request.getRequestDispatcher("/frontShow/document-info.jsp").forward(request, response);
return;
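The suffix check in the example compares against lower-case strings only, so a file named REPORT.DOC would match none of the branches, and a name without a dot yields the whole file name as its "suffix". A slightly more defensive way to extract the suffix might look like the sketch below; the FileSuffix class is hypothetical and not part of the original code.

```java
import java.util.Locale;

final class FileSuffix {
    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String of(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return ""; // no dot at all, or the name ends with a dot
        }
        // Locale.ROOT keeps the lower-casing independent of the server's locale
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(of("policy.DOCX")); // docx
        System.out.println(of("notes.txt"));   // txt
        System.out.println(of("README").isEmpty()); // true
    }
}
```

With such a helper, the branches in the example could compare FileSuffix.of(file.getName()) against "doc", "docx", and "txt" without worrying about upper-case file names.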