[Repost] Reading doc, pdf, and ppt Files in C#
Published: 2019-06-11

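Converting doc, pdf, and ppt files to txt:

These components read a file's contents out as text. That is not the same as simply renaming the file extension, so the extracted text still has to be written to a txt file.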
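As a minimal sketch of that final write step, the helper below saves an already-extracted string to a txt file. The names `TextFileWriter` and `SaveAsText` are made up for illustration and do not come from the original post; the GB2312 encoding matches the methods shown later. The sketch assumes .NET Core or .NET 5+, where GB2312 is only available after registering `CodePagesEncodingProvider`. On the classic .NET Framework that registration line is unnecessary and can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileWriter
{
    // Hypothetical helper: writes already-extracted text to a .txt file as GB2312.
    public static void SaveAsText(string text, string txtPath)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as GB2312 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("gb2312");

        // 'false' overwrites an existing file; 'using' closes the writer even on error.
        using (var writer = new StreamWriter(txtPath, false, gb2312))
        {
            writer.Write(text);
        }
    }
}
```

Reading the file back must use the same encoding, which is why `text2reader` later in this post also opens its files as GB2312.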

 

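Adding the Office references

To program against Word and PowerPoint from .NET, make sure the Word and PowerPoint programmability components were installed along with Office (they can be selected in a custom install), or install the "Microsoft Office 2003 Primary Interop Assemblies".

After installation, add the references to your project:

Add Reference > COM > Microsoft PowerPoint 11.0 Object Library / Microsoft Word 11.0 Object Library;

You also need to import the Office interop namespaces: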

using System.IO;    // FileInfo, StreamReader, StreamWriter
using System.Text;  // Encoding, StringBuilder

using Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.PowerPoint;

using org.pdfbox.pdmodel;
using org.pdfbox.util;

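    public void pdf2txt(FileInfo file, FileInfo txtfile)
    {
        // Load the PDF and extract all of its text with PDFBox.
        PDDocument doc = PDDocument.load(file.FullName);
        string text;
        try
        {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            text = pdfStripper.getText(doc);
        }
        finally
        {
            doc.close(); // release the underlying PDF file
        }

        // Write the extracted text as GB2312; 'using' closes the writer even on error.
        using (StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
        {
            swPdfChange.Write(text);
        }
    }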

 

For tables in a doc file, the extracted text drops the grid lines, and the cell contents are read out row by row.

    public void word2text(FileInfo file, FileInfo txtfile)
    {
        object readOnly = true;
        object missing = System.Reflection.Missing.Value;
        object fileName = file.FullName;
        string text;

        // Start Word through COM automation and open the document read-only.
        Microsoft.Office.Interop.Word.ApplicationClass wordapp = new Microsoft.Office.Interop.Word.ApplicationClass();
        try
        {
            Microsoft.Office.Interop.Word.Document doc = wordapp.Documents.Open(ref fileName,
                ref missing, ref readOnly, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing);
            text = doc.Content.Text;
            doc.Close(ref missing, ref missing, ref missing);
        }
        finally
        {
            // Always quit Word, otherwise a WINWORD.EXE process is left running.
            wordapp.Quit(ref missing, ref missing, ref missing);
        }

        // Write the extracted text as GB2312.
        using (StreamWriter swWordChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
        {
            swWordChange.Write(text);
        }
    }
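The table note above has a practical consequence for the text that Word returns. This sketch assumes that Word marks the end of each table cell in `Range.Text` with a carriage return followed by the BEL character (`\r\a`) and uses a bare `\r` as the paragraph mark; verify that against your own documents before relying on it. Under that assumption, a hypothetical `WordTextCleaner.Normalize` helper (not part of the original post) strips the BEL characters and turns paragraph marks into Windows line breaks, so each cell lands on its own line in the txt file.

```csharp
using System;

public static class WordTextCleaner
{
    // Hypothetical helper: tidies text taken from doc.Content.Text.
    public static string Normalize(string raw)
    {
        if (string.IsNullOrEmpty(raw))
        {
            return string.Empty;
        }

        return raw
            .Replace("\a", "")       // assumed end-of-cell marker character (BEL)
            .Replace("\r\n", "\r")   // collapse any existing CRLF first
            .Replace("\r", "\r\n");  // paragraph marks -> Windows line breaks
    }
}
```

In `word2text` this could be applied as `swWordChange.Write(WordTextCleaner.Normalize(text));`.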

 

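    public void ppt2txt(FileInfo file, FileInfo txtfile)
    {
        Microsoft.Office.Interop.PowerPoint.Application pa = new Microsoft.Office.Interop.PowerPoint.ApplicationClass();
        StringBuilder pps = new StringBuilder();
        try
        {
            // Open read-only, not as an untitled copy, and without showing a window.
            Microsoft.Office.Interop.PowerPoint.Presentation pp = pa.Presentations.Open(file.FullName,
                Microsoft.Office.Core.MsoTriState.msoTrue,
                Microsoft.Office.Core.MsoTriState.msoFalse,
                Microsoft.Office.Core.MsoTriState.msoFalse);

            foreach (Microsoft.Office.Interop.PowerPoint.Slide slide in pp.Slides)
            {
                foreach (Microsoft.Office.Interop.PowerPoint.Shape shape in slide.Shapes)
                {
                    // Pictures, lines, and similar shapes have no text frame; reading TextFrame on them can throw.
                    if (shape.HasTextFrame == Microsoft.Office.Core.MsoTriState.msoTrue)
                    {
                        // One line per shape so text from adjacent shapes does not run together.
                        pps.AppendLine(shape.TextFrame.TextRange.Text);
                    }
                }
            }
            pp.Close();
        }
        finally
        {
            pa.Quit(); // do not leave a POWERPNT.EXE process running
        }

        using (StreamWriter swPPtChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
        {
            swPPtChange.Write(pps.ToString());
        }
    }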

 

Reading different file types

    // Returns a reader over the file's text, or null when the extension is not supported.
    // The caller is responsible for closing the returned reader.
    public StreamReader text2reader(FileInfo file)
    {
        StreamReader st = null;
        switch (file.Extension.ToLower())
        {
            case ".txt":
                st = new StreamReader(file.FullName, Encoding.GetEncoding("gb2312"));
                break;

            case ".doc":
                // A relative path does not work here; this needs to be improved.
                FileInfo wordfile = new FileInfo(@"E:\my programs\200807program\FileSearch\App_Data\word2txt.txt");
                word2text(file, wordfile);
                st = new StreamReader(wordfile.FullName, Encoding.GetEncoding("gb2312"));
                break;

            case ".pdf":
                FileInfo pdffile = new FileInfo(@"E:\my programs\200807program\FileSearch\App_Data\pdf2txt.txt");
                pdf2txt(file, pdffile);
                st = new StreamReader(pdffile.FullName, Encoding.GetEncoding("gb2312"));
                break;

            case ".ppt":
                FileInfo pptfile = new FileInfo(@"E:\my programs\200807program\FileSearch\App_Data\ppt2txt.txt");
                ppt2txt(file, pptfile);
                st = new StreamReader(pptfile.FullName, Encoding.GetEncoding("gb2312"));
                break;
        }
        return st;
    }
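The hard-coded absolute paths in `text2reader` are flagged in the code as something to improve. One possible direction, sketched here under assumptions, is to build the intermediate txt path from the application's base directory at run time. `TempTextPaths` and its `For` method are hypothetical names, and the sketch assumes the `App_Data` folder sits directly under `AppDomain.CurrentDomain.BaseDirectory`, which may not match the original project layout.

```csharp
using System;
using System.IO;

public static class TempTextPaths
{
    // Hypothetical helper: returns <baseDir>\App_Data\<name>, creating the folder if needed.
    // When baseDir is null, the application's base directory is used.
    public static string For(string name, string baseDir = null)
    {
        string root = baseDir ?? AppDomain.CurrentDomain.BaseDirectory;
        string dir = Path.Combine(root, "App_Data");
        Directory.CreateDirectory(dir); // does nothing when the folder already exists
        return Path.Combine(dir, name);
    }
}
```

With this in place, a case in `text2reader` could create its intermediate file as `new FileInfo(TempTextPaths.For("word2txt.txt"))` instead of naming a drive and folder explicitly.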

Reposted from: https://www.cnblogs.com/qingshan/archive/2012/08/16/2642626.html
