今天一段时间一直在忙工作上的事,并没有系统地学习研究某一个具体的问题,但回顾这一个月的工作,发现还是有一些经验可以记录一下的。但这些经验没法系统地整理起来,因此只能算是开发中的杂项了。

杂项一:httpclient典型用法

  1. 基础用法
    HttpClient httpClient = HttpClientBuilder.create().build();
    HttpPost postMethod = null;
    try {
        postMethod = new HttpPost(reqUrl);
        postMethod.setConfig(RequestConfig.custom().setConnectTimeout(2000).setSocketTimeout(5000).build());
        Map<String, Object> params = new HashMap();
        params.put("param1", param1);
        params.put("param2", param2);
        postMethod.setEntity(new StringEntity(JSON.json(params), ContentType.APPLICATION_JSON));
        HttpResponse response = httpClient.execute(postMethod);
        if (response.getStatusLine().getStatusCode() == 200) {
            HttpEntity resEntity = response.getEntity();
            JSONObject parsedJsonObj = (JSONObject) JSON.parse(EntityUtils.toString(resEntity, "UTF-8"));
    
            //process parsedJsonObj
    
            if (resEntity != null) {
                try {
                    EntityUtils.consume(resEntity);
                } catch (Exception ignore) {
                }
            }
        }
    } catch (Exception e) {
        logger.error("请求失败", e);
    } finally {
         if (postMethod != null) {
             postMethod.releaseConnection();
         }
    }
    
    上面这段代码还是太麻烦了,实际编码中可以将上述代码封装成函数,只需要传入reqUrl,params, JsonResponseProcessHandler就可以了。
  2. 拼接请求url
    List<NameValuePair> params = new ArrayList<NameValuePair>();
    params.add(new BasicNameValuePair("param1", param1));
    params.add(new BasicNameValuePair("param2", param2));
    URIBuilder uriBuilder = new URIBuilder("http://exmaple.com").setPath("/req_path").addParameters(params);
    HttpPost postMethod = new HttpPost(uriBuilder.build().toString());
    
  3. 流式续传文件
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    InputStream fileIn = Channels.newInputStream(raf.getChannel().position(offset));
    InputStreamEntity reqEntity = new InputStreamEntity(fileIn, fileSize - offset, ContentType
            .APPLICATION_OCTET_STREAM);
    postMethod.setEntity(reqEntity);
    HttpResponse response = httpClient.execute(postMethod);
    
  4. 定制httpclient的连接管理器
    PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
    // 每个主机最大的连接数(如果会大量向同一主机发送大量http请求,需加大此值)
    connectionManager.setDefaultMaxPerRoute(10);
    // 总共最大的连接数
    connectionManager.setMaxTotal(100);
    httpClient = HttpClientBuilder.create().setConnectionManager(connectionManager).build();
    

杂项二:jdk6升级jdk7改造

  1. jpeg编码代码改造

    jdk6下的代码

    JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    JPEGEncodeParam param = encoder.getDefaultJPEGEncodeParam(bufferedImage);
    param.setQuality(quality, true);
    encoder.setJPEGEncodeParam(param);
    encoder.encode(bufferedImage);
    

    jdk7下的代码

    ImageWriter imageWriter = (ImageWriter)ImageIO.getImageWritersBySuffix("jpeg").next();
    ImageOutputStream ios = ImageIO.createImageOutputStream(out);
    imageWriter.setOutput(ios);
    JPEGImageWriteParam jpegParams  =  (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
    jpegParams.setCompressionMode(JPEGImageWriteParam.MODE_EXPLICIT);
    jpegParams.setCompressionQuality(quality);
    IIOMetadata imageMetaData = imageWriter.getDefaultImageMetadata(new ImageTypeSpecifier(bufferedImage), null);
    imageWriter.write(imageMetaData, new IIOImage(bufferedImage, null, null), null);
    
  2. 自定义DataSource代码改造

    jdk6下的代码

    public class CustomDataSource extends AbstractRoutingDataSource {
        @Override
        public Object determineCurrentLookupKey() {
            ...
        }
    }
    

    jdk7下的代码

    public class CustomDataSource extends AbstractRoutingDataSource {
        @Override
        public Object determineCurrentLookupKey() {
            ...
        }
        @Override
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new java.sql.SQLFeatureNotSupportedException("getParentLogger not supported");
        }
    }
    
  3. pom升级

    pom中加入代码编译级别的配置

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
    </plugin>
    

杂项三:简易的python程序分发

工作中使用python写了一部分与NLP相关的代码,但主程序是部署在Tomcat里的java程序,于是需要想办法分发python程序,同时完成java程序与python程序的交互。

  1. 自带python程序的依赖库 python程序依赖于一些第三方python库,但很难让运维提前使用pip安装第三方python库,研究了下,可以采用以下简易方法。
    1. 在目录下新建一个libs目录,将jieba, snownlp等第三方库放到libs目录下。
    2. 修改python程序入口,在最开始加入以下代码。
      import sys
      import os
      sys.path.append(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'libs'))
      
  2. python程序作为简单的http伺服 为了方便java与python交互,将python程序包装为http伺服,以供java程序交互,这里没有用任何其它第三方http框架。
    import threading
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    import urlparse
    
    server = None
    
    def stop_server():
        global server
        if server is not None:
            server.shutdown()
    
    class CustomRequestHandler(BaseHTTPRequestHandler):
    
        def do_GET(self):
            # GET请求访问http://127.0.0.1:8333/?op=stop即可停止http伺服器
            params = urlparse.parse_qs(urlparse.urlparse(self.path).query)
            if params.has_key('op') and params['op'][0] == 'stop':
                thread = threading.Thread(None, stop_server)
                thread.start()
            else:
                self.send_response(200)
                self.send_header('Content-type', 'text/html')
                self.end_headers()
                self.wfile.write("<html><body><h1>It Works!</h1></body></html>")
    
        def do_POST(self):
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length)
            # process post data
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write('\n'.join(respLines))
    
        def log_message(self, format, *args):
            pass
    
    def run(server_class=HTTPServer, handler_class=CustomRequestHandler, port=8333):
        global httpd
        server_address = ('', port)
        server = server_class(server_address, handler_class)
        server.serve_forever()
    
    if __name__ == '__main__':
        from sys import argv
    
        if len(argv) == 2:
            run(port=int(argv[1]))
        else:
            run()
    
  3. java程序里启动与停止python伺服
    private void startPythonService() {
        CommandUtil.executeCommand(new String[]{"python", PYTHON_SCRIPT_PATH, String.valueOf(SERVER_PORT)});
    }
    
    private void stopPythonService() {
        HttpGet getMethod = null;
        try {
            getMethod = new HttpGet("http://127.0.0.1:" + SERVER_PORT + "/?op=stop");
            HttpResponse response = httpClient.execute(getMethod);
        } catch (Exception e) {
            logger.debug("停止python服务的状态失败",e);
        } finally {
            if(getMethod != null) {
                getMethod.releaseConnection();
            }
        }
    }
    

杂项四:统计一组数据中的出现次数最多的topN数据

Multiset<String> nameCounter = HashMultiset.create();
for(String name : names) {
    nameCounter.add(name);
}
Iterable<String> top5Names = Iterables.limit(Multisets.copyHighestCountFirst(nameCounter).asList(), 5);

杂项五:python操作excel文件

  1. 从文件加载excel文件
    # 加载xlsx文件
    workbook = openpyxl.load_workbook(xlsx_file_path)
    # 获取活跃的sheet
    sheet = workbook.active
    # 获取所有的sheet名称
    sheet_names = sheet.get_sheet_names()
    # 获取最二个sheet
    another_sheet = workbook[sheet_names[1]]
    # 修改sheet的名称
    another_sheet.title = 'AnotherSheet'
    # 获取最大的行数
    row_count = sheet.max_row
    # 获取最大的列数
    column_count = sheet.max_column
    
  2. 遍历数据
    for i in range(1,101):
        for j in range(1,101):
            print(ws.cell(row=i, column=j).value)
    
    for row in ws.iter_rows(min_row=1, max_col=3, max_row=2):
        for cell in row:
            print(cell.value)
    
  3. 访问单元格的数据
    #设置单元格数据
    sheet.cell(row=2, column=3, value='xdfdf')
    sheet['C2'].value = 'xdfdf'
    #获取单元格数据
    cell_value = sheet.cell(row=2, column=3).value
    cell_value = sheet['C2'].value
    
  4. 写入文件
    workbook.save(xlsx_save_path)
    
  5. 修改单元格的字段
    cell.font = openpyxl.styles.Font(bold = True)
    

还有一些更高级的用法参见https://openpyxl.readthedocs.io/en/default/

杂项六:python里操作mysql数据库

写了一个工具操作mysql数据库的工具方法如下:

def sql_query_generator(sql):
    try:
        conn = pymysql.connect(host=DB_IP, user=DB_USER, password=DB_PASSWD, \
                               db=DB_NAME, charset='utf8', cursorclass=pymysql.cursors.DictCursor)
        with conn.cursor() as cursor:
            cursor.execute(sql)
            for row in cursor:
                yield row
    finally:
        conn.close()

使用起来也比较简单:

query_generator = sql_query_generator('select * from user;')

for user in query_generator:
    print(user)

杂项七:python里计算两个字符串的相似度

import difflib
print(difflib.SequenceMatcher(None, 'hello world', 'hello').ratio())
# 也可以用Levenshtein
import Levenshtein
print(Levenshtein.ratio('hello world', 'hello'))