
FWIW, BSD grep has significantly closed the gap since then, often by replicating GNU grep's approach in some ways.

Also BSD grep has other advantages, primarily it's not GNU grep.



This is curious, as I just did a test with grep and ggrep. The latter is almost 3 times faster for a very common use case I have.


Presumably on macOS? Basically all BSD- or GNU-derived tools that ship on macOS are ancient, ancient versions that deserve to be burnt away and should not be considered indicative of what the current state-of-the-art is capable of.


grep -V ? ldd grep ? Use case sample ?


grep (BSD grep) 2.5.1-FreeBSD

The other commenter in this thread pointed out that this is a very old version and the newer bsd version is better.

    real 0m2.044s
    user 0m1.932s
    sys 0m0.085s
    wgl@pondera:~$ time grep LiteonTe *.text | wc -l
       11020
    
    real 0m1.939s
    user 0m1.905s
    sys 0m0.038s
    wgl@pondera:~$ time ggrep LiteonTe *.text | wc -l
       11020
    
    real 0m0.130s
    user 0m0.087s
    sys 0m0.037s
    wgl@pondera:~$ time ggrep LiteonTe *.text | wc -l
       11020
    
    real 0m0.119s
    user 0m0.088s
    sys 0m0.035s
    wgl@pondera:~$ du -h -s *.text
    128M Kismetkismet-kali-pondera-20190325-08-46-27-1.pcapdump.text
The first run was only there to warm the cache, and its number was discarded. Thus the 'grep' you see above is the second run over the 128 MB pcap file expanded with tshark.

Dramatic.

I'll stay with the gnu grep and not update the regular ones for now.


It would help if you tested just grep when benchmarking grep. These datapoints tell a much different story.

  # /usr/bin/grep -V
  grep (BSD grep) 2.6.0-FreeBSD
  
  root@m6600:~ # /usr/local/bin/grep -V
  grep (GNU grep) 3.3
  Copyright (C) 2018 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  
  Written by Mike Haertel and others; see
  <https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
  
  root@m6600:~ # /usr/bin/time /usr/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.54 real         0.45 user         0.07 sys
  root@m6600:~ # /usr/bin/time /usr/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.54 real         0.44 user         0.08 sys
  root@m6600:~ # /usr/bin/time /usr/bin/grep X-User-Agent packetdump.pcap -c
  60
        0.54 real         0.41 user         0.11 sys
  root@m6600:~ # /usr/bin/time /usr/local/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.58 real         0.49 user         0.08 sys
  root@m6600:~ # /usr/bin/time /usr/local/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.60 real         0.48 user         0.11 sys
  root@m6600:~ # /usr/bin/time /usr/local/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.59 real         0.50 user         0.08 sys
  root@m6600:~ # du -h -s packetdump.pcap
  225M packetdump.pcap


That is a very good point. Taking this better approach, here is what I get on my (not updated grep) system:

    wgl:$ /usr/bin/grep --version
    /usr/bin/grep --version
    grep (BSD grep) 2.5.1-FreeBSD
    
    wgl:$ /usr/local/bin/ggrep --version
    /usr/local/bin/ggrep --version
    ggrep (GNU grep) 3.3
    Packaged by Homebrew
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    
    Written by Mike Haertel and others; see
    <https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
    
    wgl:$ /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text | wc -l
    /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text | wc -l
            2.30 real         1.04 user         0.67 sys
        1228
    wgl:$ /usr/bin/time /usr/bin/grep LiteonTe really-big.text | wc -l
    /usr/bin/time /usr/bin/grep LiteonTe really-big.text | wc -l
            5.65 real         5.30 user         0.33 sys
        1228
    wgl:$ /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text >/dev/null
    /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text >/dev/null
            0.05 real         0.03 user         0.01 sys
    wgl:$ /usr/bin/time /usr/bin/grep LiteonTe really-big.text >/dev/null
    /usr/bin/time /usr/bin/grep LiteonTe really-big.text >/dev/null
            6.50 real         5.71 user         0.58 sys
    wgl:$ /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text -c
    /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text -c
    1228
            2.33 real         1.05 user         0.69 sys
    wgl:$ /usr/bin/time /usr/bin/grep LiteonTe really-big.text -c
    /usr/bin/time /usr/bin/grep LiteonTe really-big.text -c
    1228
            5.37 real         5.05 user         0.31 sys
The wc -l is clearly polluting the result. However, I suspect that the >/dev/null is as well. But in the worst case, I see a halving of time over the old grep (edited), which correlates with my most common use of grep in looking through source files.
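To take both the pipe and the /dev/null redirect out of the picture, something like the following could be used. It is only a rough sketch: `bench_grep` is a made-up helper name, it assumes bash (for the `time` keyword), and it relies on my understanding that at least GNU grep can stop at the first match when stdout is /dev/null, so the count is written to a temporary regular file instead.

```shell
#!/usr/bin/env bash
# Sketch only: bench_grep is a hypothetical helper, not part of any grep package.
# Usage: bench_grep GREP_BINARY PATTERN FILE [RUNS]
bench_grep() {
    local bin=$1 pattern=$2 file=$3 runs=${4:-3}
    local out i=1
    out=$(mktemp) || return 1

    # Warm-up pass so the timed runs read from the page cache; not reported.
    "$bin" -c -- "$pattern" "$file" >"$out"

    while [ "$i" -le "$runs" ]; do
        # -c makes grep count matching lines itself, so no wc -l is timed.
        # Output goes to a regular file rather than /dev/null (assumption:
        # some grep builds take a shortcut when stdout is /dev/null).
        # bash's `time -p` prints real/user/sys on stderr.
        time -p "$bin" -c -- "$pattern" "$file" >"$out"
        echo "run $i: $(cat "$out") matching lines"
        i=$((i + 1))
    done
    rm -f "$out"
}
```

With the paths from the transcript above, it would be called as `bench_grep /usr/bin/grep LiteonTe really-big.text` and then again with `/usr/local/bin/ggrep`, comparing the `real` lines of the timed runs.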


The compiler is also going to have an impact, though for me it is the same for both grep binaries: FreeBSD clang version 6.0.1 with -O2.

I suspect there are still edge cases where BSD grep is quite a bit slower than, or not compatible with, GNU grep. However, with a closer apples-to-apples comparison there isn't much difference anymore for my usage, which is a lot of grep use, but pretty vanilla.

There may also be other OS differences in our comparison. My tests were run against a fairly recent FreeBSD 12-STABLE.



